Tuesday, March 31, 2020

A startup is building computer chips using real neurons



from Hacker News https://ift.tt/2UQddhp

The end of an Era – changing every single instance of a 32-bit time_t in Linux


With the linux-5.6 merge window, a project ends that has kept me busy for nearly six years: preventing the “Epochalypse” by changing every single instance of a 32-bit time_t in the kernel to a type that does not roll over on 2038-01-19.

While both John Stultz and I had been thinking about and prototyping partial solutions even earlier, the year 2014 is when we started discussing more openly in Linaro and the wider kernel community about what needed to happen. In a team effort, John started rewriting the core timekeeping support of the kernel, working his way out, while I would work my way down from the outside, starting with file systems and then system calls and device drivers with the goal of getting this done by the end of the year.

Spreading the Load

As chronicled on lwn.net, it turned out to take a bit longer. In order to address over 1000 files referencing time_t, timeval or timespec as of linux-3.15, we recruited help from a number of places.

The Outreachy program was a great resource for getting a lot of simple changes in drivers done, while internship candidates learned about contributing to the mainline kernel. Tina Ruchandani was my first intern and contributed 25 patches for the y2038 work in 2014/2015. For the 2015/2016 round, Deepa Dinamani joined as the second Outreachy intern and ended up implementing some of the most important bits all the way until the end with hundreds of patch submissions.

Within Linaro’s Kernel Working Group, I assigned simple driver conversions to new assignees from member companies to get them started on contributing to the upstream kernel while getting the conversion done one driver at a time, before moving on to more review intensive work in the kernel. Baolin Wang worked on converting real-time clocks and the audio subsystem, Firoz Khan’s first contribution was to rewrite the system call tables across all CPU architectures and many others contributed to device drivers.

Yak Shaving

Usually, getting y2038 fixes included was really easy, as maintainers are generally happy to take an obviously correct bugfix that they don’t have to implement themselves. However, some cases turned out to be much more time and labor intensive than we had imagined.

Converting the VFS code to use 64-bit inode timestamps took countless rewrites of the same patches, first from me and then from Deepa who finally succeeded. We wanted to avoid having to do a “flag day” change, which is generally considered too invasive and risks introducing regressions, and we wanted to minimize the changes for existing 64-bit users and for existing 32-bit applications. Doing this step-by-step change however turned out to add a lot of complexity as well. In the end, Deepa worked out a process of many non-invasive changes over multiple merge windows, followed by an automated conversion using coccinelle. The same series also fixed unrelated issues in the way some file systems generated their timestamps which reviewers had complained about.

This is an effect that can be observed a lot in kernel development: when you work on a simple bugfix, there is a good chance that development or review finds a much larger issue that also wants to be addressed, at which point it becomes near impossible to get the simple change merged without also addressing the wider problem. Issues that we addressed along the way include:

  • Changing the time functions away from getnstimeofday() to ktime_get() and similar conversions addressed the bugs with leap seconds, with time going backwards from settimeofday() as well as some particularly inefficient code.
  • File system timestamps are now checked for overflow in a consistent way, and interpreted the same way on 32-bit and 64-bit architectures, extending the range to at least year 2106 where possible.
  • The system call tables are now generated from machine readable files, and all architectures support at least the set of standard system calls that are available to newly added architectures.
  • Converting all the architectures led to the decision to clean out architectures that are no longer actively used or maintained.
  • David Howells contributed the statx() system call, which solves passing 64-bit timestamps along with many other features that are not present in stat(); a usage sketch follows this list.
  • The handling for 32-bit compat tasks on 64-bit kernels is more consistent with the native system calls now, after a lot of the compat syscalls were rewritten to be shared with time32 support for 32-bit architectures. Most importantly, the compat ioctl() handling is now completely reworked and always handled by the driver rather than a centralized conversion function that easily gets out of sync.
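As a small illustration of the statx() item above, the sketch below reads a year-2038-safe modification time. It assumes glibc 2.28 or later (which provides the statx() wrapper), a kernel that implements the call, and an arbitrary example path.

    // Reading a year-2038-safe modification time via statx();
    // stx_mtime.tv_sec is a signed 64-bit field even on 32-bit targets.
    #ifndef _GNU_SOURCE
    #define _GNU_SOURCE 1
    #endif
    #include <fcntl.h>     // AT_FDCWD
    #include <sys/stat.h>  // statx(), struct statx, STATX_BASIC_STATS
    #include <cstdio>

    int main() {
        struct statx stx {};
        if (statx(AT_FDCWD, "/etc/hostname", 0, STATX_BASIC_STATS, &stx) != 0) {
            std::perror("statx");
            return 1;
        }
        std::printf("mtime: %lld.%09u\n",
                    (long long)stx.stx_mtime.tv_sec,
                    (unsigned)stx.stx_mtime.tv_nsec);
        return 0;
    }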

Endgame

With all the VFS and system call changes out of the way during early 2019, the kernel was basically working, but a number of smaller issues still remained. In the summer I set out to make a list of everything that was still missing and revisited patches I had done in the previous years. Instead of creating the list I ended up writing the remaining ~100 patches: alsa and v4l2 were still lacking ABI changes, the NFS implementation and a few other file systems still needed changes, and there were still users referencing the time_t type. The resulting branch was basically ready for linux-5.4, and with the usual bug fixes and testing this has now all but made it into the ongoing linux-5.6 merge window. The last patch in the series hides the traditional time_t definition from kernel space and removes all the now unused helper functions that use it to prevent new references from getting merged.

Fixing User Space

After the time64 system call ABI was finalized in linux-5.1, work on using this in the C libraries got a lot more serious. The release of musl-1.2 is now imminent and will provide time64 for all newly compiled code. Adelie Linux is already migrating to this version and has a list of known issues. I expect the bugs to also get fixed in upstream projects soon. The first preview release of a time64 Adelie Linux is available for testing now. Most other distributions based on musl are likely to do the same conversion over the next months, depending on their release cycles.

For glibc, work is still ongoing; the plan at the moment is to move over to 64-bit time_t as an option in glibc-2.32 later this year. However, the default is still a 32-bit time_t, and as glibc-based distributions tend to have a larger number of packages, there is a very significant effort in rebuilding everything in a coordinated way. Any library that exposes an interface based on time_t must be recompiled along with all applications and other libraries using this interface, so in the end the result is typically a completely incompatible distribution. The Debian “armhf” port for ARMv7 CPUs is an obvious candidate that will have to go through this transition, but I expect most of the other distributions on 32-bit CPUs to stay with 32-bit time_t and then stop support before this becomes a problem.

So far it is looking good for the distro port, as most of the y2038 problems have already been found by the various BSD Unixes that changed over years ago (thanks guys!), so a lot of the remaining problems are either Linux specific, or in applications that have never been ported to anything other than Linux. I expect that once we get into larger scale testing, we will find several sets of problems:

  • Bugs that got introduced by an incorrect conversion to the time64 interfaces, breaking existing source code regardless of the time_t definition, like the regressions that are inevitably caused by any larger change and hopefully found quickly. For instance, we broke the sparc architecture port multiple times, but then also found ancient sparc bugs from a previous large-scale change that are now fixed.
  • Problems of an incorrect or incomplete conversion, breaking 32-bit software after the conversion to 64-bit time_t, e.g. a format string printing a time_t as a ‘long’ type rather than a ‘long long’ (see the sketch after this list), software that mixes the libc data types with direct calls to low-level kernel interfaces like futex(), or source packages that contain outdated copies of kernel headers such as linux/input.h or sound/asound.h.
  • 32-bit software that works correctly with 64-bit time_t until 2038 but then still fails because of an incorrect truncation to a ‘long’ type when it defines its own types rather than using the ones from system headers.
  • Anything that uses fixed 32-bit representation for time_t values remains broken on both 32-bit and 64-bit applications. This often involves on-disk or over-the-wire data formats that are hard to change.
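A minimal illustration of the format-string pitfall mentioned in the list above (the values are only for demonstration): a program built with a 64-bit time_t on a 32-bit target must not pass the value to "%ld".

    // Format-string pitfall: on a 32-bit target built with 64-bit time_t,
    // "%ld" expects a 32-bit 'long' and mangles the 64-bit value.
    #include <cstdio>
    #include <ctime>

    int main() {
        std::time_t now = std::time(nullptr);
        // Wrong once time_t is wider than long:
        //   std::printf("now: %ld\n", now);
        // Portable: widen explicitly and print as 'long long'.
        std::printf("now: %lld\n", (long long)now);
        return 0;
    }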

The biggest challenge will be to find and update all the devices that are already being deployed without the necessary bug fixes. The general move to 64-bit hardware even in deeply embedded systems helps ensure that most machines only run into the last set of problems, but 32-bit hardware will be deployed for many years to come, and will increasingly run on old software as fewer developers are motivated to work on them.



from Hacker News https://ift.tt/2Uw1MwJ

Writing HTML with Racket and X-Expressions (2019)

https://ift.tt/3dIsidq


from Hacker News https://ift.tt/32SseB7

Simdjson 0.3: Faster JSON parser

We released simdjson 0.3: the fastest JSON parser in the world is even better!

Last year (2019), we released the simdjson library. It is a C++ library available under a liberal license (Apache) that can parse JSON documents very fast. How fast? We reach and exceed 3 gigabytes per second in many instances. It can also parse millions of small JSON documents per second.

The new version is much faster. How much faster? Last year, we could parse a file like twitter.json at a speed of 2.0 GB/s, and then we reached 2.2 GB/s. We are now reaching 2.5 GB/s. Why go so fast? In comparison, a fast disk can reach 5 GB/s and the best network adapters are even faster.

The following plot presents the 2020 simdjson library (version 0.3) compared with the fastest standard compliant C++ JSON parsers (RapidJSON and sajson).

In this plot, RapidJSON and simdjson have exact number parsing, while RapidJSON (fast float) and sajson use approximate number parsing. Furthermore, sajson has only partial unicode validation whereas other parsers offer exact encoding (UTF8) validation.

If we had only improved the performance, it would already be amazing. But our new release packs a whole lot of improvements:

  1. Multi-Document Parsing: Read a bundle of JSON documents (ndjson) 2-4x faster than doing it individually.
  2. Simplified API: The API has been completely revamped for ease of use, including a new JSON navigation API and fluent support for error code and exception styles of error handling with a single API. In the past, using simdjson was a bit of a chore; the new approach is definitely modern. See for yourself:
     // Assumes the single-header simdjson.h shipped with the 0.3 release.
     #include <cstdint>
     #include <iostream>
     #include "simdjson.h"
     using namespace simdjson;
     using namespace std;

     int main() {
       auto cars_json = R"( [
         { "make": "Toyota", "model": "Camry",  "year": 2018, "tire_pressure": [ 40.1, 39.9 ] },
         { "make": "Kia",    "model": "Soul",   "year": 2012, "tire_pressure": [ 30.1, 31.0 ] },
         { "make": "Toyota", "model": "Tercel", "year": 1999, "tire_pressure": [ 29.8, 30.0 ] }
       ] )"_padded;
       dom::parser parser;
       dom::array cars = parser.parse(cars_json).get<dom::array>();

       // Iterating through an array of objects
       for (dom::object car : cars) {
         // Accessing a field by name
         cout << "Make/Model: " << car["make"] << "/" << car["model"] << endl;

         // Casting a JSON element to an integer
         uint64_t year = car["year"];
         cout << "- This car is " << 2020 - year << " years old." << endl;

         // Iterating through an array of floats
         double total_tire_pressure = 0;
         for (double tire_pressure : car["tire_pressure"]) {
           total_tire_pressure += tire_pressure;
         }
         cout << "- Average tire pressure: " << (total_tire_pressure / 4) << endl;

         // Writing out all the information about the car
         for (auto [key, value] : car) {
           cout << "- " << key << ": " << value << endl;
         }
       }
       return 0;
     }
  3. Exact Float Parsing: simdjson parses floats flawlessly at high speed.
  4. Fallback implementation: simdjson now has a non-SIMD fallback implementation, and can run even on very old 64-bit machines. This means that you no longer need to check whether the system supports simdjson.
  5. Automatic allocation: as part of API simplification, the parser no longer has to be preallocated: it will adjust automatically when it encounters larger files.
  6. Runtime selection API: We have exposed simdjson’s runtime CPU detection and implementation selection as an API, so you can tell what implementation we detected and test with other implementations.
  7. Error handling your way: Whether you use exceptions or check error codes, simdjson lets you handle errors in your style. APIs that can fail return simdjson_result, letting you check the error code before using the result. But if you are more comfortable with exceptions, skip the error code and cast straight to the value you need, and exceptions will be thrown automatically if an error happens. Use the same API either way!
  8. Error chaining: We also worked to keep non-exception error-handling short and sweet. Instead of having to check the error code after every single operation, now you can chain JSON navigation calls like looking up an object field or array element, or casting to a string, so that you only have to check the error code once at the very end.
  9. We now have a dedicated web site (https://simdjson.org) in addition to the GitHub site (https://github.com/simdjson/simdjson).

Credit: many people contributed to simdjson, but John Keiser played a substantial role worthy of mention.



from Hacker News https://lemire.me/blog/2020/03/31/we-released-simdjson-0-3-the-fastest-json-parser-in-the-world-is-even-better/

Omni Group Layoffs

Brent Simmons (tweet):

Omni’s been around for almost 30 years, and I hope it’s around for another 30. It’s one of the great Mac and iOS shops — they will sing songs about Omni, at maximum volume, in the great halls.

But businesses go up and down, and Omni’s had a bit of a down period. Normally that would be fine, but the current economic circumstances turn “a bit of a down period” into something more serious — and, in order to get things going the right way again, the company had to lay off some people. Including me.

For anyone who’s able to hire now, this is a rare opportunity to scoop up some top talent that’s usually off the market.

Mark Boszko (tweet):

People probably know me best for my video production work — please see the output of my last seven years in The Omni Group’s video archives — but I have also done a lot of related development work, and would love to push my career in that direction.

Joel Page:

In short, I designed applications for macOS and iOS. If you look at any of Omni’s applications, you’re looking at my work. Icons, UX, UI, but mostly the icons. I joke that being a UI designer is 95% being a production artist, and that holds pretty true.

Marcin Krzyzanowski:

I’m looking for new opportunity (yay!) I’ve been doing remote (EU and US) for many years. I’m seasoned iOS Developer, some Mac dev (I’m open to other tech). Interested in contract and/or fulltime.

[…]

due to layoffs in current startup

John Sundell:

Normally, this site (and all of my other work) is funded by sponsorships — through non-tracking, privacy-focused (and JavaScript-free) ads that I run on a weekly basis. But for the next two weeks there will be no ads on this site. Instead, each day, I’ll promote a new indie app whose developer has been financially impacted by the current pandemic. For free, with no strings attached.

I hope that, with your support, these indie developers will regain some of that lost revenue through this effort, and that we will all get to discover a few great new apps as well.

[…]

Also, I’d love to see you share your own favorite indie apps on Twitter and other social networks — and if you do, feel free to use the hashtag #IndieSupportWeeks to make those tweets and posts easier to find for everyone who’s following this effort.




from Hacker News https://ift.tt/2WW3Z5S

Zoho on collecting customer data and avoiding the cookies trap


Zoho's chief strategy officer Vijay Sundaram is a firm believer that business growth should not come at the cost of customer privacy.

"20 years ago, we made a decision: We will never have a model that makes revenue out of advertising," he told ZDNet.

"What that means is that slows down your growth because you have obvious ways to add an additional revenue stream … but what that winds up doing is something that lasts over a longer period of time."

According to Sundaram, the exponential growth it can potentially offer could also result in conflicting models.

"You see a lot of technology companies today, and it's not just the Googles of the world, or the Facebook's of the world, and many technology companies, including Amazon and Microsoft and others that have an advertising model, however big or small it is," he said.

"There is a fundamental conflict in that model because ad revenue and privacy are fundamentally at loggerheads -- in a sense as the oil and water business -- they just don't mix."

See also: Zoho hits 50 million business users, launches WorkDrive  

Sundaram said that while there are advantages to driving ad revenue, it would loosen privacy requirements as there's a fundamental conflict. At a minimum, he said, more work is required to keep the two separate to avoid breaches, but it only gets worse from there.

"Marketing has never been a crime. Companies have marketed to their customers since God knows when, time immemorial, so what you have is -- how you use the marketing, how you use the tools," he said.

"What of those tools are made visible to your customers? How do people know how they're being observed, watched, or how their data is being used? ... That's where the moral part comes in.

"It's the analogy of … is a crime attributed to the gun or the gun holder? Marketing is a tool … what are your ethics and morals or governance is the bigger question."

Another decision taken by the Indian productivity giant is that it has banned third-party cookies.

"Third-party cookies are an evil of the internet age that people are simply not aware of," he said.

"There's a lot of 'value in collecting this data,' because there's a presumption ... that customer data is up and available for sale, despite all the regulatory regimes that are trying to clamp down on it."

By banning third-party cookies, Sundaram said Zoho has been able to make the statement to its customers that their data is their data.

"Our customers may choose to use our products and put in third-party cookies -- we don't have a right on that, and we can't mandate that," he added.

"From our point of view, every company should just do this. In fact, they should do this because it's a matter of principle. And most companies will only do it when they're mandated by the law."

Zoho, which offers a suite of enterprise applications that add up to be a business operating system, currently boasts a user base of over 50 million.  




from Latest Topic for ZDNet in... https://ift.tt/2R3mwJP

Germany's Coronavirus Death Rate Is Lower Than in Other Countries


Young people gather in the Volkspark am Friedrichshain in Berlin on March 18. Germany's fatality rate so far — just 0.5% — is the world's lowest, by a long shot. (Markus Schreiber/AP)

As confirmed cases of the coronavirus in Germany soared past 10,000 last week, hundreds of Berliners crowded Volkspark am Friedrichshain to play soccer and basketball, and to let their kids loose on the park's many jungle gyms.

The conditions seemed ideal for the spread of a virus that has killed thousands. Indeed, as of Wednesday, Germany had the fifth-highest number of cases.

Yet Germany's fatality rate so far — just 0.5% — is the world's lowest, by a long shot.

"I believe that we are just testing much more than in other countries, and we are detecting our outbreak early," said Christian Drosten, director of the institute of virology at Berlin's Charité hospital.

As Europe has become the epicenter of the global coronavirus pandemic, Italy's fatality rate hovers around 10%. France's is around 5%. Yet Germany's fatality rate from COVID-19 has remained remarkably low since cases started showing up there more than a month ago. As of March 25, there were 175 deaths and 34,055 cases.

Drosten, whose team of researchers developed the first COVID-19 test used in the public domain, said Germany's low fatality rate is because of his country's ability to test early and often. He estimates Germany has been testing around 120,000 people a week for COVID-19 during the monthlong period from late February to now, as the outbreak reached epidemic proportions in the country; that is the most extensive testing regimen in the world.

And that means Germany is more likely to have a lower number of undetected cases than other countries where testing is less prevalent, which raises the question: Why is Germany testing so much?

"We have a culture here in Germany that is actually not supporting a centralized diagnostic system," said Drosten, "so Germany does not have a public health laboratory that would restrict other labs from doing the tests. So we had an open market from the beginning."

In other words, Germany's equivalent to the U.S. Centers for Disease Control and Prevention — the Robert Koch Institute — makes recommendations but does not call the shots on testing for the entire country. Germany's 16 federal states make their own decisions on coronavirus testing because each of them is responsible for their own health care systems.

When Drosten's university medical center developed what became the test recommended by the World Health Organization, they rolled these tests out to their colleagues throughout Germany in January.

"And they of course rolled this out to labs they know in the periphery and to hospital labs in the area where they are situated," Drosten said. "This created a situation where, let's say, by the beginning or middle of February, testing was already in place, broadly."

Drosten said that has meant quicker, earlier and more widespread testing for COVID-19 in Germany than in other countries.

Lothar Wieler, head of the Robert Koch Institute, Germany's federal agency responsible for disease control and prevention, said at a news conference last week that Germany's testing infrastructure means authorities have a more accurate read of confirmed cases of the virus.

"We don't know exactly how many unknown cases there are, but we estimate that this unknown number is not very high," Wieler said. "The reason is simple. We issued recommendations in mid-January about who should be tested and who shouldn't be tested."

But some Berlin residents aren't as confident as Wieler. Nizana Nizzi Brautmann said she was worried when a teacher at her son's school tested positive for COVID-19 and a day later she and her son woke up with fevers and persistent coughing. She said she couldn't get through to Berlin's coronavirus hotline, which was continuously busy.

She finally got through to the city's emergency medical service number, "and I told her I think we need to be checked because we have some symptoms," Brautmann recalled. "The lady was just saying, 'We make no tests here. I can't help you. I would advise you to stay home and drink tea.' "

When she finally was able to speak to a doctor on the phone, the doctor told her to wait in line outside a local hospital to get tested, but she didn't have a mask for her or her son, and she didn't want to infect others in line, so she stayed home. She and her son are now in good health, but she said the episode left her wondering how prepared German society is for this pandemic.

Drosten said such experiences are probably an exception, not the rule.

"I know the diagnostics community in Germany a bit," Drosten said. "My feeling is that actually the supply of tests is still good. And of course, our epidemic is now also very much up-ramping and we will lose track here, too."

Drosten said the growing number of cases in Germany will soon exceed testing capacities. But for the time being, he thinks the country has had a robust response to the coronavirus pandemic. He's most worried about countries in Africa that aren't well set up for this — countries that, once the crisis comes to them, will find it more difficult to flatten the curve.

NPR Berlin bureau producer Esme Nicholson contributed to this story.



from Hacker News https://ift.tt/2wqsLAl

Unmasking Northrop Grumman's XRQ-72A


However, AFRL only requested a formal designation for the Great Horned Owl drone in January 2017, according to the documents we received via the FOIA request. "It is planned that the GHO [Great Horned Owl] demonstrator would complete a series of flights at the Acoustic Research Complex operated by AFRL/RH [Airman Systems Directorate] on the White Sands Missile Range incorporating the hybrid power and propulsion system, and then be available for future demonstrations of additional technology and/or vehicle configurations," the memorandum explained. 

What's also interesting is that AFRL asked for an X-plane designation, which are typically applied only to purely experimental aircraft. One notable exception was the Bell X-16, which was publicly described as a high-altitude testbed, but was, in fact, a competitor to the famous U-2 Dragon Lady spy plane.

"The X-## Mission Design Series Assignment is very important at this time to highlight to the aerospace R&D community the importance of the development of hybrid power and propulsion for future USAF aircraft," the memo noted. "While potential benefits to a broad range of future vehicles have been identified, such a dramatic change in the design paradigm is a slow process. The X-## designation for the GHO will dramatically improve awareness of the research and help preserve the technical achievements, increasing opportunity for technical transition of the experimental results and design philosophy to future USAF air vehicles."

It's not clear how the final XRQ-72A designation, which the Air Force approved in March 2017, came about. The "XRQ" prefix is very much not an X-plane designation and is more fitting for a prototype of a potentially operational platform. There are numerous examples of experimental drones with "XQ" or just "X" prefixes, such as the Kratos XQ-58A Valkyrie and the Lockheed Martin X-44A, as well as the still curious YQ-11 prototype designation for the General Atomics Predator C/Avenger, so this appears to have been a deliberate decision.

The number 72 is also wildly out of sequence for either the X-plane category or the drone category in the U.S. military's joint-service aircraft designation system. It is a number that has been informally applied on multiple occasions to advanced intelligence, surveillance, and reconnaissance aircraft designs in the past, as a reference to being a spiritual successor to the iconic SR-71 Blackbird spy plane.

In 2018, IARPA also initiated a follow-on project called Little Horned Owl, which had similar goals to Great Horned Owl, but called for a notably smaller overall design that would only have to carry a 10-pound payload. There was also a requirement for "innovative battery architecture to improve flight times for battery-only flight" and "runway independent operation (minimal ground support equipment)."

Regardless, as IARPA noted in 2011, there is a clear demand for the kind of technologies developed under the Great Horned Owl program. A drone with relatively long endurance and very quiet operation would enable persistent surveillance of targets of interest with a reduced likelihood of detection. 

This, of course, would hardly be the first time the U.S. Intelligence Community, or the U.S. military, has acquired or otherwise experimented with ultra-quiet aircraft for reconnaissance and covert operations. The CIA famously used a pair of Hughes 500P helicopters, also known as the "Quiet Ones," to conduct a top-secret wire-tapping operation in North Vietnam during the Vietnam War. The U.S. Navy and the U.S. Army also experimented with very-quiet manned surveillance aircraft, such as the YO-3A Quiet Star, during the Vietnam War. 

During the 1980s and 1990s, the Army, as well as the U.S. Air Force and the CIA, further pursued higher-flying manned surveillance aircraft with low acoustic signatures, most notably the RG-8A Condor powered glider, some of which later found their way into U.S. Coast Guard service. The Coast Guard went on to fly the unique RU-38A Twin Condor, as well. You can read more about these aircraft, and the Coast Guard's attempts to develop a successor, in this past War Zone piece.

When it comes to more modern quiet drones, the Defense Advanced Research Projects Agency (DARPA) has been pursuing similar developments for U.S. military use. In 2006, Lockheed Martin also supplied a hand-launched, electrically-powered drone called Stalker to an unnamed customer, reportedly U.S. Special Operations Command, specifically to meet a requirement for a quieter unmanned aircraft that opponents would have a harder time detecting. More recently, the company developed a propeller-driven flying wing design, called Fury, which is visually similar to the RQ-170, albeit smaller, and that also has a reduced acoustic signature.



from Hacker News https://ift.tt/343nKtJ

Oil selling below $10/barrel at key American hubs



By Sheela Tobben on 3/31/2020

NEW YORK (Bloomberg) -- Oil is selling for less than $10 across key North American hubs as the global demand shock from coronavirus leaves crude with nowhere to go.

The coronavirus pandemic has hit demand so hard that as benchmark futures plunge to lowest in 18 years, oil is backing up throughout the distribution system, raising the prospect that producers will need to shut in wells. Some of the hardest-hit areas have been those thousands of miles from export terminals, which would provide the possibility of escape, either to foreign markets or onto tankers as floating storage.

Refiners across the U.S., including PBF Energy Inc., Valero Energy Corp. and Phillips 66, are slowing fuel production as restrictions on travel and work have reduced gasoline and jet fuel demand to a trickle. North Atlantic Refining Ltd will be idling its 130,000-barrel-a-day refinery in Newfoundland, Canada, for two to five months due to the outbreak.

The market is groaning under the weight of this oversupply, so much so that U.S. midstream operators such as Plains All American Pipelines have asked their suppliers to reduce oil production because storage capacity is reaching its limits.

Bakken crude in Guernsey, Wyoming, sank to a record-low $3.18 a barrel Monday, according to data compiled by Bloomberg, while Western Canadian Select in Hardisty, Alberta, was worth just $4.18. Even oil in West Texas is as cheap as it’s ever been. West Texas Intermediate in Midland was $10.68, just above its all-time low from 1998. And its lower-quality counterpart, West Texas Sour, slid to a record $7.18, the lowest in data going back to 1988.

West Texas Intermediate Light, also known as WTL, traded at around $7.50 a barrel below the WTI Midland benchmark on Monday, traders said, the equivalent of about $3 a barrel outright. Including transportation costs from the wellhead, that would mean the very light crude is worth near-zero, if not negative, when it comes out of the ground.

Even oil that makes it to a dock isn’t immune from the price plunge, as refineries around the world slow down. U.S. oil for export from Corpus Christi -- the end point of several new Permian pipelines and a major exporting hub -- traded at $15 a barrel below July Brent, according to traders.




from Hacker News https://ift.tt/39wSe8K

Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters

1 INTRODUCTION

The classical data structure for approximate membership is the Bloom filter [4]. It may be the best-known probabilistic data structure. A Bloom filter is akin to a set data structure in that we can add keys and check whether a given key is present in the set. There is a small probability that a key is incorrectly reported as being present, an event we call a false positive. However, Bloom filters can use less memory than the original set. Thus, Bloom filters accept a small probability of error for a reduced memory usage.

Approximate set membership has many applications: e.g., scanning for viruses using payload signatures [18], filtering bad keywords or addresses, and fast language identification for strings [22]. Write-optimized key-value stores [11] such as log-structured merge (LSM) trees [29] are another important use case. In such stores, an in-memory data structure avoids expensive disk accesses.

We want our data structures to be fast and to use little memory. In this respect, conventional Bloom filters can be surpassed:

  • Bloom filters generate many random-access queries. For efficient memory usage, a Bloom filter with a false-positive probability $\epsilon$ should use about $-\log _2 \epsilon$ hash functions [10]. At a false-positive probability of 1%, seven hash functions are thus required. Even if the computation of the hash functions were free, doing many random memory accesses can be expensive.
  • The theoretical lower bound for an approximate membership data structure with a false-positive probability $\epsilon$ is $-\log _2 \epsilon$ bits per key [10]. When applied in an optimal manner, Bloom filters use 44% more memory than the theoretical lower bound.

Practically, Bloom filters are often slower and larger than alternatives such as cuckoo filters [20]. Can we do better than even cuckoo filters?

Bonomi et al. [5] as well as Broder and Mitzenmacher [10] remarked that for static sets, essentially optimal memory usage is possible using a perfect hash function and fingerprints. They dismissed this possibility, in part, because perfect hash functions might be too expensive to compute. Yet, Dietzfelbinger and Pagh [17] described a seemingly practical implementation of this idea, which we call an xor filter. It builds on closely related work such as Bloomier filters [12, 13].

To our knowledge, xor filters were never implemented and benchmarked. We present the first experimental evaluation. We find that they perform well, being often faster than both Bloom and cuckoo filters. For common use cases, they require less memory. Furthermore, we can improve their memory usage with only a modest performance penalty, using a relatively simple compression technique (see Section 3.3). We make our software freely available to ensure reproducibility.

Our main result is that xor filters have merit as a practical data structure. They are fast, compact, and we found them easy to implement.

2 RELATED WORK

We find many Bloom filters and related data structures within database systems [11] to avoid disk accesses. A popular strategy for designing database engines that must support frequent updates is the log-structured merge (LSM) tree [29]. At a high-level, LSM trees maintain a fast in-memory component that is merged, in batches, to data in persistent storage. The in-memory component accumulates database updates thus amortizing the update cost to persistent storage. To accelerate lookups, many LSM tree implementations (e.g., levelDB, RocksDB, WiredTiger) use Bloom filters. When merging the components, usually a new filter is built. We could, instead, update existing filters. However, data structures that support fast merging (e.g., Bloom filters) require either the original filters to have extra capacity, or the result of the merger to have higher false-positive probabilities [2].

Many applications of Bloom filters and related data structures are found in networking, where we seek to avoid unnecessary network access. Generally, whenever a filter must be sent through a network connection to other computers (e.g., to cache and prevent network queries), we might be able to consider the filter as immutable [27] on the receiving machine.

2.1 Bloom Filter Variants

Standard Bloom filters [4] consist of a collection of hash functions $h_1$, $h_2,\ldots, h_k$, which map each possible key to a fixed integer, which we interpret as an index value, and an array of bits $B$, initialized with zeros. The size of the array and the number of hash functions $k$ are parameters of the filter. When we add a key $x$, we hash it with each hash function, and set the corresponding bits:

\begin{eqnarray*} B[{h_1(x)}] & \leftarrow 1,\\ B[{h_2(x)}] & \leftarrow 1,\\ & \vdots \\ B[{h_k(x)}] & \leftarrow 1. \end{eqnarray*}

To determine whether a given key is likely present, we check that the corresponding bits in our array are set:

\begin{eqnarray*} (B[{h_1(x)}] = 1) \mathrm{\,and\,} (B[{h_2(x)}] = 1) \mathrm{\,and\,} \cdots \mathrm{\,and\,} (B[{h_k(x)}] = 1). \end{eqnarray*}

Thus, if there are $k$ hash functions, then we might need to check up to $k$ bits. For keys that were added, we are guaranteed that all bits are set: there can never be a false negative. But false positives are possible, if the bits were set by other keys. The standard Bloom filter does not allow us to remove keys. Bloom filters support adding keys irrespective of the size of the bit array and of the number of hash functions, but the false-positive probability increases as more entries are added, and so more bits are set.

The size of the array $B$ is typically chosen so that a certain false-positive probability can be guaranteed up to a maximal number of entries, and the optimal parameter $k$ is calculated. The expected space overhead for optimal Bloom filters is 44%: it requires setting $k = - \log _2 \epsilon$ where $\epsilon$ is the desired bound on the false-positive probability. Bloom filters can be made concurrent [39].
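
As an illustration of the add and membership-test operations just described, here is a small, unoptimized sketch. The class name and the Murmur-style finalizer are our own choices; the $k$ index functions are derived from a single 64-bit hash by double hashing, the same trick used by the benchmarked implementation in Section 4.1.

    // Illustrative standard Bloom filter (not the benchmarked implementation).
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    class BloomFilter {
      std::vector<bool> bits_;
      std::size_t k_;   // number of hash functions

      static uint64_t mix(uint64_t x) {   // Murmur3-style finalizer
        x ^= x >> 33; x *= 0xff51afd7ed558ccdULL;
        x ^= x >> 33; x *= 0xc4ceb9fe1a85ec53ULL;
        x ^= x >> 33; return x;
      }
      std::size_t index(uint64_t key, std::size_t i) const {
        uint64_t h = mix(key);
        uint32_t h1 = (uint32_t)h, h2 = (uint32_t)(h >> 32);
        return (h1 + i * h2) % bits_.size();   // g_i(x) = h1(x) + i * h2(x)
      }

     public:
      BloomFilter(std::size_t m_bits, std::size_t k) : bits_(m_bits), k_(k) {}

      void add(uint64_t key) {
        for (std::size_t i = 0; i < k_; i++) bits_[index(key, i)] = true;
      }
      bool mayContain(uint64_t key) const {   // false positives possible,
        for (std::size_t i = 0; i < k_; i++)  // false negatives impossible
          if (!bits_[index(key, i)]) return false;
        return true;
      }
    };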

Blocked Bloom filters [24, 35] consist of many small Bloom filters, maybe one per CPU cache line, so that they need only one memory access per operation. However, the load of those small filters is likely to be uneven, and so for the same false-positive probability, they often need about 30% more space than standard Bloom filters. Advanced CPU instructions make it possible to speed up membership tests for both regular and blocked Bloom filters [32].

There are many other variations on Bloom filters including counting Bloom filters [5, 36], which support removing keys at the expense of more storage, compressed Bloom filters [27], multidimensional Bloom filters [14], Stable Bloom filters [15], and so forth.

2.2 Fingerprint-based Variants

Fingerprint-based variants store a fingerprint per key, where a fingerprint is the result of hash function $h$; typically, it is a word having a fixed number of bits. The membership test consists of the retrieval and comparison with the relevant fingerprints for the given key. The general intuition is as follows. For each value $x$ in the set, we store the fingerprint $h(x)$ in a key-fingerprint data structure. Given a candidate value $y$, we access its fingerprint from the data structure, and we compare the result with $h(y)$. Whenever $y$ was part of the set, the fingerprints match, otherwise they are likely different with a probability that depends on the size of the fingerprint.

  • Golomb-compressed sequences [35] store the sorted fingerprints by encoding the differences between fingerprint values. The overhead of this encoding is at least 1.5 bits per key, but it is difficult to achieve competitive speed.
  • Cuckoo filters [20] are based on cuckoo hashing. At full capacity, and with a low false-positive probability, they use less space than Bloom filters, and membership tests are often faster. The overhead is 3 bits per key for the standard cuckoo filter, and 2 bits per key for the slower semi-sorted variant. We are not aware of a cuckoo filter implementation that supports concurrent updates though there are related cuckoo hashing concurrency strategies [26].
  • Quotient filters [31] store fingerprints in a compact hash table. Quotient filters and cuckoo filters use a similar amount of memory.
  • Morton filters [8] are similar to cuckoo filters, but use underloaded buckets, like Horton tables [9]. Many sparse buckets are combined into a block so that data is stored more densely.
  • Bloomier filters [12, 13] support approximate evaluation of arbitrary functions, in addition to approximate membership queries. We are interested in a variant of the Bloomier filter [17] that can be used for approximate membership queries. We call this variant the xor filter (Section 3).

Other variants have been proposed [33, 40], but authors sometimes omit to provide and benchmark practical implementations. Dietzfelbinger and Pagh [17] observe that fingerprint techniques can be extended by storing auxiliary data with the fingerprint.

3 XOR FILTERS

Given a key $x$, we produce its $k$-bit fingerprint (noted $\operatorname{fingerprint}(x)$) using a randomly chosen hash function. We assume an idealized fully independent hash function; all fingerprints are equally likely so that $P(\operatorname{fingerprint}(x) = c) = 1/2^k$ for any $x$ and $c$. This probability $\epsilon = 1/2^k$ determines the false-positive probability of our filter. We summarize our notation in Table 1.

Table 1. Notation

$U$ Universe of all possible elements (e.g., all strings)
$S$ Set of elements from universe $U$ (also called “keys”)
$|S|$ Cardinality of the set $S$
$B$ Array of $k$-bit values
$c=|B|$ Size (or capacity) of the array $B$, we set $c = \lfloor 1.23 \cdot |S| \rfloor + 32$
$\operatorname{fingerprint}$ Random hash function mapping elements of $U$ to $k$-bit values (integers in $[0,2^k)$)
$h_0, h_1, h_2$ Hash functions from $U$ to integers in $[0,\lfloor c/3\rfloor)$, $[\lfloor c/3\rfloor ,\lfloor 2c/3\rfloor)$, $[\lfloor 2c/3 \rfloor ,c),$ respectively
$x \operatorname{\mathrm{\,xor\,}}y$ Bitwise exclusive-or between two values
$B[i]$ $k$-bit values at index $i$ (indexes start at zero)
$\epsilon$ False-positive probability

We want to construct a map $F$ from all possible elements to $k$-bit integers such that it maps all keys $y$ from a set $S$ to their $k$-bit $\operatorname{fingerprint}(y)$. Thus, if we pick any element of the set, then it gets mapped to its fingerprint by design $F(y)=\operatorname{fingerprint}(y)$. Any value that is not part of the filter gets mapped to a value distinct from its fingerprint with a probability $1-\epsilon = 1- 1/2^k$.

We store the fingerprints in an array $B$ with capacity $c$ slightly larger than the cardinality of the set $|S|$ (i.e., $c \approx 1.23 \times |S|$). We randomly and independently choose three hash functions $h_0, h_1, h_2$ from $U$ to consecutive ranges of integer values ($h_0: S \rightarrow \lbrace 0,\ldots , c/3 - 1\rbrace $, $h_1: S \rightarrow \lbrace c/3,\ldots ,2c/3 - 1\rbrace $, $h_2: S \rightarrow \lbrace 2c/3,\ldots ,c-1\rbrace $). For example, if $c=12$, then we might have the ranges $\lbrace 0,\ldots , 3\rbrace$, $\lbrace 4,\ldots , 7\rbrace$, and $\lbrace 8,\ldots ,11\rbrace$. Our goal is to have that the exclusive-or aggregate of the values in array $B$ at the locations given by the three hash functions agree with the fingerprint ($B[{h_0(x)}] \operatorname{\mathrm{\,xor\,}}B[{h_1(x)}] \operatorname{\mathrm{\,xor\,}}B[{h_2(x)}] = \operatorname{fingerprint}(x)$) for all elements $x \in S$. The hash functions $h_0, h_1, h_2$ are assumed to be independent from the hash function used for the fingerprint.

3.1 Membership Tests

The membership-test function (Algorithm 1) calculates the hash functions $h_0, h_1, h_2$, then constructs the expected fingerprint from those entries in table $B$, and compares it against the fingerprint of the given key. If the key is in the set, then the table contains the fingerprint and so it matches.

The processing time includes the computation of three hash functions as well as three random memory accesses. Though other related data structures may need fewer memory accesses, most modern processors can issue more than three memory accesses concurrently thanks to memory-level parallelism [1, 23, 34]. Hence, we should not expect the processing time to increase directly with the number of memory accesses.
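
As a concrete rendering of Algorithm 1: the membership test is one fingerprint computation, three array lookups into the three thirds of $B$, and two exclusive-ors. The sketch below assumes 8-bit fingerprints; the type name and the specific Murmur-style hashing are our own choices for illustration, not the paper's reference code.

    // Sketch of an xor filter with 8-bit fingerprints (Algorithm 1 membership test).
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct XorFilter8 {
      std::vector<uint8_t> B;   // c = floor(1.23 * |S|) + 32 slots, zero-initialized
      uint64_t seed = 0;        // re-randomized for each construction attempt

      static uint64_t mix(uint64_t x) {   // Murmur3-style finalizer
        x ^= x >> 33; x *= 0xff51afd7ed558ccdULL;
        x ^= x >> 33; x *= 0xc4ceb9fe1a85ec53ULL;
        x ^= x >> 33; return x;
      }
      uint8_t fingerprint(uint64_t key) const { return (uint8_t)mix(key ^ seed); }

      // h0, h1, h2: map the key into the first, second, and third thirds of B.
      std::size_t h(uint64_t key, int which) const {
        std::size_t third = B.size() / 3;
        uint32_t x = (uint32_t)mix(key ^ mix(seed + 1 + which));
        return (std::size_t)which * third
             + (std::size_t)(((uint64_t)x * (uint64_t)third) >> 32);
      }

      bool mayContain(uint64_t key) const {
        return fingerprint(key)
            == (uint8_t)(B[h(key, 0)] ^ B[h(key, 1)] ^ B[h(key, 2)]);
      }
    };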

3.2 Construction

The construction follows the algorithm from Botelho et al. [6] to build acyclic 3-partite random hypergraphs. We apply Algorithm 2, which calls Algorithm 3 one or more times until it succeeds, passing randomly chosen hash functions $h_0, h_1, h_2$ with each call. In practice, we pick hash functions by generating a new pseudo-random seed. Finally, we apply Algorithm 4.

Algorithm 3 works as follows. We initialize a (temporary) array $H$ of sets of keys of size $\lfloor 1.23 \cdot |S| \rfloor + 32$. At the beginning, all sets are empty. Then, we take each key $x$ from the set $S$, and we hash it three times ($h_0(x), h_1(x), h_2(x)$). We append the key $x$ to the three sets indicated by the three hash values (sets $H[h_0(x)], H[h_1(x)], H[h_2(x)]$). Most sets in the table $H$ contain multiple keys, but almost surely some contain exactly one key. We keep track of the sets containing just one key. Repeatedly, we pick one such location, append it to the output stack together with the key $x$ it contains; each time we remove the key $x$ from its three locations ($h_0(x), h_1(x), h_2(x)$). The process either terminates with a stack containing all of the keys, in which case we have a success, or with a failure.

The probability of success approaches 100% if the set is large [28]. For sets of size $10^7$, Botelho et al. [6] found that the probability is almost 1. For smaller sets, we experimentally found that the estimated probability is always greater than 0.8 with $c = 1.23 \cdot |S| + 32$, as shown in Figure 1.

Fig. 1. Probability of mapping step, found experimentally with 1,000 randomly generated sets.

Algorithm 3 runs in linear time with respect to the size of the input set $S$ as long as adding and removing a key $x$ from a set in $H$ is done in constant time. Indeed, each key $x$ of $S$ is initially added to three sets in $H$ and removed at most once from the same three sets.

In practice, if the keys in $S$ are integer values or other fixed-length objects, then we can implement the sets using an integer-value counter and a fixed-length mask (both initialized with zeros). When adding a key, we increment the counter and compute the exclusive-or of the key with the mask, storing the result as the new mask. We similarly remove a key by decrementing the counter and computing the same exclusive-or. Even when the set is made of large or variable-length elements, it may still be practical to represent them as small fixed-length (e.g., 64-bit or 128-bit) integers by hashing: it only comes at the cost of introducing a small error when two hash values collide, an improbable event that may only minutely increase the false-positive probability.
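
The constant-time set cells described in the previous paragraph can be written down directly; the structure name is ours, and the 64-bit key type matches the benchmark setup of Section 4.

    // One cell of the temporary array H in the mapping step: a counter plus
    // the xor of all keys currently hashed to this cell. When count == 1,
    // mask holds exactly the one remaining key, so singletons can be peeled
    // without storing explicit key lists.
    #include <cstdint>

    struct MappingCell {
      uint32_t count = 0;   // number of keys currently mapped here
      uint64_t mask = 0;    // xor of those keys

      void add(uint64_t key)    { count++; mask ^= key; }
      void remove(uint64_t key) { count--; mask ^= key; }
      bool isSingleton() const  { return count == 1; }  // mask is the lone key
    };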

We find it interesting to consider the second part of Algorithm 3 when it succeeds. We iteratively empty the queue $Q$, one element at a time. At iteration $t$, we add the key $x$ and the corresponding index $i$ to the stack if $x$ is the single key of set $H[i]$, and we remove the key $x$ from the sets at locations $h_0(x), h_1(x), h_2(x)$. Hence, by construction, each time Algorithm 3 adds a key $x$ and an index $i$ to the stack, the index $i$ is different from indexes $h_0(x^{\prime }), h_1(x^{\prime }), h_2(x^{\prime })$ for all keys $x^{\prime }$ encountered later (at time $t^{\prime }\gt t$).

To construct the xor filter, we allocate an array $B$ large enough to store $\lfloor 1.23 \cdot |S| \rfloor + 32$ fingerprints. We iterate over the keys and their indexes in the reverse order, compared to how they were identified in the “Mapping Step” (Algorithm 3). For each key, there are three corresponding locations $h_0(x), h_1(x), h_2(x)$ in the table $B$; the index associated with the key is one of $h_0(x), h_1(x), h_2(x)$. We set the value of $B[i]$ so that $B[h_0(x)] \operatorname{\mathrm{\,xor\,}}B[h_1(x)] \operatorname{\mathrm{\,xor\,}}B[h_2(x)] = \operatorname{fingerprint}(x)$. We repeat this for each key. Each key is processed once.

By our construction, an entry in $B$ is modified at most once. After we modify an entry $B[i]$, then none of the values $B[h_0(x)]$, $B[h_1(x)]$, $B[h_2(x)]$ will ever be modified again. This follows by our argument where we work through Algorithm 3 in reverse: $i$ is different from $h_0(x^{\prime }), h_1(x^{\prime }), h_2(x^{\prime })$ for all keys $x^{\prime }$ encountered so far. Remember that we use a stack, so the last entry added to the stack in Algorithm 3 is removed first in Algorithm 4. Thus, our construction is correct: We have that

\begin{align*} B[h_0(x)] \operatorname{\mathrm{\,xor\,}}B[h_1(x)] \operatorname{\mathrm{\,xor\,}}B[h_2(x)] = \operatorname{fingerprint}(x) \end{align*}

for all keys $x$ in $S$ at the end of Algorithm 4.
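
A sketch of this assigning step (Algorithm 4), reusing the names from the XorFilter8 sketch in Section 3.1; it assumes B has been sized to c and zero-initialized, so xoring in the still-empty slot B[i] is harmless.

    // Assigning step (Algorithm 4): walk the (key, index) stack produced by the
    // mapping step in reverse and fix B[i] so that the three-way xor equals the
    // key's fingerprint. B[i] is written exactly once and never touched again.
    #include <cstddef>
    #include <cstdint>
    #include <utility>
    #include <vector>

    void assign(XorFilter8 &filter,
                const std::vector<std::pair<uint64_t, std::size_t>> &stack) {
      for (auto it = stack.rbegin(); it != stack.rend(); ++it) {
        uint64_t    key = it->first;
        std::size_t i   = it->second;   // one of h0(key), h1(key), h2(key)
        // B[i] is still zero here, so including it in the xor does not change
        // the result; after this line, B[h0]^B[h1]^B[h2] == fingerprint(key).
        filter.B[i] = filter.fingerprint(key)
                    ^ filter.B[filter.h(key, 0)]
                    ^ filter.B[filter.h(key, 1)]
                    ^ filter.B[filter.h(key, 2)];
      }
    }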

3.3 Space Optimization: Xor+ Filter

About 19% of the entries in table $B$ are empty: For each 100 keys, we need 123 entries, and 23 are empty. For transmission, much of this empty space can be saved as follows: Before sending $B$, send a bit array that contains “0” for empty entries and “1” for occupied entries. Then, we only send the data of the occupied entries. If we use $k=8$ bits, then the regular xor filter needs $8 \times 1.23 = 9.84$ bits per entry, which we can compress in this way to $ 8 + 1.23 = 9.23$ bits per entry. If space usage at runtime is more important than query speed, then compression can be used at runtime. We can get a constant time access using a rank data structure such as Rank9 [38], at the expense of a small storage overhead ($\approx$25%), or poppy [41] for an even smaller overhead ($\approx$3%) at the expense of some speed.

By changing the construction algorithm slightly, we can move most of the empty entries to the last third of the table $B$. To do so, we change the mapping algorithm so that three queues are used instead of one: one for each hash function—each hash function represents a third of the table $B$. We then process entries of the first two queues until those are empty, before we process entries from the third queue. Experimentally, we find that 36% of the entries in the last third of table $B$ are empty on average. If the rank data structure is then only constructed for this part of the table, then space can be saved without affecting the membership-test performance as much, as only one rank operation is needed. We refer to this algorithm as “xor+ filter,” using Rank9 as the default rank data structure. With the fingerprint size in bits $k$, it needs $k \times 1.23 \times 2/3$ bits per key for the first two thirds of the table $B$, $k \times 1.23 \times 1/3 \times (1-0.36)$ for the last third, plus $1.23 \times 1/3 \times 1.25$ for the Rank9 data structure. In summary, xor+ filters use $1.0824 k + 0.5125$ bits per entry as opposed to $1.23 k$ bit per entry for xor filters.
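
As a worked instance of these expressions (our arithmetic, rounded to two decimals) for $k = 8$ bit fingerprints:

\begin{align*} \text{xor:}\quad & 1.23 \times 8 = 9.84 \text{ bits/key},\\ \text{xor+:}\quad & 8 \times 1.23 \times \tfrac{2}{3} + 8 \times 1.23 \times \tfrac{1}{3} \times (1-0.36) + 1.23 \times \tfrac{1}{3} \times 1.25 \approx 6.56 + 2.10 + 0.51 \approx 9.17 \text{ bits/key}. \end{align*}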

3.4 Space Comparison

We compare the space usage of some of the most important filters in Figure 2. Bloom filters are more space efficient than cuckoo filters at a false-positive probability of 0.4% or higher.

Fig. 2. Theoretical memory usage for Bloom filters (optimized for space), cuckoo filter (at max. capacity), and xor filters given a desired bound on the false-positive probability.

For very low false-positive probabilities ($5.6 \times 10^{-6}$), cuckoo filters at full capacity use less space than xor filters. However, we are not aware of any system that uses such a low false-positive probability: most systems seem to use between 8 and 20 bits per key [16, 37]. Thus, we expect xor and xor+ filters to use less memory in practice.

4 EXPERIMENTS

We follow Fan et al.’s testing procedure [20]; we started from their software project [19]. Like them, we use 64-bit keys as set elements. We build a filter based on a set of 10M or 100M keys. We build a distinct set made of 10M queried keys. This set of queried keys is created by mixing some of the keys from the original set, and some keys not present in the original set. We use different fractions (e.g., 0%, 25%, 50%, 75%, and 100%) of the keys in the original set. The benchmark counts the number of queried keys that are possibly in the set according to the filter. The benchmark is single threaded and calls the membership-test functions with different keys in a loop. We disable inlining of the functions to prevent compilers from unduly optimizing the benchmark, which counts the number of matching keys.

We run benchmarks on Intel processors with Skylake microarchitecture: an Intel i7-6700 processor running at 3.4 GHz, with 8 MB of L3 cache. The software is compiled with the GNU GCC 8.1.0 compiler to a 64-bit Linux executable with the flags -O3 -march=native. For each filter, we run three tests and report the median. Our error margin is less than 3%. The C++ source code of the filter implementations and the benchmark is available. For some algorithms including all the xor and xor+ filters, we have also implemented Java versions as well as a Go version and a pure C version, but the benchmarks use C++.

For all implementations, we use a randomly seeded Murmur finalizer [21] to compute the fingerprint from the key, as described in Algorithm 5. We choose this option instead of faster alternatives so that even non-random keys work well and do not result in higher-than-expected false-positive probabilities, or construction failure in the case of the cuckoo filter. For our tests, we use pseudo-randomly generated keys; we also tested with sequentially generated keys and found no statistically significant difference compared to using random keys after introducing the Murmur finalizer.

All implementations need to reduce a hash value $x$ to the range $\lbrace 0,\ldots ,m - 1\rbrace$, where $m$ is not necessarily a power of two. Where this is needed, we do not use the relatively slow modulo operation $x \bmod m$ for performance reasons. Instead, starting with 32-bit values $x$ and $m$ and computing their full 64-bit product $x \times m$, we use the faster multiply-shift combination $(x \times m) \div 2^{32} = (x \times m) \gg 32$ [25].
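
In code, this range reduction is a single widening multiplication followed by a shift; the function name is ours.

    // Multiply-shift range reduction: maps a 32-bit hash value x uniformly
    // into {0, ..., m-1} without a modulo instruction.
    #include <cstdint>

    inline uint32_t reduce(uint32_t x, uint32_t m) {
      return (uint32_t)(((uint64_t)x * (uint64_t)m) >> 32);
    }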

4.1 Filter Implementations

We run tests against the following filters:

  • Bloom filter: We implemented the standard Bloom filter algorithm with configurable false-positive probability (FPP) and size. We test with 8, 12, and 16 bits per key, and the respective number of hash functions $k$ that are needed for the lowest false-positive probability. For fast construction and membership test, we hash only once with a 64-bit function, treated as two 32-bit values $h_1(k)$ and $h_2(k)$. The Bloom filter hash functions are $g_i(k) = h_1(k) +i \cdot h_2(k)$ for $i=0,\ldots ,k-1$.
  • Blocked Bloom filter: We use a highly optimized blocked Bloom filter from Apache Impala, which is also used in the cuckoo filter software project [19]. We modified it so the size is flexible and not restricted to $2^n$. It is designed for Intel AVX2 256-bit operations; it is written using low-level Intel intrinsic functions. The advantage of this algorithm is the membership-test speed: each membership test is resolved from one cache line only using few instructions. The main disadvantage is that it is larger than regular Bloom filters.
  • Cuckoo filter (C): We started with the cuckoo filter implementation from the original authors [19]. We reduce the maximum load from 0.96 to 0.94, as otherwise construction occasionally fails. The reduced maximum load is apparently the recommended workaround suggested by the cuckoo filter authors. Though it is outside our scope to evaluate whether it is always a reliable fix, it was sufficient in our case. This reduction of the maximum load slightly worsens ($\approx$2%) the memory usage of cuckoo filters. In the original reference implementation [20], the size of the filter is restricted to be a power of two, which means up to 50% of the space is unused. Wasting so much space seems problematic, especially since it does not improve the false-positive probability. Therefore, we modified it so the size is flexible and not restricted to $2^n$. This required us to slightly change the calculation for the alternate location $l_2(x)$ for a key $x$ from the first location $l_1(x)$ and the fingerprint $f(x)$. Instead of $l_2(x) = l_1(x) \operatorname{\mathrm{\,xor\,}}h(f(x))$ as in Fan et al. [20], we use $l_2(x) = \mathrm{bucketCount} - l_1(x) - h(f(x))$, and if the result is negative, then we add $\mathrm{bucketCount}$. We use 12-bit and 16-bit fingerprints.
  • Cuckoo semi-sorted (Css): We use the semi-sorted cuckoo filter reference implementation, modified in the same way as the regular cuckoo filter. From the original Fan et al. [20] source release, we could only get one variant to work correctly, the version with a fingerprint size of 13 bits. Other versions have a non-zero false negative probability.
  • Golomb-compressed sequence (GCS): Our implementation uses an average bucket size of 16, and Golomb Rice coding. We use a fingerprint size of 8 bits.
  • Xor: Our xor and xor+ filters as described in Section 3. We use 8-bit and 16-bit fingerprints.
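The alternate-location computation for the size-flexible cuckoo filter, referenced in the list above, can be sketched as follows. We assume $h(f(x))$ has already been reduced to $\lbrace 0,\ldots ,\mathrm{bucketCount}-1\rbrace$; the names are illustrative. Applying the same formula to $l_2$ yields $l_1$ again, so either location can be recovered from the other, which the displacement loop during insertion relies on.

    #include <cstdint>

    // Alternate bucket for the size-flexible cuckoo filter. Instead of
    // l2 = l1 xor h(f), which assumes a power-of-two bucket count, we use
    // l2 = bucketCount - l1 - h(f), adding bucketCount if the result is
    // negative. Applying the same formula to l2 returns l1.
    inline int64_t altLocation(int64_t l1, int64_t fingerprintHash, int64_t bucketCount) {
        int64_t l2 = bucketCount - l1 - fingerprintHash;
        if (l2 < 0) {
            l2 += bucketCount;
        }
        return l2;
    }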

4.2 Construction Performance

We present the construction times for 10M and 100M keys in Table 2. All construction algorithms are single-threaded; we did not investigate multi-threaded construction. For reference, we also present the time needed to sort the 64-bit keys using the C++ standard sorting algorithm (std::sort), on the same platform.

Table 2. Construction Time in Nanoseconds Per Key, Rounded to 10 Nanoseconds

Algorithm 10M keys 100M keys
Blocked Bloom 10 ns/key 20 ns/key
Bloom 8 40 ns/key 70 ns/key
Bloom 12 60 ns/key 90 ns/key
Bloom 16 90 ns/key 130 ns/key
Cuckoo semiSort 13 130 ns/key 200 ns/key
Cuckoo 12 80 ns/key 130 ns/key
Cuckoo 16 90 ns/key 120 ns/key
GCS 160 ns/key 190 ns/key
Xor 8 110 ns/key 130 ns/key
Xor 16 120 ns/key 130 ns/key
Xor+ 8 160 ns/key 180 ns/key
Xor+ 16 160 ns/key 180 ns/key
(Sorting the keys) 80 ns/key 90 ns/key

During construction, the blocked Bloom filter is clearly the fastest data structure. For the 100M case, the semi-sorted variant of the cuckoo filter is the slowest. Construction of the xor filter with our implementation is roughly half as fast as the cuckoo filter and the Bloom filter, which have similar performance.

4.3 Query Time Versus Space Overhead

We present the performance numbers for the case where 25% of the searched entries are in the set in Figure 3, and in the case where all searched entries are in the set in Figure 4. The results are presented in tabular form in Table 3, where we include the Golomb-compressed sequence.

Fig. 3. Query time vs. space overhead, 25% find.


Fig. 4. Query time vs. space overhead, 100% find.

Table 3. Membership-test Benchmark Results, 25% Find

(a) 10M keys (b) 100M keys
Name Time (ns) Bits/key FPP Time (ns) Bits/key FPP
Blocked Bloom  16 10.7 0.939  20 10.7 0.941
Bloom 8  31  8.0 2.161  53  8.0 2.205
Bloom 12  40 12.0 0.313  58 12.0 0.339
Bloom 16  48 16.0 0.046  68 16.0 0.053
Cuckoo semiSort 13  57 12.8 0.092  94 12.8 0.092
Cuckoo 12  31 12.8 0.183  38 12.8 0.184
Cuckoo 16  32 17.0 0.012  37 17.0 0.011
GCS 137 10.0 0.389 220 10.0 0.390
Xor 8  23 9.8 0.389  32  9.8 0.391
Xor 16  27 19.7 0.002  33 19.7 0.001
Xor+ 8  36 9.2 0.390  64  9.2 0.389
Xor+ 16  43 17.8 0.002  65 17.8 0.002

Timings are in nanoseconds per query.

Unlike xor and cuckoo filters, the Bloom filter membership-test timings are sensitive to the fraction of keys present in the set. When an entry is not in the set, only a few bits need to be accessed, until the query function finds an unset bit and returns. The Bloom filter is slower if an entry exists in the set, as it has to check all bits; this is especially the case for low false-positive probabilities. See Figure 4.
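For illustration, a Bloom membership test along these lines, using the double-hashing scheme $g_i(k) = h_1(k) + i \cdot h_2(k)$ and returning as soon as an unset bit is found, can be sketched as follows. The modulo is used here for brevity, and the names are illustrative; the benchmarked implementation differs in detail.

    #include <cstdint>
    #include <vector>

    // Sketch of a Bloom filter membership test with k hash functions derived
    // from one 64-bit hash, split into h1 (low 32 bits) and h2 (high 32 bits).
    struct BloomSketch {
        std::vector<uint64_t> bits;   // m bits stored in 64-bit words
        uint64_t m = 0;               // number of bits
        int k = 0;                    // number of hash functions

        bool mightContain(uint64_t hash) const {
            uint32_t h1 = (uint32_t)hash;
            uint32_t h2 = (uint32_t)(hash >> 32);
            for (int i = 0; i < k; i++) {
                uint64_t bit = (h1 + (uint64_t)i * h2) % m;
                if ((bits[bit >> 6] & (1ULL << (bit & 63))) == 0) {
                    return false;     // early exit: misses usually stop after a few probes
                }
            }
            return true;              // all k bits set: present, or a false positive
        }
    };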

Ignoring query time, Figure 5 shows that Cuckoo 12 (C12) has memory usage close to that of Bloom filters. The cuckoo filter only uses much less space than Bloom filters for false-positive probabilities well below 1% (Cuckoo 16 or C16). In our experiments, the cuckoo filter, and the slower semi-sorted cuckoo filter (Css), always use more space than the xor filter. These experimental results match the theoretical results presented in Figure 2.

Fig. 5. FPP vs. space usage in bits/key, log scale FPP.

The xor filter provides good query-time performance while using little space, even for moderate false-positive probabilities.

4.4 Discussion

We attribute the good membership-test performance of xor filters mainly to the following reasons. Xor filters use exactly three memory accesses, independent of the false-positive probability. These memory accesses can be executed in parallel by the memory subsystem. The number of instructions, meanwhile, is small and there are no branches.
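To make this concrete, the membership test of an xor filter with 8-bit fingerprints can be sketched as follows; the derivation of the three hash values and the fingerprint from the 64-bit hash is simplified relative to the released implementation, and the names are illustrative.

    #include <cstdint>
    #include <vector>

    // Sketch of an xor filter membership test (8-bit fingerprints). The array B
    // holds 3 * blockLength fingerprints; h0, h1, h2 fall into disjoint thirds.
    struct Xor8Sketch {
        uint64_t seed = 0;
        uint32_t blockLength = 0;
        std::vector<uint8_t> B;       // size: 3 * blockLength

        static uint64_t murmur64(uint64_t x) {
            x ^= x >> 33; x *= 0xff51afd7ed558ccdULL;
            x ^= x >> 33; x *= 0xc4ceb9fe1a85ec53ULL;
            x ^= x >> 33; return x;
        }
        static uint32_t reduce(uint32_t x, uint32_t m) {
            return (uint32_t)(((uint64_t)x * (uint64_t)m) >> 32);
        }

        bool contains(uint64_t key) const {
            uint64_t h = murmur64(key + seed);
            uint8_t f = (uint8_t)(h ^ (h >> 32));   // fingerprint derived from the hash
            uint32_t h0 = reduce((uint32_t)h,         blockLength);
            uint32_t h1 = reduce((uint32_t)(h >> 21), blockLength) + blockLength;
            uint32_t h2 = reduce((uint32_t)(h >> 42), blockLength) + 2 * blockLength;
            // Three independent loads, two xors, one comparison; no branches.
            return f == (uint8_t)(B[h0] ^ B[h1] ^ B[h2]);
        }
    };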

For a false-positive probability of 1%, the standard Bloom filter needs more memory accesses for a match, and even more so for lower false-positive probabilities. The Bloom filter uses between 41 and 105 instructions per key, depending on the number of bits set per key (the number of hash functions) and the false-positive probability. For a miss (when the key is not in the set), fewer memory accesses are needed on average, but there might be mispredicted branches with accompanying penalties.

The cuckoo filter uses exactly two memory accesses and 66 to 68 instructions per key (depending on fingerprint size). The xor filter uses exactly three memory accesses but only about 48 instructions per key. Processors execute complex machine instructions using low-level instructions called $\mu$ops. A processor like our Skylake can support up to 10 outstanding memory requests per core, limited by an instruction reorder buffer of 200 $\mu$ops. In the absence of mispredicted branches and long dependency chains, the capacity of the instruction buffer becomes a limitation [3]. This is likely the reason why the cuckoo filter and the xor filter have similar membership-test performance: while the cuckoo filter has fewer memory accesses, it generates more instructions, which makes it harder for the processor to keep as many memory requests in flight as it otherwise could.

In our benchmarks, the blocked Bloom filter is the only algorithm that is clearly faster than the xor filter. This is most likely due to its single memory access per query and its highly optimized code, which uses SIMD instructions specific to recent x64 processors. It needs fewer memory accesses and fewer instructions than its competitors. It might be difficult to implement a similarly efficient approach in a higher-level language like Java, or using only portable code. If memory usage or a low false-positive probability is a primary concern, the blocked Bloom filter may not be a good choice.

While an xor filter is immutable, we believe this is not a limitation for many important applications; competitive alternatives all have limited mutability in any case. Approximate filters that support fast merging or additions (e.g., Bloom filters) require the original filters to have extra capacity, and in the case of cuckoo filters an update may even fail. Rebuilding the filter can maintain an optimal size. In multithreaded systems, immutability also avoids the overhead of the synchronization mechanisms needed to maintain concurrency.



from Hacker News https://ift.tt/2WZzgoC

Planning and Managing Layoffs


Nobody ever wants to do layoffs. Laying your employees off is one of the hardest things you do as a leader. But remember, however badly you feel about it, your issues are small compared to those of the impacted employees. For them, the layoff may cause serious financial and psychological distress. It will also force them into a wrenching emotional disconnect from their friends and colleagues. You think you have it tough, but they have it far tougher. Your duty as a leader is to do everything in your power to give them as many resources as you can and offer them the most dignified exit possible. This will take careful thought and planning, and it may be the most important planning you ever do as CEO.

Since most people have no idea how to do layoffs well, nor do they have a resource to help them, this document now exists. I’ve unfortunately had to do layoffs, and I’ve helped others through them, so I want to share some best practices and things to think through as you do a layoff. I’m starting from the point that you’ve already done everything possible to avoid a layoff and you’ve gone through the difficult process of figuring out what roles to eliminate. (I say what roles, not who, because that’s how you should be thinking about this: it’s not a person-by-person decision, it’s a role-by-role decision.)

Also note: I am not a lawyer. I am not your HR person. And as I point out below, you absolutely must consult experienced counsel and HR professionals in this process.

Properly executing a layoff is important in a few dimensions: legally, culturally, and ethically. How you handle a layoff, how you communicate it to the people impacted, and how you manage and lead throughout the process really matters. It matters to the people who are now leaving the business and to the team that is part of the go-forward plan, and neither group will forget how people were treated.

Points to consider as you plan your layoffs:

Consider a furlough (leave of absence): Depending on your company size, cash management requirements, and benefit plan policies, you may be able to put employees on furlough for three, six, or 12 months, where their salary (and potentially vesting) stops but benefits continue. This gives the company time to stabilize finances and re-open the furloughed role, and/or allows the impacted employee ample time to find a new job while dramatically reducing costs to the employer. If you can do this, it’s a huge benefit to the impacted employee. The employee is afforded the optics of still being employed while they conduct a search for a new role (which helps in their search), and they get continued benefits coverage, which is often critical. It also emphasizes the point that this is really a cost-savings decision to save the company and not a performance-related decision.

Think about health benefits 

Time it right: If coverage in your employee health benefits plan ends on the last day of the month (most, but not all, do), and you are executing your layoff in the last ⅓ of the month, consider designating the last day of employment for all impacted employees to be the first day of the next month. That gives every impacted employee 30 days to figure out COBRA or get added to a partner’s health plan. Leaving employees in the lurch at the end of the month or without any continuing benefits coverage is totally avoidable, so avoid it.

Treat your employees like adults: I’ve heard from many CEOs who want to offer their employees two weeks of severance but three months of coverage for health benefits. This well-intentioned (albeit paternalistic) thinking is wrong. In addition to creating logistical problems, you have no idea what your employees’ actual life situations are. Maybe they can get on a spouse’s benefits plan, maybe they are young enough to rejoin their parents’ plan, maybe they will return to a home country with alternative healthcare options. Just pay them the cash and let them decide how to manage their own affairs.

Think about severance: Be consistent in your severance policy. Often, severance amounts will be based on tenure (much preferable to basing it on seniority). A common model employers will consider is paying two to four weeks for every year of service, with a maximum of four or eight weeks in total. Many companies won’t be able to afford to part with that much cash (especially all at once), and two or four weeks may be all they can do. Offering less than two weeks severance pay is outside the norm, as is offering more than eight weeks.

Timing

Do it once: If you’ve never conducted a layoff, your first instinct is to lay off as few people as possible. But anyone who has ever done layoffs as a company-saving measure will tell you they wish they had cut deeper. If you’re going to take on the massive emotional and cultural impact of letting people go, be sure to create sufficient savings so you and the remaining team have the cash required to get through to the other side of the crisis and survive. Doing multiple rounds of layoffs demoralizes your team and erodes any trust and confidence they have in you.

Are you notifying them or are they leaving that day: For less experienced leaders managing through this, I recommend the date of notification be the last day the employee has access to most workplace resources, including offices, laptop, email, and badge. That doesn’t mean they need to leave the payroll system that day (for instance, to continue benefits into the next month), but they are effectively on leave from that moment forward; they no longer have to do work and can focus on finding a new job. While it can sound jarring to say “and today is your last day,” it’s generally better for people because it makes it clear that there is a new normal and it starts right now. It is reasonable to allow employees in certain roles or levels of seniority to continue to have email access for professional or business reasons. Obviously, different roles come with different needs. For example, an hourly worker in a call center may not need access to anything from the point of notification forward, whereas a business development executive who needs to carefully transition some key relationships to a new owner may justify continued access.

Still, there’s no perfect answer on this one — some employers use a longer notification period to let impacted employees begin their search for a new job while they are still technically employed, essentially trading severance time for notification period time. Figure out what is right for your company based on the kinds of roles being impacted, your culture, the maturity of your workforce, your level of comfort with having impacted employees continue to access systems, and legal requirements.

Dealing with company property while remote: Because of COVID-19, nearly all workers are now working from home, so you will be unable to collect their company property (laptop, badge, etc.). Be clear that this is still company property and will need to be returned at a future date — either a laptop box with a return sticker will be mailed to them or equipment can be dropped off once health-related shelter concerns are lifted. Make it clear if they are or are not allowed to use their laptop after the notification period. For many employees, this may be their only computer and would be useful in conducting their job search. Remember that with every decision you make in this process you need to err on the side of doing whatever you can to help the impacted employee. That likely means letting them still use their laptop.

Depending on your IT security policies, you may even decide to allow employees to keep their laptop as a part of their severance, but do that only after you recognize, understand, and accept the financial, security, and intellectual property risks that approach creates for your company.

Leverage your board, corporate counsel, and comms team

Ensure all layoff decisions and processes are approved by your board of directors with counsel present. Strong corporate governance isn’t just a checkbox item: the feedback and legal protections you gain by getting your plan formally approved are critical for an executive. If done truly terribly, an executive could be opening themselves up to various liabilities or even criminal action.

Involving corporate counsel or external employment counsel is an absolute must. You will need to be current and up to speed on many aspects of the law around whom you lay off and how, including:

  • Individuals on visas
  • Adverse impact reporting
  • California and federal WARN Act requirements
  • Releases in exchange for severance pay 

Throughout this planning process, confidentiality is an absolute must. However, toward the end of your planning process, be sure to include your Communications lead. They can help you draft and prepare all your internal messaging for all employees, as well as create a plan for any possible press inquiries. 

Points to consider as you do layoffs: 

Company communication is critical

You as the CEO need to own the messaging around this. Employees joined your company because of your vision and you, and now you’ve failed them. You need to be the one to own the communication around layoffs and take responsibility for what is happening. And ultimately, it will be your decisions that determine whether employees are treated with dignity and compassion. This is where trust and confidence are built or broken.

As soon as you begin notifications, word of layoffs will travel quickly, so try to complete every conversation as quickly as you can. To the extent possible, avoid one-to-many notifications and notify people as privately as possible. No matter how many notifications you need to do, it is possible to do them all individually. If your organization is large, you will have to leverage your leadership team to take on some of the notification responsibilities. That means the burden is on you to train and educate your leaders so they do it in a way that exudes deep empathy and consideration for the impacted employees.

As soon as notifications are complete, you will have to communicate to everyone in your company so they hear the news from you directly and know that notifications are indeed complete (so they don’t have to worry about their own status). Assume whatever email you send will be read by those leaving the business and potentially the general public. This should be a previously prepared email explaining the change, the reason for the change, and deep appreciation for the outgoing employees’ contributions. Remember, at this point, you are now communicating with the go-forward team. In coordination with IT or HR, you will need to have a way to quickly message everyone not impacted by the workforce reduction. In the email you should also announce an all-hands for later that day (ideally) or early the following day so they know they will have an interactive opportunity to hear from you. Waiting too long to assemble the team is a grave mistake. They need to hear from you, and quickly.

Depending on the size of your organization, you may want to communicate out ahead of the layoff action to all employees. For more on this, listen to this podcast with my colleagues Shannon Schiltz, Operating Partner for our People Practices group, and Alex Rampell, General Partner and repeat founder. For smaller organizations, this is not necessary.

How to actually tell someone they are being laid off

Note: This advice is geared specifically at doing layoffs remotely, but it is applicable to in-person layoffs as well. 

Schedule with common sense: The communication with the impacted employee should include at least two company representatives, ideally one of whom is a trained HR professional, to avoid he-said/she-said scenarios and to maximize legal protections. For more experienced managers, this isn’t strictly necessary.

Be cognizant that sending a calendar invite to an employee for a 20-minute discussion with a manager and HR representative is not very subtle, so don’t do that. You can always create a second invitation for the HR person so they aren’t on the original 1:1 invite. If you are doing these in an office, make sure events are configured to be “private” so the room calendar for the event doesn’t show all the meetings being scheduled in it.

Don’t reuse your Zoom or WebEx “personal meeting room” in back-to-back meetings. Create a new meeting for each conversation.

Be prepared: It’s very smart to have a script prepared in front of you. You can piece it together from what I outline below. It’s not uncommon to get flustered or nervous delivering this news. Rehearse with a colleague if you need to. And trust me, you need to rehearse this. And do not stray far from your script. Do not let a thoughtful and compassionate plan be ruined with bad delivery. 

Logistically, have your camera turned on, make sure recording is turned off, and make sure you are in a quiet place without distractions. Do not be driving, etc. Have your electronic document package with all layoff and severance related information ready to go so you can send it while on the call or right after the call.

Less is more: Since these will be video calls, try to avoid small talk with whoever is joining the call. Just get to the point: “I have some news to share with you. The leadership team and I have had to make some difficult decisions in order to try and save our business, and as a part of that, we are eliminating your role at the company and you are being laid off.” I’ve always found you literally need to deliver the news twice (in this case, saying “eliminating your role” and “you are being laid off”) because many recipients quickly enter a state of shock or dismay.

And then you just wait. The person needs a moment to process. It might be an awkward silence. That’s okay. They may be upset. They may be embarrassed. They may be angry. It may be all of those things at once. But let them process. Then, they will ask you a question. “What does this mean for me?” “When do I stop being paid?” “How many other people are being affected?” “What if I work harder?” “Can I take a pay cut? I need this job.” 

Here’s how to handle different kinds of questions and reactions:

  • If the question is about logistics (When do I stop being paid? What does this mean?), then you can say, “I know this is a lot. I’m going to send you a document that outlines specifically what we will be doing as part of your exit package and provides additional information and resources for you as soon as we get off the line.” Depending on how many of these you have to do, you can talk them through it, but I would suggest not offering to walk through it, as they will have a TON of questions. Make sure your paperwork is in order so you can quickly deliver all the information and access to every employee resource you can make available.
  • If the question is about another aspect (How many people are impacted? What if I work harder? Can I take a pay cut?) or is just venting, the right thing to do is listen and acknowledge them. It is not appropriate to share who else was impacted on the team. Stick to the script, repeat the talking points, and listen. This is about them; keep the conversation focused on them. It’s important to remember: you aren’t there to have your mind changed, you aren’t creating wiggle room or giving false hope. You are there to deliver information as clearly, and compassionately, as possible. It may be tempting to say “I didn’t realize how bad this was going to be, let me see what I can do,” but that’s false hope. It’s terrible and weak. It is much better to deliver the news as it is, and privately work behind the scenes if you truly feel an error has been made and needs adjusting.
  • If the employee becomes overly emotional and is having difficulty composing themselves, it is okay to end the conversation and tell the employee that you recognize this is an incredibly distressing and major change for them. Be clear that you will be happy to continue the conversation once they have had time to process.

What can you do? What can you say?

  • Be a leader. This is where true leadership shows up. Demonstrate empathy, be a listener, but stand behind the company’s decisions. If you are doing the notification but you didn’t make the decision, you cannot say you are “just the messenger.” It’s extraordinarily pathetic to question the decision to lay someone off as you are laying them off. If you truly felt that way, you should have opted out of being the person doing the notification.
  • You can certainly say “I know this news is hard to hear. It’s normal to have questions that come to you later that aren’t top of mind right now, and I want to make sure you reach out to me or our HR team when you have them. We want to be available to you as you have questions and do whatever we can to help you.”
  • You can say “I know this is a lot to take in, but you will get through this.”
  • If appropriate, make clear that you or someone at the company would be willing to be a reference for them when they are considering job opportunities. 
  • Make it clear what the transition is for them, literally. “As of today, you are no longer required to come to work or do work, and your access to email and company resources will be restricted. If we miss anything, please recognize you have a responsibility to not use company resources. We will be keeping you on payroll until the 1st of the month so that your health benefits continue for the following month” (or whatever your policy is).
    • A note on IT logistics: Often employees will ask “What about all the personal items I have on my system? How can I get this information back?” While they shouldn’t have personal data on their corporate machine, the reality is almost everyone does. I always err on the side of assuming good intentions and allowing people time to get their files in order and back them up somewhere. Since these notifications are all being done over video and employees have their laptops with them, logistically you likely have less control over this anyway.
  • Understand what Employee Assistance Programs (EAP) are available in your benefits plan and make sure you have the documents, links, and resources ready to send to an employee so they can take advantage of all resources that they may not be aware of. This is critical, even if it’s the last bullet point. Your benefits plan, and the government, have many programs available. Know what they are and be organized in how you share them. You must do every possible thing to make this less terrible for the impacted employees.

How do you get an F? 

  • Do not make this about you. Do not, at any time, tell them how hard it is for you to give them this news, or how agonizing it was to make this plan. You still have a job; they do not. This is not hard for you, it’s just unpleasant. Not having a job is hard. I remember once hearing a manager say “this has been such a hard day for me” to people in an all-hands after he conducted a layoff; I will never forget the person who said it, and I will never work with that person again. Do not be this person. Even thinking about this person now just makes me angry.
  • Do not make jokes or be lighthearted. This is not a lighthearted thing. Even if you’re nervous and trying to defuse the tension. Even if it’s an awkward silence. Just be quiet and listen. Leave room for the person to process the news.
  • Do not tell them they will be fine. Do not tell them they will find a better job. You don’t know if that is true. Do not be prescriptive about their future. Do not tell them you will help them find a job (you may offer to try and help, but you cannot guarantee that you will find one for them, and I find the no-guarantee offer to help comes out sounding very weak).
  • Do not blame others. Do not blame the virus. Do not blame the market. Do not blame your board or investors. Just own it.
  • Do not ask them “Are you okay?” because that phrasing is about making you feel better for delivering terrible news and has nothing to do with you supporting the employee. You can definitely ask if there’s something they need.
    • If you are genuinely concerned for their safety and wellbeing, or believe they are a danger to themselves or others, you need to tell them that’s what you are observing, and then you need to be prepared to call someone close to them (and tell them you are doing that) and/or emergency services.
  • You get an F if you fail to have legal or HR review the list of impacted employees prior to notification to make sure you aren’t disproportionately impacting specific protected groups (race, gender, etc.). This is especially important in larger organizations, where the creation of the list of who is being impacted may have been delegated further down the organization.
  • You get an F if you don’t lay out your day and have a prepared communications plan (built and reviewed with HR and Comms) such that throughout the day you are largely going through a pre-planned, scripted, and practiced set of motions. Without that, you will screw it up, and then you’ll get an F.

There’s no way around this: if you have to do a layoff, it’s going to be one of the worst days of your career. But not as bad as it will be for your impacted employees. You hired someone who worked hard and, through no fault of their own, is now losing their job, at a time when losing a job is terrible. However, if you conduct your layoffs with both compassion and clarity, you will ease the burden on your employees and help them begin the process of transitioning to their next role, while building trust and confidence with the team that remains.

One last thing, based on some comments from draft readers (HR pros, CEOs, etc.): What about pay cuts instead of, or as a part of, layoffs? This sounds good in theory and often helps the team in the short term feel better about sharing the pain collectively, but it has side effects. First, your financial analysis will tell you it does very little to reduce your cash burn unless your company has hundreds of people (in which case it is worth considering, as it may save some roles from being impacted). If you only do 10% or 20% pay cuts, it doesn’t materially impact the overall weighted cost of a full-time employee. Second, it risks hurting morale significantly. Within two or three pay periods, when people start to see their reduced paycheck, they will often start to look elsewhere for work (assuming they can; otherwise they’ll just be annoyed). That said, for founders, executives, the broader leadership team, and people making over a certain amount of money, pay cuts can be an important symbolic gesture. In those cases I’m supportive of pay cuts, but the gesture is largely symbolic; it’s not really saving meaningful amounts of money. For individual contributors, or people paid less than a certain amount, pay cuts aren’t worth doing, and I would absolutely exclude those individuals if you plan on enacting any pay reductions.

 

Thank you to Shannon Schiltz for significant feedback and guidance in writing this, and an additional thank you to HR pros Megan Bazan, Anuradha Mayer, and Zoe Smith for reviewing and commenting.



from Hacker News https://ift.tt/2wQ8Tqg