-rw-r--r-- 3964 libsecded-20220828/README raw
Data stored in DRAM sometimes flips bits. A study of Google servers found that each gigabyte of DRAM flips a bit every few hours on average:

    https://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf

Servers typically protect against DRAM errors using "SECDED ECC DRAM", which is slightly more expensive than conventional DRAM. "SECDED" means "single error correction, double error detection". Each 64-bit chunk of data is stored in 72 physical bits; the extra 8 bits are used for checksums that allow any single bit flip to be automatically corrected and any double bit flip to be detected. Operating systems report "correctable errors" ("CE") and "uncorrectable errors" ("UE") to warn system administrators to replace failing DRAM.

However, typical desktops, laptops, and smartphones use conventional DRAM, with no protection against bit flips. Usually a bit flip doesn't matter (would you notice a pixel occasionally changing in a movie?), but sometimes it does matter.

libsecded is a software library aimed at programmers who would like to reduce the risk of data corruption. The library stores an array as a slightly longer array, using extra bits to automatically correct single errors and detect double errors.

The API is straightforward (with lengths represented as long long, not size_t):

    secded_encode(x,&xlen): encode bytes x[0...xlen-1] as bytes x[0...newxlen-1], changing xlen to newxlen, which is at most xlen+64

    secded_decode(x,&xlen): change x[...] and xlen back to what they were, while correcting any single error

Notes follow on optional aspects of the API.

There's a secded_clean(x,xlen) that you can use to correct errors in encoded data while leaving the data in encoded form. This is essentially the same as secded_decode(x,&xlen) followed by secded_encode(x,&xlen), but simpler and faster. (Periodically sweeping through encoded data to correct errors is typically called "scrubbing" for SECDED ECC DRAM.)
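
To make "single error correction, double error detection" concrete, here is a toy sketch of a SECDED code that stores 4 data bits in 8 bits (an extended Hamming code). This is an illustration only: the function names and the 0/1/2 status values are hypothetical, and nothing here is claimed to match libsecded's actual encoding or API.

```c
#include <stdint.h>

/* Toy SECDED code, NOT libsecded's format.
   Bit positions 1, 2, 4 hold Hamming parity bits, position 0 holds
   an overall parity bit, and positions 3, 5, 6, 7 hold the data. */

/* Parity (0 or 1) of the number of set bits in v. */
static int toy_parity8(uint8_t v)
{
  v ^= v >> 4;
  v ^= v >> 2;
  v ^= v >> 1;
  return v & 1;
}

/* XOR of the positions (1..7) of all set bits in c. */
static int toy_syndrome(uint8_t c)
{
  int pos, s = 0;
  for (pos = 1; pos < 8; ++pos)
    if (c & (1 << pos)) s ^= pos;
  return s;
}

/* Encode the low 4 bits of data as an 8-bit codeword. */
uint8_t toy_encode(uint8_t data)
{
  uint8_t c = 0;
  int s;
  if (data & 1) c |= 1 << 3;
  if (data & 2) c |= 1 << 5;
  if (data & 4) c |= 1 << 6;
  if (data & 8) c |= 1 << 7;
  /* Choose parity bits so that the syndrome of the codeword is 0. */
  s = toy_syndrome(c);
  if (s & 1) c |= 1 << 1;
  if (s & 2) c |= 1 << 2;
  if (s & 4) c |= 1 << 4;
  /* Overall parity bit: make the total number of set bits even. */
  if (toy_parity8(c)) c |= 1;
  return c;
}

/* Decode c into *data. Returns 0 for no error, 1 for a corrected
   single bit flip, 2 for a detected (uncorrected) double bit flip. */
int toy_decode(uint8_t c, uint8_t *data)
{
  int s = toy_syndrome(c);
  int status = 0;
  if (toy_parity8(c)) {
    /* Odd parity: treat as one flipped bit; s is its position
       (s == 0 means the overall parity bit itself flipped). */
    c ^= (uint8_t)(1 << s);
    status = 1;
  } else if (s != 0) {
    /* Even parity but nonzero syndrome: two bits flipped. */
    status = 2;
  }
  *data = (uint8_t)(((c >> 3) & 1) | (((c >> 5) & 1) << 1)
                  | (((c >> 6) & 1) << 2) | (((c >> 7) & 1) << 3));
  return status;
}
```

Flipping any one bit of toy_encode(d) makes toy_decode() return 1 and recover d; flipping any two bits makes it return 2. Three or more flips can be miscorrected, which is the limitation of SECDED in general.
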
Both secded_decode() and secded_clean() return an unsigned int, namely 0 for no errors, 1 through 255 for correctable errors, 256 through 65535 for uncorrectable errors. This lets you measure how often problems occur and decide what action to take.

Some values of xlen are never produced by secded_encode(). For example, xlen=120 turns into newxlen=128, xlen=121 turns into newxlen=130, and nothing turns into newxlen=129. In case you're worried about what happens with invalid lengths such as 129:

* Invalid lengths are accepted by decode(), essentially as if they were truncated down to the next valid length, so they won't crash your program.

* If you're worried about data from an untrustworthy source, you shouldn't think that rejecting invalid lengths is a solution. The source will still be able to produce arbitrary results from decode().

* If you're trying to recognize repeated events, you shouldn't be merely checking for repeated _encodings_ of those events. Even outside the error-correction context, it's usually wrong to assume that only one string can decode to a particular piece of data.

* If for some reason you want to recognize invalid lengths, one way is to see that decode() followed by encode() _doesn't_ produce the original length. You can pass a 0 pointer to decode() and encode() to just do length computations without touching the array. You can also more directly recognize the invalid lengths (1, 2, 3, 5, 9, 17, 33, etc.) as nonzero xlen such that ((xlen-1)&(xlen-2)) == 0. (The extra parentheses matter in C, where == binds more tightly than &.)

The software is designed to work for array lengths as large as an exabyte (2^60 bytes). However, obviously bugs for large array sizes could have slipped past tests for smaller array sizes, and being able to correct just one error is insufficient for large arrays. Large arrays should be split into smaller arrays with separate error correction. For even higher reliability, SECDED should be replaced with multiple-error correction.
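
The return-value ranges and the direct invalid-length test described above can be sketched as two small helpers. The names example_status, example_classify, and example_length_is_invalid are hypothetical and not part of libsecded; only the numeric ranges and the bit test come from the notes above.

```c
/* Hypothetical helpers, not part of libsecded. */

enum example_status {
  EXAMPLE_OK,            /* result 0: no errors */
  EXAMPLE_CORRECTED,     /* result 1 through 255: correctable errors */
  EXAMPLE_UNCORRECTABLE  /* result 256 and up: uncorrectable errors */
};

/* Classify the unsigned int returned by secded_decode() or
   secded_clean(), using the ranges stated in the notes above. */
enum example_status example_classify(unsigned int result)
{
  if (result == 0) return EXAMPLE_OK;
  if (result <= 255) return EXAMPLE_CORRECTED;
  return EXAMPLE_UNCORRECTABLE;
}

/* Returns 1 if xlen is one of the encoded lengths that
   secded_encode() never produces (1, 2, 3, 5, 9, 17, 33, ...),
   otherwise 0. Lengths are assumed to be nonnegative, so anything
   that is not positive is reported as not invalid. */
int example_length_is_invalid(long long xlen)
{
  return xlen > 0 && ((xlen - 1) & (xlen - 2)) == 0;
}
```

For instance, example_length_is_invalid(129) is 1 while example_length_is_invalid(128) and example_length_is_invalid(130) are 0, matching the xlen=120 and xlen=121 cases above.
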