Researchers Claim Locally-Testable-Code Breakthrough With Exotic Multi-Dimensional Graph (quantamagazine.org)
"A team of researchers has finally created a long-sought locally testable code that can immediately betray whether it's been corrupted..." reports Quanta magazine.
"Many thought local testability would never be achieved in its ideal form." Now, in a preprint released on November 8, the computer scientist Irit Dinur of the Weizmann Institute of Science and four mathematicians, Shai Evra, Ron Livne, Alex Lubotzky and Shahar Mozes, all at the Hebrew University of Jerusalem, have found it. "It's one of the most remarkable phenomena that I know of in mathematics or computer science," said Tom Gur of the University of Warwick. "It's been the holy grail of an entire field."
Their new technique transforms a message into a super-canary, an object that testifies to its health better than any other message yet known. Any corruption of significance that is buried anywhere in its superstructure becomes apparent from simple tests at a few spots. "This is not something that seems plausible," said Madhu Sudan of Harvard University. "This result suddenly says you can do it."
Most prior methods for encoding data relied on randomness in some form. But for local testability, randomness could not help. Instead, the researchers had to devise a highly nonrandom graph structure entirely new to mathematics, which they based their new method on. It is both a theoretical curiosity and a practical advance in making information as resilient as possible....
To get a sense of what their graph looks like, imagine observing it from the inside, standing on a single edge. They construct their graph such that every edge has a fixed number of squares attached. Therefore, from your vantage point you'd feel as if you were looking out from the spine of a booklet. However, from the other three sides of the booklet's pages, you'd see the spines of new booklets branching from them as well. Booklets would keep branching out from each edge ad infinitum. "It's impossible to visualize. That's the whole point," said Lubotzky. "That's why it is so sophisticated...."
[A] test at one node can reveal information about errors from far away nodes. By making use of higher dimensions, the graph is ultimately connected in ways that go beyond what we typically even think of as connections... It establishes a new state of the art for error-correcting codes, and it also marks the first substantial payoff from bringing the mathematics of high-dimensional expanders to bear on codes...
Practical and theoretical applications should soon follow. Different forms of locally testable codes are now being used in decentralized finance, and an optimal version will allow even better decentralized tools.
"Many thought local testability would never be achieved in its ideal form." Now, in a preprint released on November 8, the computer scientist Irit Dinur of the Weizmann Institute of Science and four mathematicians, Shai Evra, Ron Livne, Alex Lubotzky and Shahar Mozes, all at the Hebrew University of Jerusalem, have found it. "It's one of the most remarkable phenomena that I know of in mathematics or computer science," said Tom Gur of the University of Warwick. "It's been the holy grail of an entire field."
Their new technique transforms a message into a super-canary, an object that testifies to its health better than any other message yet known. Any corruption of significance that is buried anywhere in its superstructure becomes apparent from simple tests at a few spots. "This is not something that seems plausible," said Madhu Sudan of Harvard University. "This result suddenly says you can do it."
Most prior methods for encoding data relied on randomness in some form. But for local testability, randomness could not help. Instead, the researchers had to devise a highly nonrandom graph structure entirely new to mathematics, which they based their new method on. It is both a theoretical curiosity and a practical advance in making information as resilient as possible....
To get a sense of what their graph looks like, imagine observing it from the inside, standing on a single edge. They construct their graph such that every edge has a fixed number of squares attached. Therefore, from your vantage point you'd feel as if you were looking out from the spine of a booklet. However, from the other three sides of the booklet's pages, you'd see the spines of new booklets branching from them as well. Booklets would keep branching out from each edge ad infinitum. "It's impossible to visualize. That's the whole point," said Lubotzky. "That's why it is so sophisticated...."
[A] test at one node can reveal information about errors from far away nodes. By making use of higher dimensions, the graph is ultimately connected in ways that go beyond what we typically even think of as connections... It establishes a new state of the art for error-correcting codes, and it also marks the first substantial payoff from bringing the mathematics of high-dimensional expanders to bear on codes...
Practical and theoretical applications should soon follow. Different forms of locally testable codes are now being used in decentralized finance, and an optimal version will allow even better decentralized tools.
WTH? (Score:5, Interesting)
What the hell is it about? The summary talks about local testability, ideal code, canary etc. as if all of us here shared a single understanding of it. Maybe start the summary with framing WTH it's even about? Otherwise it reads like spiritualism
Dazzle us with bullshit (Score:2, Funny)
To get a sense of what their graph looks like, imagine observing it from the inside, standing on a single edge.
(...)
"It's impossible to visualize. That's the whole point," said Lubotzky. "That's why it is so sophisticated...."
Well, there's two minutes of my life I'll never get back. Thanks a lot Slashdot, for letting this garbage through.
Re: (Score:3)
"It's been the holy grail of an entire field."
Which field? Whatever field it is, it's so obscure that I've never heard of it, and I took coding theory when I was still at Uni. Comments from the paper like "High rate LTCs could be useful in practice" don't exactly inspire confidence; it means "we have a solution and think we may possibly have managed to think up a problem that it solves".
I suspect the field that it's the holy grail of is "artificial problems for which this conference paper is the solution".
Data integrity checking. (Score:5, Informative)
Their starting point in the article is equivalent to a parity bit, which can be fooled by just changing certain bits within the data to get the same parity so that it still looks authentic. Their breakthrough is an algorithm that cannot be fooled by things like this, even for very large data sets.
-Just a layman here so I might be completely wrong.
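To make that concrete, here is a toy Python sketch of the single-parity-bit scheme described above (my own illustration, not anything from the paper): flipping any two data bits leaves the parity unchanged, so the corruption goes unnoticed.

```python
def parity_bit(bits):
    """Single parity bit: XOR of all data bits."""
    p = 0
    for b in bits:
        p ^= b
    return p

data = [1, 0, 1, 1, 0, 0, 1, 0]
check = parity_bit(data)

# Flip any two bits: the parity is unchanged, so the corruption is missed.
corrupted = data[:]
corrupted[1] ^= 1
corrupted[5] ^= 1
assert parity_bit(corrupted) == check
```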
Re: (Score:3)
Re:Data integrity checking. (Score:5, Informative)
In short, someone figured out how to generate an error-correcting code that can be short relative to the input message, but which lets a receiver detect errors with high probability from a small number of bits, randomly chosen from the coded message. One immediate application is that a sender can spread the message across multiple blocks, and the receiver can detect errors on each block independently to request a retransmission of a block with errors, before receiving the whole message. (In contrast, other schemes do this by adding overhead before encoding the message: For example, a sequence number, some indication of message boundaries, a per-message CRC, and a per-block CRC.)
Probably the most important detail -- which is only implicit in the paper, and not explicitly stated in the summary or the linked article -- is that this code is non-systematic [wikipedia.org]: The plaintext message is not embedded directly in the codewords that can be sent. Instead, the codewords are generated through some transformation of the message that creates outputs that are longer than the inputs.
Making a code non-systematic is not nearly enough to make it locally testable, though, and a key part of what this method does is to spread the error detection information very evenly across the output. That makes it feasible to check errors even given randomly selected bits from a coded message.
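For readers wanting a concrete picture of "locally testable", the classic textbook example is the Hadamard code with the Blum-Luby-Rubinfeld linearity test: the tester reads only three positions of the received word per trial, yet a significantly corrupted word fails some trial with good probability. This is emphatically not the new construction (Hadamard codewords are exponentially long, whereas the new code has constant rate), just a small Python sketch of the concept:

```python
import random

def hadamard_encode(msg_bits):
    """Encode k message bits as all 2**k inner products <msg, y> mod 2."""
    k = len(msg_bits)
    codeword = []
    for y in range(2 ** k):
        bit = 0
        for i in range(k):
            if (y >> i) & 1:
                bit ^= msg_bits[i]
        codeword.append(bit)
    return codeword

def blr_local_test(word, trials=20):
    """BLR linearity test: each trial reads just 3 positions of the word."""
    n = len(word)  # n is a power of two, so y1 ^ y2 is also a valid index
    for _ in range(trials):
        y1, y2 = random.randrange(n), random.randrange(n)
        if word[y1] ^ word[y2] != word[y1 ^ y2]:
            return False  # caught an inconsistency -> word is corrupted
    return True  # looks like (close to) a valid codeword

codeword = hadamard_encode([1, 0, 1, 1])
assert blr_local_test(codeword)
```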
Re: (Score:2)
And yet, most people will still use crc/md5/sha for checksums, because they are good enough.
Re: (Score:3)
CRC is the only one of those that is remotely similar in use case. This is for a link-layer protocol with significant chance of bit errors and messages considerably larger than the delay-bandwidth product. Not checking for file corruption, or even applications like BitTorrent, but things like mobile wireless links. The base station will have a large bunch of data it wants to send, and if it can tailor the code rate to the link conditions while getting quick feedback on corrupted blocks, the effective throughput goes up.
Which is why this seems like rubbish (Score:2)
The size of the error correction is optimal for any given number of bit errors. Want one-bit ECC? Then your code is a certain size larger than raw. Want two bits? Then larger. Etc.
So can we make a code that can detect any number of errors?
Well, imagine I had two messages that were encoded this way, each K raw bits and K+C total bits long. Now I induce errors that flip all the bits in one set of K+C bits to make the bit string identical to the other encoded message. Well, that is by definition a perfectly valid codeword, so no test could flag it.
This code is not rubbish at all (Score:5, Informative)
Nobody claimed this code can detect all errors. Still, most codes can detect an arbitrary number of errors by making the rate small enough. For example, a 32-bit CRC will detect up to 8 bit errors in the codewords for an 8-bit input. As the input gets bigger, a 32-bit CRC becomes less able to do that. Notably, the relevant corruption model for ECCs is random noise (hard errors, erasures or soft errors) rather than intentional corruption -- cryptography is the domain of security against a threat that "knows" your message structure.
Still, ECCs vary widely in strength. For example, Hamming codes are very simple, but only provide a distance of 3 bits (valid codewords differ in at least 3 bits). The version usually used in ECC RAM adds a parity bit to make the distance 4 bits, enabling single error correction with double error detection (SECDED). Hamming codes use the fewest possible check bits to do that, so they are good in a code-rate sense as well as in simplicity, but they compromise on code distance and you need essentially the entire message to check for decodability; they are not locally testable.
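To make the contrast with local testability concrete, here is a minimal Python sketch (my illustration, using only the standard library's zlib.crc32): the flipped bit is caught, but only because verifying the CRC reads every byte of the message.

```python
import zlib

msg = b"some payload worth protecting"
crc = zlib.crc32(msg)  # computing/verifying this touches the whole message

corrupted = bytearray(msg)
corrupted[3] ^= 0x01                         # one bit flipped in transit
assert zlib.crc32(bytes(corrupted)) != crc   # detected, but not "locally"
```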
The paper here claims it is the first code that can provide constant values for all of code rate, distance and checking locality. That does not mean it is a great code for a given application, but it is at a minimum a useful new discovery and way of thinking about how to build ECCs, and its relationship to other good codes (expander codes) suggests that it is good for some applications.
Re: (Score:2)
It's a hand-crafted error correction code that is short relative to the input message, but very long relative to other error correction codes.
Furthermore, as with other types of error correction, if there is no corruption you still have to check the whole thing. All this does is speed up detection of the corrupted samples, at the cost of a larger code that is hand-crafted, and increased processing requirements for non-corrupt messages.
Re: (Score:2)
... and by "hand-crafted" you mean (from the preprint):
an infinite family of left-right Cayley complexes and base codes that yield locally testable codes.
Re: (Score:2)
So you're gonna lie and cherry pick a phrase that doesn't refute what I said, and ignore all the parts that talk about it having to be hand-crafted?
You thought the words "infinite family" refuted what I said? If that's your level of understanding, why are you arguing with anybody in any way?
Re: (Score:2)
You were entirely wrong. Don't be an asshole about it. You still don't point to anything that says the code is hand-crafted. Your claim is even worse than saying that, for example, CRC-32C and RFC 6330 are hand-crafted, because most of this preprint describes how to construct codes with user-chosen parameters.
Re: (Score:2)
That does make some sense. It also makes it clear that this is about error detection, not about cryptographic integrity verification. It does not seem to be about error-correction though.
Re: (Score:2)
Section 4.2 of the preprint gives directions on how to perform local error correction.
Re: (Score:2)
You are correct. Obviously, I should have looked at the preprint and not the garbled summary here.
Re: (Score:2)
Re: (Score:2)
Otherwise it reads like spiritualism
You might be somewhat over-stating the benefits of numerism.
Re: (Score:2)
I'm trying to guess what this is about too. My current best guess is "corrupt" doesn't mean "damaged" but instead means "intentionally altered in a way that still passes". Then I could believe a small sample would have to include some bits that were altered. But if that's right, I don't see how a reader could detect it if the checksums all pass, unless they have a correct copy to compare to. So I'm probably wrong.
If they did mean damaged, well, if the message has just one bit flipped and you read the entire message, you'd find it, but a small random sample would almost certainly miss it.
How error correcting codes work, and connections (Score:5, Informative)
Since Shannon and Hamming's work about 60 years ago, there's been a lot of interest in error correcting codes. These are ways of embedding data which allows one to detect or correct errors in transmission. Natural languages do this mostly automatically. For example, if I inclde a typo in this sentence, you can probably notice it and figure out what I wanted. But they aren't perfect at this. If I write "I have a pet bat" it might mean "I have a pet cat" with a typo, but you can't be certain. Error correcting codes let you handle at least some number of errors of a certain type.
But how do you do this? Well, one naive thing is to just repeat each bit in your message a whole bunch of times. Let's say we repeat everything three times. So if you get "ccb" you can be pretty sure that what I meant was "c", not "b". To do this in a mathematically precise fashion, we generally talk about this in terms of strings of 0s and 1s, rather than letters, but the same idea applies. (I tried writing out the whole sentence of "I have a pet cat" repeating each word but the stupid ASCII art filter thought I was trying to do ASCII art.)
Notice that in order for this code to work we had to send things which are three times as long as our original message, and we can correct any single typo. That's not great. We can do a bit better. For example, Hamming codes https://en.wikipedia.org/wiki/Hamming_code [wikipedia.org] only increase the message size a teeny bit and still allow us to do a good job correcting a single error like this.
There's some technical subtlety here between "detecting" and "correcting" an error. For example, if you get the sentence "I have a pet dat" you can tell there's at least one error, but correcting it is tough. Should that have read "dog" (two errors), "cat" (one error) or "bat" (one error)? All you can tell is that "dat" is not a type of animal, so something has gone wrong here. You can guess what is most likely, but you can't be certain.
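A minimal Python sketch of the repeat-three-times code from a couple of paragraphs up (my own illustration): majority vote per group corrects any single flip within that group, at the cost of tripling the message length.

```python
def encode_repeat(bits, r=3):
    """Repetition code: send each bit r times."""
    return [b for b in bits for _ in range(r)]

def decode_repeat(received, r=3):
    """Majority vote per group of r corrects up to (r - 1) // 2 flips per group."""
    out = []
    for i in range(0, len(received), r):
        group = received[i:i + r]
        out.append(1 if sum(group) > r // 2 else 0)
    return out

msg = [1, 0, 1, 1]
sent = encode_repeat(msg)
sent[4] ^= 1                         # one bit damaged in transit
assert decode_repeat(sent) == msg    # the single error is corrected
```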
One issue is that we have an existence proof of some very good codes. These codes are good in the sense that they don't require messages to be much longer than the message you intend to send, and they also let you detect even large numbers of errors. The argument essentially works by, roughly speaking, picking an encoding rule at random and showing that with non-zero probability the code will do a good job. But actually constructing such codes is tough. This work, roughly speaking, manages to create one of those close-to-ideal codes that we knew had to exist but had trouble constructing. They did so by using some ideas from graph theory that had previously been used to construct codes but had not succeeded at making really good ones. But these are very good, due to some new insights and careful constructions. One really nice thing about the codes they constructed is that errors can be seen by checking only a small part of the message.
It is worth noting that the work here is using a variant of "expander graphs" and one of the authors of this is Irit Dinur. She previously used expander graphs to prove a version of the PCP theorem https://en.wikipedia.org/wiki/PCP_theorem [wikipedia.org] which essentially says that you can turn any mathematical proof into a proof which can be verified with a high probability by only checking a small section of the proof chosen at random. So there's some definite commonality of theme here.
Re: (Score:2)
I don't seem to be getting mod points recently, but please consider this a +1 Informative !
Re: (Score:2)
Re: (Score:2)
Try meta modding.
Re: (Score:2)
I don't seem to be getting mod points recently, but please consider this a +1 Informative !
Same here.
Deserves score 6 (Score:1)
hey, folks at Slashdot,
go innovative and redesign the score scale!
Why not have it infinite (of course non-linear)?
Re: How error correcting codes work, and connectio (Score:2)
Don't have mod points either, but that's definitely +1 Informative.
Re: How error correcting codes work, and connectio (Score:2)
Posts like this one are why I still read slashdot--thank you :)
Yes but mostly no (Score:1)
As another poster alluded to, no error correcting code can detect an error that transforms one valid message into another valid message.
It is only possible to make the probability of such errors low. Think of the empty barstool joke. There is the possibility of the beautiful woman materializing and it's nonzero.
Re: (Score:2)
What does that have to do with anything?
Re: Yes but mostly no (Score:2)
It has to do with people thinking that magic exists in computing.
Perfect lossless compression of arbitrarily large files down to a single byte.
Forward error correcting codes for free-space optics or microwave links that will "fix in software" not just a bird or insect flying through the beam but also a tree falling on the tower and knocking it down.
Etc.
Re: (Score:2)
No, this method requires a hand-crafted code, there is no such thing as a valid message other than the one it was crafted for.
This story is about creating an error-detection code that is 4x the size of the data, by which you can more rapidly detect large errors.
Of course for single-bit errors, it is better to just compare each bit.
Hate to be "that guy" (Score:1)
...BUT...if the math and mechanics behind it are so dense that your average "layman" can't understand it, then it's functionally useless.
Layman in this case would be the person whose job it is to ensure said messages are uncorrupted. If that person can't, with confidence, say that the message is uncorrupted, then how is this in any way useful to them, and their company at large?
Re: (Score:3)
Re:Hate to be "that guy" (Score:5, Insightful)
...BUT...if the math and mechanics behind it are so dense that your average "layman" can't understand it, then it's functionally useless.
Layman in this case would be the person whose job it is to ensure said messages are uncorrupted. If that person can't, with confidence, say that the message is uncorrupted, then how is this in any way useful to them, and their company at large?
How many "layman" (or even developers for that matter) actually understand the the math and technical mechanics of existing error correction systems that are in use today? Assuming the research checks out, it will likely end up as a set of libraries that software architects and developers use to ensure more solid communications. All the layman really needs to know is that error correction is being used and all (most) developers need to know is how to make the EncodeForTransmission() and DecodeAndValidate() calls. Software architects will need to know how solid the algorithms are (how many and what kind of errors can be detected / corrected), and how much overhead (message size, cpu) is added. Only a handful of folks need to know how it really works.
Re: (Score:2)
all (most) developers need to know is how to make the EncodeForTransmission() and DecodeAndValidate() calls.
You'd be surprised how many developers make mistakes because they incorrectly call those two functions.
Re: (Score:2)
developers need to know is how to make the EncodeForTransmission() and DecodeAndValidate() calls.
Error detection/correction works on a much lower level than this.
No one is doing an EncodeForTransmission("text of this email") ...
Re: (Score:2)
developers need to know is how to make the EncodeForTransmission() and DecodeAndValidate() calls.
Error detection/correction works on a much lower level than this.
No one is doing an EncodeForTransmission("text of this email") ...
I see you have never used PHP.
Re: (Score:2)
I couldn't decide whether to laugh or cry.
Re: (Score:2)
Re: (Score:2)
Anecdote: Back in my days at Boeing, we had a s/w engineer responsible for developing some automated test routines for aircraft acceptance testing. Initially, he wrote some subroutine stubs that did nothing except return the 'PASS' flag. The intent being to test integration into the main program. And then come back and fill in the actual functioning code. And also to get credit for completing his assigned tasks. But he fell behind and never had the opportunity to complete the subroutine definitions. Finally
Re: (Score:1)
Re: (Score:2)
Re: (Score:2)
Pretty much this.
You have to trust the entire process that provides this function. The more complex the tool, the fewer people that understand it and the more we have to just trust them.
Now if you'll excuse me, I have to install Windows 11.
Re: (Score:2)
Now if you'll excuse me, I have to install Windows 11.
Again?
Re: (Score:2)
Obviously if you write skeleton code, the result should be "failed".
Or a "throw RuntimeException('unimplemented method')".
Re: (Score:2)
Anecdote: Back in my days at Boeing ....
I was going to say that this could never happen, code reviews etc etc... And then I remembered my days working adjacent to the aircraft industry, and I thought; Yes, this could totally have happened.
They're safer now, so that's good.
Re: (Score:2)
Do you know how your credit card number is validated? Have you ever worked it out by hand? The average layman doesn't, and hasn't. Yet it's still quite useful.
The average layman also doesn't know how a fin field effect transistor works, yet they find them incredibly useful for posting cat pictures on the Internet.
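For anyone curious, the credit-card check being alluded to is the Luhn algorithm, which fits in a few lines of Python (a sketch for illustration, unrelated to the paper):

```python
def luhn_valid(card_number: str) -> bool:
    """Luhn check: double every second digit from the right, sum, test mod 10."""
    digits = [int(c) for c in card_number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

assert luhn_valid("4539 1488 0343 6467")   # a standard example number
```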
Unless you're a human (Score:2)
Do you thoroughly understand how the computer in your car detects and computes the proper amount of gas to inject?
Do you fully understand how the routers between you and Slashdot decide which queue the packets of your message should be in, and why they should NOT be in the same queue as your Youtube or Netflix packets?
It's unlikely you really understand both, yet you use these things all the time. Certainly you don't understand the CPU you are using to read this message. yet, you're using it.
For non-human
Re: (Score:2)
For non-human animals, an individual animal will hunt down some food, kill it and eat it. That same individual will dig a hole or otherwise make a shelter for themselves. The same individual goes to the creek to get water for themselves. Humans don't do this. That's one major way we're different from the other animals, and why we can do things other animals can't, like building cars or rockets.
A rabbit only has what that rabbit can get or make for itself.
Ants and other eusocial insects do this sort of thing. Some
Re: (Score:2)
Re: (Score:2)
That's true, a few species have individuals in a few different roles.
The US Department of Labor uses a list of 867 occupations in 98 groups. None of those 867 really fits my job. Meaning there are actually several thousand different jobs for humans.
Buzzword soup (Score:2)
Really, break it down to something that is at least close to fucking English.
What I hope I got correct from this is that an error in one part of a system will create a very noticeable error in a piece of information that is making its way through the entire system.
I send the number "3" to wander through the entire city, it gets beat up and robbed in the ghetto, and it returns as a "0".
Or maybe I'm staring at a panel of lights, all that should be lit. If one or more of the bulbs are dark, the
Re: Buzzword soup (Score:2)
You're very close.
As I understand it, this is an error detection (? correction?) code that lets you check any arbitrary segment of data you like; you don't need a whole block or a whole message.
This might be useful where you're using streams rather than blocked data, or for very slow data.
Uploading new firmware to a satellite or a rover would be a perfect situation where being able to error detect/correct as the stream progresses would be useful as you'd have sufficient otherwise idle time.
Re: (Score:2)
Sounds like the writer was on drugs (Score:2)
The description of other methods is all wrong. Verifying the integrity of a message relies very much not "on randomness". It relies on a checksum and a signature and trust if it is done cryptographically. Also, what has "encoding data" to do with it?
Whoever wrote that "article" did not even understand whether this is error detection or cryptographic integrity checks or something else. That person just seems to have taken words at random from some source and connected them in some way that seemed to fit.
Am I getting this right? (Score:2)
This is AFAICT quite deep into Math brainfuck territory for me, but as I understand it, this is basically, in software terms, a package that can prove its own validity in a way that is unbreakable and works with single chunks of the package even if other chunks aren't available. Am I getting this right?
Long time in coming (Score:2)
Higher dimension maths have been chasing relevance since before the 1970s. That's when I benefited from a 13th dimensional mathematician on sabbatical teaching ComSci. Twenty years later, re-upping that degree at Columbia, TAs were on sabbatical trying to factor cryptography.
Here authentication results from 40 yrs. in the trenches! No small accomplishment.