A Mathematical Proof Too Long To Check 189
mikejuk writes "Mathematicians have generally gotten over their unease with computer-assisted proofs. But in the case of a new proof from researchers at the University of Liverpool, we may have crossed a line. The proof is currently contained within a 13 GB file — more space than is required to hold the entirety of Wikipedia. Its size makes it unlikely that humans will be able to check and confirm the proof. The theorem that has been proved is in connection with a long running conjecture of Paul Erdos in 1930. Discrepancy theory is about how possible it is to distribute something evenly. It occurs in lots of different forms and even has a connection with cryptography. In 1993 it was proved that an infinite series cannot have a discrepancy of 1 or less. This proved the theorem for C=1. The recent progress, which pushes C up to 2, was made possible by a clever idea of using a SAT solver — a program that finds values that make an expression true. Things went well up to length 1160, which was proved to have discrepancy 2, but at length 1161 the SAT returned the result that there was no assignment. The negative result generated an unsatisfiability certificate: the proof that a sequence of length 1161 has no subsequence with discrepancy 2 requires over 13 gigabytes of data. As the authors of the paper write: '[it]...is probably one of longest proofs of a non-trivial mathematical result ever produced. ... one may have doubts about to which degree this can be accepted as a proof of a mathematical statement.' Does this matter? Probably not — as long as other programs can check the result and the program itself has to be considered part of the proof."
To long, didn't check. (Score:5, Funny)
Re: (Score:2, Funny)
Re:To long, didn't check. (Score:4, Funny)
Re:To long, didn't check. (Score:5, Informative)
Re:To long, didn't check. (Score:4, Informative)
Re:To long, didn't check. (Score:5, Insightful)
1) Verify the proof that the verification algorithm works.
2) Obtain several independent simple, portable implementations of said verification.
3) Run said implementations on proof certificate on a variety of hardware.
Trust the math, and where it comes to the hardware and software, trust but verify. Too long to check without aid of a computer? Sure, I'll buy that. But you'd have to be an idiot to want to check this proof without a computer. Why is this news? (actually, the result in discrepancy theory is wonderful, and I'm very happy to see it here on Slashdot... but massive computer proofs are truly nothing new)
Re: (Score:3, Informative)
But you'd have to be an idiot to want to check this proof without a computer.
Historically, mathematicians have resisted computer-only proofs. They want eyeballs end-to-end. Your ideas (independent implementations, hardware, etc.) are sound/feasible from a software engineering standpoint, but unsatisfying to mathematicians. (Not being a mathematician myself, I'm ill-suited to testify as to why that's the case, but so it is.)
Maybe one day that mindset will be abandoned, but what's more likely to happen is that mathematics will bifurcate: there will be the set of mathematics that re
Re: (Score:2)
Re: (Score:2)
My understanding is that checking an output of a proof assistant/generator is a trivial matter (i.e., a trained monkey should be able to do it).
That's not much of a standard. A trained monkey could also write the entire works of Shakespeare*.
*: As long as you had enough of them**.
**: Monkeys, not Shakespeares. If you had infinite Shakespeares I guess they could peal a banana, or something.
Re: (Score:2)
Not exactly. Machine-checkable outputs tend to come in one of two varieties: certificates (of which this is an example, since it's an UNSAT certificate) and proofs proper.
Proofs (which are the sort of things you'd feed to Coq or Isabelle) tend to rely heavily on built-in tactics. There are some theories (classical logic, intuitionistic logic, Presburger arithmetic, Tarski arithmetic, etc) which are known to be decidable, but the decision procedures are beyond most humans, let alone trained monkeys. For exam
Re: (Score:2)
I wish modern mathematicians believed the math that they prove day after day for undergrads. If th
Re: (Score:2)
Doesn't seem any worse than a Zero Knowledge Proof system.
Even if we cannot prove it formally, systems like this can put together a system with a very high probability of being correct if we simply test their results. If you can get multiple automated proof systems to claim it is impossible, and you trust the automated proof systems with a moderate degree of certainty, you can trust the results with about the same certainty.
For many problems, having something "statistically proven" is good enough.
Re: (Score:2)
Er.
If the program that checks the proof is considered part of the proof, isn't that one of those recursive situations that Kurt Godel warned us about, as popularised in Hofstadter's GEB?
Re: (Score:2)
Coq and Isabelle are far better choices, being better supported and open source. And if you need their credentials, there is a Coq-checked proof of the four colour theorem, and an Isabelle proof that the L4 kernel is secure.
Re:To long, didn't check. (Score:5, Funny)
The neat part is that, if you take the first bit of each byte of the proof and string them all together, you get a complete HD MPEG copy of The Matrix.
Re:To long, didn't check. (Score:5, Funny)
So you say the real reason why they cannot check the proof is that they would violate the DMCA by doing so?
Re: (Score:2)
To?
I read TL;DC as "too long, don't care". :P
wow (Score:5, Insightful)
less space than wikipedia? that sounds large.
wtf?
Re:wow (Score:5, Funny)
Re: (Score:2)
We're getting to a point where, "Can I store it on a card smaller than my pinky nail?" has replaced "Libraries of Congress."
Re:wow (Score:4, Insightful)
AFAIK, a "standard" LoC is 10TB... around 769 times larger than this file. Comparing this to an LoC is technically valid, but not particularly useful for the typical reader.
Re: (Score:2)
I wonder about the usefulness of a unit of measure that constantly changes. Perhaps we should also consider storing the Library of Congress inside of a temperature controlled, airless chamber. We could then store this unit in the Library of Congress for further refere-
Unhandled exception at 0x00435917 in Howitzer86.exe: | 0xC00000FD: Stack overflow.
Re: (Score:2)
I always measured in station wagons. Maybe that's the American equivalent of a Volkswagen.
Re: (Score:2)
less space than wikipedia? that sounds large.
wtf?
Yea, checking TFA it appears this is a case of less = more.
Re:wow (Score:4, Insightful)
I think they meant to say "less space than that is required to store Wikipedia".
Re: (Score:2, Insightful)
I think they meant to say "less space than that is required to store Wikipedia".
Probably not. Since 0 bytes is less space than that is required to store Wikipedia, I would wager that they actually meant to say, "more space than that is required to store Wikipedia.
Re: (Score:2)
It depends if you stress "that" or not.
1) "less space than *that*, is required to store Wikipedia"
2) "less space than that is required to store Wikipedia" = "less space than what is required to store Wikipedia"
Re: (Score:2)
It depends if you stress "that" or not.
What you say would be true, if the word "that" had appeared in the original summary; however, it didn't. Here's the original wording: "The proof is currently contained within a 13 GB file — less space than is required to hold the entirety of Wikipedia." Nothing to stress or interpret there, although I will grant you that the inappropriate use of the emdash does confuse matters a bit.
I suppose it's all moot now anyway, since the summary has been edited to protect the guilty and to correct the mistak
the beginning, not the end (Score:5, Interesting)
it is the beginning of AI-science, not the end of human science.
Science requires testable, provable, repeatable. If a human cannot understand the proof then he cannot participate in the science. This is likely to be referred to as an "early" version of machine-exclusive science.
Re:the beginning, not the end (Score:5, Insightful)
SAT is not a brute force loop (Score:5, Informative)
Also, unlike what you may think, a SAT proof is not a list of "I tried a=1 and it did not work out, and this is the proof that a=0". A standard SAT proof [wikipedia.org] deduces new clauses from the original problem by applying the resolution rule [wikipedia.org] repeatedly. The newly deduced clauses reduce the search space and, if the problem is unsatisfiable, the solver ends up with the empty clause, which is always FALSE. The proof is a collection of resolution steps that lead to FALSE.
SAT solvers are AI at least since:
SAT is clearly NP complete, and clearly the existence of good SAT solvers is not a proof that P=NP. This means that there will be relatively small problems that SAT solvers won't be able to solve. On the other hand, most real-world problems have a hidden structure which SAT solvers are able to find and use to their advantage.
Re:SAT is not a brute force loop (Score:5, Interesting)
Yeah, I'm familiar with SAT solvers and the fact that they aren't REALLY full brute force; I oversimplified it a bit for the Slashdot crowd. Might have gone a little too far on the "lies to children [wikipedia.org]" scale, mea culpa.
My point was that anyone with high school level math experience can understand the basic problem of boolean satisfiability; I was trying to draw a distinction between problems that are beyond human comprehension and those that are merely beyond human time and ability, with huge SAT instances falling into the latter category. Shouldn't have glossed over the details quite as badly as I did.
Re: (Score:2)
You are right that the basic concepts of SAT solving can be understood by a smart person with high school math experience. But I don't agree that this is as simple as that for the fine details of modern SAT solvers. Some solving steps are non-trivial and look weird some times. Especially the steps that the solver takes to simplify the problem, in the middle of the run.
Today, SAT is not only about binary resolution. Some of the stuff is difficult even for Grad students. There are hundreds of non-trivial pape
It's just a proof strategy... (Score:2)
I was trying to draw a distinction between problems that are beyond human comprehension and those that are merely beyond human time and ability, with huge SAT instances falling into the latter category.
Exactly, the use of SAT unsatisfiability certificates is just a proof strategy... Just like reductions used to show complexity classes, nobody comprehends all of them...
Re:SAT is not a brute force loop (Score:4, Interesting)
SAT is clearly NP complete, and clearly the existence of good SAT solvers is not a proof that P=NP. This means that there will be relatively small problems that SAT solvers won't be able to solve.
Enjoyed your post, but have to correct a small quibble.
From a mathematical standpoint at least, being NP complete doesn't imply that there are some problems that are unsolvable; merely that they won't be solvable in any reasonable amount of computing time. If you have a few hundred billion years of compute time available, a SAT solver might be able to solve even those small problems you mention. Of course, from a practical perspective, none of us are going to be here to get the result in those situations, making them unsolvable from a practical standpoint.
(On the other hand, once the billions of aeons roll by and the machine goes 'ding' and spits out an answer, we do know that we can verify it in poly time. Huzzah!)
While all of this may seem ultra-pedantic, there is enough confusion about NP out there that someone reading your post may get the idea that things that are NP-complete are unsolvable. They're not unsolvable -- we can typically fashion algorithms to solve them, simply that those algorithms run in nondeterministic polynomial time, and thus may have runtimes exceeding the expected lifetime of the solar system, even with every cycle of compute time ever invented pushed at it.
...unless, of course, someone comes up with a proof that P = NP, in which case all those NP-complete problems can be transformed into P problems. Sure, they might still take a few hundred billion years to get a solution, but at least we'd know how many hundreds of billions of years would be needed to get a solution!
Yaz
Re: (Score:2)
I'd hesitate to call one big for loop "AI."
So you would more readily accept a big while loop as AI? ;-)
Re: (Score:2, Informative)
Re: (Score:2)
Agree in principle, but I'm not sure this fails that standard to the extent that it's relevant for science to work. Sure, a human may not directly understand the entire proof. However, like with the Four Color Theorem, they can verify:
- A proof checker would catch errors if there were any, and has failed to.
- The thing it purports to prove is in fact (a representation) of the theorem the submitter claims to have proven.
- The proof generator generates only valid steps.
Could there be errors in the process?
Can't we all just get along? (Score:2)
After 9.5gigs (Score:5, Funny)
In the results there is the following statement.
"As any idiot can plainly see"
Re: (Score:2)
In the results there is the following statement.
"As any idiot can plainly see"
LOL!
no, I didn't rta.
Re:After 9.5gigs (Score:5, Funny)
I have it on good authority that one of the steps of the proof is "???", followed by "PROFIT!".
Re: (Score:2)
Actually it contains the step "then a miracle occurs." [blogspot.com]
Re: (Score:3)
Actually it is worse.
When somebody finally got around to looking at the 9.5GB proof, it started like this:
All work and no play makes HAL a dull program.
All work and no play makes HAL a dull program.
All work and no play makes HAL a dull program.
All work and no play makes HAL a dull program.
All work and no play makes HAL a dull program.
Re: (Score:2)
In the results there is the following statement.
"As any idiot can plainly see"
They wouldn't say that. Now if you wrote "... Obviously we have...." or "clearly it follows" I remember the first time I said to a student - Nope, it isn't obvious. Explain. Deer in headlights look. WHaaaaaa?
Paging Mr Fermat... (Score:5, Funny)
I have discovered a truly marvellous proof of this, which this DVD is too small to contain.
Re: (Score:2)
The only person dumber than a moderator that didn't understand that reference, is the person who comments on moderation after only 19 minutes.
Grad students? (Score:5, Funny)
I thought that's what grad students were for: endless mind-numbing labor. "Here, check this and have it back to me in 30 years or so."
Can't have your pi and eat it too, (Score:2)
Just saying.
Less space than Wikipedia (Score:5, Insightful)
less space than is required to hold the entirety of Wikipedia
I'd venture a guess that this is not unique and that every mathematical proof to date takes less space than Wikipedia. Did they mean more space?
Re: (Score:2)
less space than that is required to hold the entirety of Wikipedia.
Re: (Score:3)
prove that the program works (Score:2)
I don't see why you need to go through the fuss of the 13 GB file. What was the algorithm used to make the file? Prove that the algorithm works. That's your proof. (Run the program a few times, so the probability of errors in the output is close to zero. Remember that the probability of the computer making a mistake (cosmic rays, transistor noise, etc) is smaller than the probability of a human mathematician making a mistake.)
Re: (Score:2)
No. If it's indeed a proof the probability of errors must be 0, not just close to it.
Re: (Score:2)
Re: (Score:3)
No. If it's indeed a proof the probability of errors must be 0, not just close to it.
He's referring to errors during runtime (electrical noise, bit flips, not enough spiders in the case, etc.), not errors in the logic.
If the generator's logic is provably correct, then the things it generates are as well as long as your hardware it working properly. There is no way to rigorously prove hardware works correctly for all input strings, for all time, for all environmental conditions, across all variations due to manufacturing, etc.
Re: (Score:2)
There's not really any such thing as "provably correct logic" to begin with. A some point you just have to decide that the chance of errors across the process is low enough to go on with. I think of this as the "certainty noise floor": it's not important whether the chance of error is 0, but that the chance is really quite small, because that's the best we ever get.
Re: (Score:2)
Re-running the program is equivalent to having more than one mathematician review the proof. In both cases, you're trying to drive the probability of error in verification down to zero.
that word does not mean what you think it means (Score:2)
" Prove that the algorithm works. That's your proof. (Run the program a few times, so the probability of errors in the output is close to zero"
"probably true" is NOT a prove.
Re: (Score:2)
" Prove that the algorithm works. That's your proof. (Run the program a few times, so the probability of errors in the output is close to zero"
"probably true" is NOT a prove.
This isn't a probabilistic 'proof' - it is straight-up deterministic: the SAT result proves it true. Period.
The poster above is alluding to the fact that a random software error could occur that gives the same result erroneously. Thus running the program is used to show that this isn't the case at all.
To assert that a lengthy, complex mathematical proof entirely written by a human is absolutely true requires you to believe the human is incapable of error (Wile's proof of the FLT ran 150 pages and this is n
Re: (Score:2)
Gödel [wikipedia.org] and Turing [wikipedia.org] make strong cases that proving the algorithm works for some inputs that are correct proofs doesn't count as proof it will work for all correct proof inputs. So no, even if you "prove the algorithm works" it is not the same as a rigorous mathematical proof.
Re: (Score:2)
Re: (Score:2)
Forgot to mention those guys showed that such an algorithm that "works" for all valid proofs is not just difficult but mathematically impossible.
No, that's not actually what they proved; it is perfectly possible to prove a given algorithm works for all possible inputs, and even that a proof checker works for all valid proofs. There are certainly things that they proved impossible (e.g. a writing a program that can provide a proof for any true mathematical statement, or that can determine if two arbitrary programs are equivalent), but those don't apply here.
Re:prove that the program works (Score:5, Informative)
Gödel [wikipedia.org] and Turing [wikipedia.org] make strong cases that proving the algorithm works for some inputs that are correct proofs doesn't count as proof it will work for all correct proof inputs. So no, even if you "prove the algorithm works" it is not the same as a rigorous mathematical proof.
You're comparing apples to oranges (and lemons.)
If the algorithm can be proved correct (within whatever axiomatic system you're using) then it's correct. The End.
Gödel's incompleteness theorem shows that certain statements about axiomatic systems can be true but cannot be proved. That doesn't mean you can't be certain of something that is in fact proved (subject of course to the axioms.)
Turing's halting problem is a statement about limitations in the ability of algorithms to examine other algorithms. Again, it doesn't mean you can't prove that an algorithm is correct.
Re: (Score:2)
If the algorithm can be proved correct (within whatever axiomatic system you're using) then it's correct. The End.
Thank you. For the love of FSM, thank you for qualifying your statement about proof.
Re: (Score:2)
That's kind of my point. Given this proof, it would show that the algorithm is incorrect if the proof is shown to be invalid, yet the proof is too long to be verified by anything but another
Re:prove that the program works (Score:4, Insightful)
I think the issue here stems from the concept of "correct" and how knowable that value is.
Um, excuse me. If you're going to quote me and change what I said, then indicate your edits. I have done so above, in bold. Not that I can make sense of them.
That's kind of my point. Given this proof, it would show that the algorithm is incorrect if the proof is shown to be invalid
Wha...? That's just plain wrong. I can think up all kinds of invalid proofs of the Pythagorean Theorem. But showing that a proof is invalid does not mean the theorem is incorrect. It just means your proof is.
yet the proof is too long to be verified by anything but another algorithm, so the halting problem is definitely relevant in a discussion about algorithm-generated proofs which can't be verified by humans.
Again, Turing's halting problem illustrates limitations on the ability of algorithms to decide certain propositions. It does not mean that algorithms can't decide anything. You seem to think that it does.
Yes, some proofs can be generated by algorithms and others can be checked by algorithms, but a mathematician is necessary at some point in the process since no non-trivial generating algorithm can be shown to create only correct proofs and no universal checking algorithm can be created which generates no false positives or negatives.
Your fallacy is that one cannot trust specific algorithms to prove things because no such universal algorithm can be created.
Re: (Score:2)
Sorry, I realized at one point I was editing the quote rather than my own text and thought I fixed it but apparently missed one of my edits and there's no reason it should make sense there. Entirely my fault.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Gödel [wikipedia.org] and Turing [wikipedia.org] make strong cases that proving the algorithm works for some inputs that are correct proofs doesn't count as proof it will work for all correct proof inputs. So no, even if you "prove the algorithm works" it is not the same as a rigorous mathematical proof.
Not true - proving the algorithm works is the same as a rigorous mathematical proof, if you prove mathematically and rigorously that the algorithm works. (The comment about running the algorithm a number of times was simply to guard against the proven-correct algorithm producing a wrong result due to a machine malfunction.)
Re: (Score:3)
Re: (Score:2)
Proof is absolute, within the confines of the accepted axioms.
No, not really. Or perhaps I should say: one can never be absolutely certain that a proof is correct. Practically the flaws in the model (when the model is just math) are so small compared to likely flaws in the modeling that it's best to ignore them, but even in the abstract there is no "absolute proof".
Re: (Score:3)
The problem with this is that "axiomatic system" is an inadequate caveat. You also have to blindly assume that some specific system of deduction works. In practice, specific axioms are usually chosen based on the assumption that induction works, the basic unprovable assumption underlying all of science. But it's worse than that: you can't even prove that deduction works! (It's obvious in hindsight, really.)
Any logical system simply asserts rules of deduction. Why use some particular rules of deduction
Re: (Score:2)
otherwise you'd be arguing with solipsists over every detail, no matter how blindingly "obvious"
Anything new to add?
Re: (Score:2)
So, you're saying that the rules of production are axioms too. Still doesn't change what I said.
Not that it changes your conclusion, but it's an important difference in kind. Axioms are just the assumptions of the model, and you can reject certain axioms without being a hardcore skeptic who doubts logic. The latter assumption - that deduction works - is an assumption often made by solipsists who won't grant the assumption that sense data is accurate, but never think to question their other assumptions. It's great fun to poke at the solipsist position that way!
It's also a non-trivial realization: th
Conclusion is good (Score:2)
SAT solving is easy when there is a solution. When there is not, it gets very hard, as basically the solver enumerates all ways it could have found a proof and shows for each that it did not work. Still faster than a full exhaustive search (which is infeasible from, say 80 bits or so of problem space size). On the other hand, SAT solvers are not that complicated if you ignore implementation details. So the solver itself, together with the 1-bit answer "no" could be used as proof instead of the 13GB. My gues
Need a computer to check the proof (Score:2)
And yes, it's computer proofs all the way down.
Canadian Prime Minister would say... (Score:5, Funny)
"A proof is a proof. What kind of a proof? It's a proof. A proof is a proof. And when you have a good proof, it's because it's proven."
Jean Chretien, former Prime Minister of Canada
Re:In response to the PM (Score:3)
"Yes, I have smoked crack cocaine."
Robert Ford, mayor of Toronto.
Re: (Score:2)
Re: (Score:2)
"A proof is a proof. What kind of a proof? It's a proof. A proof is a proof. And when you have a good proof, it's because it's proven, eh."
Jean Chretien, former Prime Minister of Canada
FYFY
Re: (Score:2)
A proof is a proof, you goof, you goof. And noone can check on a proof, you goof. Unless, you goof, that proof, you goof is the famous Erdos Discrepancy Conjecture for C=2.
Wilbur Post
Ha! Nice retro-meme. [wikipedia.org]
Oh wait nevermind (Score:2)
Turns out its just a memory dump from when a processor bug caused a kernel panic.
Technological Singularity (Score:2)
My Only Questions (Score:2)
My only questions are is it possible to simplify the proof? And how hard would that be?
If we have a testable proof, then it should be possible to throw another algorithm on it to simplify and optimize it...
Only after that step should it be considered ready to inspect and test by others.
I don't understand...? (Score:2)
...why can't they just get their slaves, I mean graduate students, to check its validity by hand?
Please tell me I'm dreaming! (Score:2)
Simple to check (Score:2)
Not the first such proof, by a long shot (Score:2)
There are other theorems with computer-assisted proofs that are too complex to verify by hand, going back decades. The four colour map theorem and the classification of finite simple groups are two examples.
How About Pi? (Score:2)
Okay, so the proof in the article may be too lengthy to accurately check, so why not work with something simpler.
For that, I present the proof that Pi R Square is incorrect.
1: Write down on a piece of paper the commonly used 3 digit shortened approximation of Pi: 3.14
2: Hold that paper up in front of a mirror.
3: Note that in reverse, the characters 3.14 now look like PIE.
4: Pie is typically round, therefore since Pie is round, PI.E is 3.14 backwards, and 3.14 is the common shortened value of Pi, then P
I have discovered a truly marvelous proof ... (Score:2)
I realised this when doing my PhD in 2002. (Score:2)
I was trying to classify the normal subgroups of PSL(n,q) where n,q may be nonstandard elements of a nonstandard model of arithmetic. I pointed out that if Ariah Lev's work formalised correctly, then a few steps would yield the result I wanted, but that this was beyond checking. Once the PhD was done, I did further investigation, and scribbled a thought in a moment of insight, and left it to tidy itself up. I believe I put an entry on either chalisque.wordpress.com or deardiary.chalisque.org, but forget
Re: (Score:2)
Re: (Score:2, Funny)
Editor? This is Slashdot.
Oh, so that's what Beta is for (Score:5, Funny)
Editor? This is Slashdot.
You forgot to finish with the kick into the pit of death.
But what if GP is already using Beta?
Re: (Score:2)
Oh its really quite simple.....once you've learned basic English.
Keep at it. I'm sure you'll get there eventually.
But (Score:2)
Re: (Score:2)
You presume the proof has unique steps at every point. It doesn't, if something couldn't be found in a random sequence of 1161 numbers, then it couldn't be found in an infinite sequence (my apologies for paraphrasing, go read the article). So they used a computer to check the 1161 numbers. So they essentially had a for loop. The code for the for loop was finite. The loop was finite. A few invariants and a bit of Floyd-Hoare logic and whallah, the proof be checked, just not the usual way you'd expect.
Re: (Score:2)
"whallah"? Really?