Text Compressor 1% Away From AI Threshold 442
Baldrson writes "Alexander Ratushnyak compressed the first 100,000,000 bytes of Wikipedia to a record-small 16,481,655 bytes (including the decompression program), thereby not only winning the second payout of The Hutter Prize for Compression of Human Knowledge, but also bringing text compression within 1% of the threshold for artificial intelligence. Achieving 1.319 bits per character, this makes the next winner of the Hutter Prize likely to reach the threshold of human performance (between 0.6 and 1.3 bits per character) estimated by the founder of information theory, Claude Shannon, and confirmed by Cover and King in 1978 using text prediction gambling. When the Hutter Prize started, less than a year ago, the best performance was 1.466 bits per character. Alexander Ratushnyak's open-sourced GPL program is called paq8hp12 [rar file]."
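For reference, the 1.319 bits-per-character figure follows directly from the numbers in the summary; this is just the arithmetic, not anything the compressor itself computes:

```python
# Bits per character = total compressed bits / number of input characters.
# The test file (the first 100 MB of Wikipedia) is one byte per character.
compressed_bytes = 16_481_655   # winning entry, including the decompressor
input_bytes = 100_000_000       # first 100,000,000 bytes of Wikipedia

bpc = compressed_bytes * 8 / input_bytes
print(f"{bpc:.3f} bits per character")  # -> 1.319 bits per character
```

By the same arithmetic, hitting Shannon's upper estimate of 1.3 bits per character would require squeezing the file below 16,250,000 bytes.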
Artificial Intelligence? (Score:4, Insightful)
Could someone out there please explain how being able to compress text is equivalent to artificial intelligence?
Is this to suggest that the algorithm is able to learn, adapt and change enough to show evidence of intelligence?
AI? I don't think so. (Score:4, Insightful)
And yes, that's absolute bollocks. Shannon's number was just an estimate and only applied to serial transmission of characters, because that's what he was interested in. Since then, a lot of work has been done in statistical natural language processing, and I would be surprised if the number couldn't be lowered.
Anyway, since the program doesn't learn or think to reach this limit, nor give any explanation of how this level of compression is intrinsically linked to the language/knowledge it compresses, it cannot be called AI; e.g., it doesn't know how to skip irrelevant bits of information in the text. That would be intelligence...
Re:Artificial Intelligence? (Score:5, Insightful)
Compression is about recognizing patterns. Once you have a pattern, you can substitute that pattern with a smaller pattern and a lookup table. Pattern recognition is a primary branch of AI, and is something that actual intelligences are currently much better at.
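The substitute-a-pattern-with-a-lookup-table idea can be sketched in a few lines. This is a toy illustration of the principle only, not how paq8hp12 works (that program uses context modeling and arithmetic coding), and the function names here are made up for the example:

```python
def compress(text: str, patterns: list[str]):
    """Replace each known pattern with a one-character token; return data + lookup table."""
    table = {}
    for i, pat in enumerate(patterns):
        token = chr(1 + i)  # low control characters as tokens, assuming they never occur in the text
        table[token] = pat
        text = text.replace(pat, token)
    return text, table

def decompress(data: str, table: dict) -> str:
    """Expand each token back into the pattern it stood for."""
    for token, pat in table.items():
        data = data.replace(token, pat)
    return data

original = "the cat sat on the mat because the cat liked the mat"
packed, table = compress(original, ["the cat", "the mat"])
assert decompress(packed, table) == original
assert len(packed) < len(original)  # repeated patterns now cost one character each
```

Of course, in a real compressor the lookup table itself has to be stored (or, as in the Hutter Prize rules, counted as part of the decompression program), so only patterns whose repetitions outweigh the table entry actually save space.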
We can generally show this is true by applying the "grad student algorithm" to compression - i.e., lock a grad student in a room for a week and tell him he can't come out until he gets optimum compression on some data (with breaks for pizza and bathroom), and present the resulting compressed data at the end.
So far this beats out compression produced by a compression program because people are exceedingly clever at finding patterns.
Of course, while this is somewhat interesting in text, it's a lot more interesting in images, and more interesting still in video. You can do a lot better with those by actually having some concept of objects - with a model of the world, essentially, than you can without. With text you can cheat - exploiting patterns that come up because of the nature of the language rather than because of the semantics of the situation. In other words, your text compressor can be quite "stupid" in the way it finds patterns and still get a result rivaling a human.
Re:Artificial Intelligence? (Score:3, Insightful)
They argue that predicting which characters are most likely to occur next in a text sequence requires vast real-world knowledge.
The apparent empirical result is that predicting which characters are most likely to occur next in a text sequence requires either
1) vast real-world knowledge
OR
2) vast real-world derived statistical databases and estimation machinery
but there can be a difference in their utility. The point of course, is that humans can do enormously more powerful things with that vast real-world knowledge in addition to symbolic estimation.
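The link between prediction and compression is direct: a model that assigns probability p to the next character can, via arithmetic coding, spend about -log2(p) bits on it. A minimal sketch of option (2) — a character-bigram statistical database standing in for real-world knowledge (the function name and training text are invented for the example):

```python
import math
from collections import Counter, defaultdict

def bigram_bits_per_char(train: str, test: str) -> float:
    """Average -log2 p(next char | previous char) under add-one-smoothed bigram counts."""
    counts = defaultdict(Counter)
    for a, b in zip(train, train[1:]):
        counts[a][b] += 1
    alphabet = set(train) | set(test)
    total_bits = 0.0
    for a, b in zip(test, test[1:]):
        c = counts[a]
        p = (c[b] + 1) / (sum(c.values()) + len(alphabet))  # Laplace smoothing
        total_bits += -math.log2(p)
    return total_bits / (len(test) - 1)

train = "the quick brown fox jumps over the lazy dog " * 50
print(bigram_bits_per_char(train, "the lazy fox jumps"))
```

Text that matches the training statistics costs few bits per character; text that violates them costs many. The Hutter Prize entries are, in effect, enormously better versions of this estimation machinery.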
The underlying question is whether physical natural intelligence is really just real-world derived statistical databases and estimation machinery. Modern neuroscience says,
"depends on what the meaning of 'is' is, but it's at least halfway there."
However, would completing mathematical theorems by searching through Google work? Statistical pattern matching might sort of work for theorems already known and indexed on Google, but not beyond them.
Clearly natural intelligence includes many tasks which can now be well solved with data-oriented, sophisticated statistical approaches, perhaps with equal or better performance. Modern algorithms like independent components analysis can now estimate individual sources in audition — "the cocktail party effect," a problem some once thought was a clear sign of true 'intelligence'. Turns out that some sufficiently clever signal processing and nonlinear objective functions can do it, so maybe that's what neurons do too.
The still unsolved question is whether there are some tasks which are clearly 'intelligence' where this class of methods will profoundly fail. Maybe like creating really new mathematics?
AI (Score:2, Insightful)
I see no reason to believe AI and text compression are interchangeable.
I can think of a few methods that would allow a computer to guess a missing word better than humans (exceeding the AI limit), and that such methods would be useless for determining a response to a question, particularly in the real world, where things like punctuation, abbreviation, and capitalization would be highly suspect to begin with.
So I have to say the basis for this competition is flawed, and what's more, the results coming out of it are specific enough to just succeed in this competition, but be completely and utterly useless for any other (real) tasks.
Re:Artificial Intelligence? (Score:4, Insightful)
Re:That's cool.. (Score:5, Insightful)
Now that'd be cool.
Re:ai threshold? (Score:4, Insightful)
Re:Artificial Intelligence? (Score:2, Insightful)
That is an idea or a concept. An idea or concept can be interpreted in different ways, and the interpretation is meaningful only in its context.
ex1: the sky is blue => it's beautiful weather (context: you're taking a walk)
ex2: the sky is blue => use #0000FF for the sky area (context: graphic work)
If you say "the weather is beautiful" to an artist, he may draw you a yellowish-reddish sunset, which is not the interpretation of "the sky is blue" you had in mind. So the context is vital.
I imagine a real AI would evaluate the context and predict which words are most likely to come next. If it succeeded in translating one concept to another in a meaningful context ("the sky is blue => it's beautiful weather, let's bring the NASA shuttle down"), it would no longer be an AI but an I.
Not made for mobile devices (Score:2, Insightful)
Seriously, this was not invented for mobile handheld devices. At this moment, even without compression, you could probably store enough text on a mobile phone to keep you constantly reading for a month.
That is the problem (Score:3, Insightful)
Ok, computer programming is not necessarily a lot of maths.
But this article is about something that is really computer science... as opposed to making a CRUD screen in VB.net, which is akin to programming a VCR.
Parsing, compiling, linear programming, sorting, searching, indexing, compressing, walking graphs, drawing graphics, designing circuits, optimizing circuits, these are activities that are computer science and that are all maths.
Edsger Dijkstra once said: "Computers are to computer science what telescopes are to astronomy".
Re:Not made for mobile devices (Score:5, Insightful)
When you look up a word in the dictionary, it takes from 10 to 30 seconds to read the definition. But you still needed the whole book/brick to do it.
Re:Dangerous (Score:4, Insightful)
American --> British
transportation --> transport
football player --> footballer
subway --> tube
burglarize --> burgle
Re:Science != Math (Score:3, Insightful)
"When you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind: it may be the beginning of knowledge, but you have scarcely, in your thoughts, advanced to the stage of science."
--Lord Kelvin
For the sake of completeness, (Score:2, Insightful)
science = math + measurements
That's it. Science is:
1. measure phenomena,
2. figure out the formulas,
3. predict new phenomena,
4. measure new phenomena,
5. if Ok, back to stage 3; if not, back to stage 2.
(ok, ok, 6. (...), 7. Profit!!!, just to appease the masses)
Notice that stages 1 and 4 are measurements; stages 2 and 3 are maths.
Re:Science != Math (Score:3, Insightful)
Math is a relatively late addition to science. Yes, it's proved very useful. But science happened long before they introduced math.
Well, thinking again, this depends on what you mean by math. Leonardo used math to figure out perspective. Does this mean that art depends on math? If so, then science depends on math, and so does walking across the room. And I can see a valid argument to be made along those lines, but that's not what people normally mean. If we look at what people normally mean, then science didn't depend on math until around the time of Kepler. Perhaps you want to call everything earlier engineering rather than science, but engineering depends on math just as heavily as science.
What actually happened was that after algebra was invented, and Arabic numerals, it became a lot easier to describe things in math, so people gradually switched from describing things in ordinary language to describing them in math. This has had both advantages and disadvantages. Certainly precision has improved. But comprehension by "ordinary folk" has declined, and not entirely because of the arcane subject matter, but also because they needed to learn a new language in order to understand what was being talked about.
OTOH, can you imagine talking about computer programming without using "jargon"?