Text Compressor 1% Away From AI Threshold
Baldrson writes "Alexander Ratushnyak compressed the first 100,000,000 bytes of Wikipedia to a record-small 16,481,655 bytes (including decompression program), thereby not only winning the second payout of The Hutter Prize for Compression of Human Knowledge, but also bringing text compression within 1% of the threshold for artificial intelligence. At 1.319 bits per character, this result makes it likely that the next winner of the Hutter Prize will reach the threshold of human performance (between 0.6 and 1.3 bits per character) estimated by the founder of information theory, Claude Shannon, and confirmed by Cover and King in 1978 using text prediction gambling. When the Hutter Prize started, less than a year ago, the best performance was 1.466 bits per character. Alexander Ratushnyak's open-sourced GPL program is called paq8hp12 [rar file]."
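For anyone checking the arithmetic in the summary, here is a minimal sketch of how the bits-per-character figure falls out of the quoted byte counts. It assumes one byte of input counts as one character, which is how the 1.319 figure appears to be derived. The function names are made up for illustration and are not part of paq8hp12.

```python
# Figures quoted in the story summary.
ORIGINAL_BYTES = 100_000_000    # first 10^8 bytes of Wikipedia
COMPRESSED_BYTES = 16_481_655   # archive size, decompressor included


def bits_per_character(compressed_bytes: int, original_chars: int) -> float:
    """Compressed size in bits divided by the number of input characters.

    Assumption: one input byte is treated as one character.
    """
    return compressed_bytes * 8 / original_chars


def bytes_for_target(target_bpc: float, original_chars: int) -> float:
    """Archive size in bytes that would correspond to a given bits-per-character rate."""
    return target_bpc * original_chars / 8


bpc = bits_per_character(COMPRESSED_BYTES, ORIGINAL_BYTES)
print(f"{bpc:.3f} bits per character")

# Size that would match the 1.3 bpc upper end of the quoted human range.
target_size = bytes_for_target(1.3, ORIGINAL_BYTES)
print(f"{target_size:,.0f} bytes at 1.3 bits per character")
```

The same two functions can be pointed at the 1.466 starting figure or at the 0.6 lower bound by changing the `target_bpc` argument.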
I wonder ... (Score:2, Funny)
The horror.
interesting program name (Score:5, Funny)
Re:I wonder ... (Score:3, Funny)
"The horror."
I've been typing everything I ever knew into Slashdot since the day it started, you insensitive clod!
-- Cmdr Taco
Dangerous (Score:4, Funny)
Damned scientists!
Lossy compression? (Score:5, Funny)
Obligatory... (Score:5, Funny)
- Wikipedia fights back.
- Yes. It launches its rvv missiles against Slashdot.
- Why attack Slashdot? Aren't they our friends now?
- Because Wikipedia knows the GNAA counter-attack will eliminate its enemies over here.
Re:That's cool.. (Score:4, Funny)
its only becoz people are such grammar noobs that they need to waste $
dood shud filta to txtspk b4 he compress
How to win the Hutter Prize (Score:5, Funny)
2) Add a long and self referencing article on wikipedia about said algorithm.
3) Use algorithm to compress first x% of wikipedia (including your own article)
4) WIN HUTTER PRIZE.
Re:How to win the Hutter Prize (Score:3, Funny)
That's gotta be the most annoying compression algorithm in the world [imdb.com].
Re:That's cool.. (Score:5, Funny)
its only becoz ppl r sch grmmr noobs tat tey nid 2 wste $
dud shd filta 2 txtspk b4 he cmpres
There, fixed that for ya.
Re:That's cool.. (Score:1, Funny)
Re:That's cool.. (Score:5, Funny)
itsOnlyBecozPplRSchGrmmrNoobsTatTeyNid2Wste$
dudShdFilta2TxtspkB4HeCmpres
Fixed even more.
Re:That's cool.. (Score:5, Funny)
~ppl r grm0.1 -> -$
|txtspk|gzip
Re:That's cool.. (Score:4, Funny)
Re:Dangerous (Score:5, Funny)
Actually, I can give you 100% compression already. It's just a bit lossy.
Re:Artificial Intelligence? (Score:3, Funny)
super-grammar-improved paq8hp12 (Score:4, Funny)
After implementing a few minor tweaks to paq8hp12 and incorporating your grammar optimisation algorithm I managed to compress the above text amazingly to a single character: '&'.
Now you figure out which one it was and how to decompress it.
Re:That's cool.. (Score:4, Funny)
Comment removed (Score:5, Funny)
Re:That's cool.. (Score:5, Funny)
Re:How to win the Hutter Prize (Score:1, Funny)
Re:Dangerous (Score:5, Funny)
But of course, you don't need math for this... (Score:3, Funny)
http://science.slashdot.org/comments.pl?threshold
Re:Dangerous (Score:5, Funny)
Re:How to win the Hutter Prize (Score:2, Funny)
Re:new compression standard (Score:5, Funny)
Re:Dangerous (Score:3, Funny)
Re:interesting program name (Score:3, Funny)
Re:super-grammar-improved paq8hp12 (Score:5, Funny)
Well, with only 256 choices, it didn't take long to check all possible decodings for one that makes sense. Ended up working for "}".
Oddly, though, the algorithm not only restored, but improved the original! I get:
"The King's English version of Wikipedia should fit in eight gigabits, I do believe. Only humanity's sphexish adherence to grammatical rules limits the attainable compression ratio; the good gentleman might wish to consider filtering to a more base patois prior to applying his algorithm".
Amazing... This discovery could single-handedly render the next generation (nearly) intelligible!
Re:I wonder ... (Score:2, Funny)
Re:Dangerous (Score:2, Funny)
British -> Dude
Transport -> Car
Footballer -> Dude
Tube -> Car
Burgle -> Get
See? Much compressed.