from the can-also-use-tea-leaves-if-google-not-available dept.
prostoalex writes "New Scientist talks about Paul Vitanyi and Rudi Cilibrasi of the National Institute for Mathematics and Computer Science in Amsterdam and their work to extract the meaning of words from Google's index. The pair demonstrate an unsupervised clustering algorithm that can 'distinguish between colours, numbers, different religions and Dutch painters based on the number of hits they return', according to New Scientist."
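The summary does not say how hit counts become a measure of meaning. One measure Cilibrasi and Vitanyi published for this line of work is the Normalized Google Distance, which compares the hits for two terms searched separately with the hits for both terms searched together. A minimal Python sketch of that formula follows; the function name `ngd`, its parameters, and the hit counts in the usage lines are illustrative assumptions rather than live search results, and the clustering step that would consume these distances is not shown.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from raw hit counts.

    fx, fy -- hits for each term searched on its own
    fxy    -- hits for pages containing both terms
    n      -- assumed total number of indexed pages
    """
    if fx <= 0 or fy <= 0:
        raise ValueError("each term needs at least one hit")
    if fxy <= 0:
        # The terms never co-occur, so treat them as maximally distant.
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


# Illustrative counts only (assumed, not fetched from any search engine).
distance = ngd(fx=46_700_000, fy=12_200_000, fxy=2_630_000, n=8_058_044_651)
print(f"{distance:.3f}")
```

Smaller values mean the two terms tend to appear on the same pages. A clustering method fed a matrix of such pairwise distances could then group, say, colour names apart from number words without any labelled training data.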