GZipping Life Forms: Deflate Reveals Bare-Bones
An anonymous reader writes "To distinguish images derived from living vs. non-living sources, USC and NASA JPL researchers report today using the standard gzip compression utility. As a measure of overall pattern complexity, they find that the inherent pixel content of biologically generated fossils produces higher image compression ratios [more data redundancy], compared to their non-biological counterparts. The more the file shrinks, the more likely it is that a living process was involved. A test is live online here. This extends the simple, but powerful, uses of gzip to biogenic fossil detectors, in addition to spam cop filters, DNA sequence comparisons, digital camera image crunchers, etc. In nine months, the two Mars rovers will send back the first microscopic-scale images of Mars rocks, which should be amenable to some of these same techniques: thus gzipping is apparently pretty zippy."
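For anyone who wants to poke at the idea in the summary, here is a minimal Python sketch. It is not the researchers' code: `compression_percent`, `more_compressible`, and the two synthetic "images" are made up for illustration, and the percentage is simply defined here as the share of bytes that zlib (the same DEFLATE algorithm gzip uses) removes.

```python
import random
import zlib


def compression_percent(pixels: bytes) -> float:
    """Percentage of bytes removed by DEFLATE (a hypothetical measure)."""
    if not pixels:
        raise ValueError("need at least one byte of pixel data")
    compressed = zlib.compress(pixels, 9)
    return 100.0 * (1.0 - len(compressed) / len(pixels))


def more_compressible(first: bytes, second: bytes) -> int:
    """Return 1 if the first image shrinks more, otherwise 2."""
    return 1 if compression_percent(first) > compression_percent(second) else 2


# Two synthetic 64x64 8-bit grayscale "images" (stand-ins, not real data):
# a banded, layered pattern and unstructured noise.
rng = random.Random(0)
layered = bytes((row // 8) * 32 for row in range(64) for _ in range(64))
noise = bytes(rng.getrandbits(8) for _ in range(64 * 64))

print(more_compressible(layered, noise))
```

A real comparison would need uncompressed pixel data as input; feeding in files that are already compressed mostly measures the earlier compressor.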
Makes sense... (Score:4, Insightful)
Re:Makes sense... (Score:5, Interesting)
Re:Makes sense... (Score:2)
Re:Makes sense... (Score:2, Insightful)
Anyway, as far as this technique is concerned, this (organic images being more compressible) only holds true for organically created stromatolite structures vs. chemically created stromatolite-like structures.
They've only done 20 images or so; I'd like to know the comparative compression ratios.
Cool (Score:2)
Since You Ask... (Score:2)
The linked article points out some problems with this approach.
Re:Cool (Score:3, Interesting)
I compress.. (Score:5, Funny)
I'm not sure I should be flattered that the best way to tell a picture of me from a picture of a rock is that I have more redundant image data.
Re:I compress.. (Score:5, Funny)
I am not (Score:5, Funny)
Re:I compress.. (Score:3, Interesting)
Not only are you, but you are uniquely Mr Methane, because each individual author has unique and identifying characteristics that can be measured using - guess what - compression algorithms.
Given enough samples, individual authors can be identified, and so can graphs of language relationships [economist.com].
I think it's interesting because it raises the bar on preserving anonymity if you publish widely.
Add some entropy to your life; write drunk.
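A rough sketch of how compression-based author matching can work, assuming the "append and see how much bigger it gets" approach (one common variant, not necessarily what the linked article describes). The function names, author names, and toy texts are all placeholders.

```python
import zlib


def extra_bytes(known: bytes, unknown: bytes) -> int:
    """Bytes the unknown text adds when compressed right after the known text."""
    return len(zlib.compress(known + unknown, 9)) - len(zlib.compress(known, 9))


def closest_author(samples: dict[str, bytes], unknown: bytes) -> str:
    """Pick the sample whose phrasing helps compress the unknown text most."""
    return min(samples, key=lambda name: extra_bytes(samples[name], unknown))


# Toy corpora; the author names and texts are placeholders.
samples = {
    "fox_fan": b"the quick brown fox jumps over the lazy dog " * 20,
    "latinist": b"lorem ipsum dolor sit amet consectetur adipiscing elit " * 20,
}
print(closest_author(samples, b"the lazy dog jumps over the quick brown fox"))
```

The fewer extra bytes a sample costs, the more of its phrases the compressor could reuse. Note that zlib only looks back 32 KB, so long samples would need a different compressor or chunking.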
Re:I compress.. (Score:2)
Acid will generate too much chaos.
I can see it now...
Step 1: Pick up pen.
Step 2: Marvel at the wonder of the universe.
Step 3: Huh? What?
Re:I compress.. (Score:2)
scripsit mr. methane:
Comprimar ergo sum?
A-ha! (Score:5, Funny)
So when we compress the ultimate, super-duper intelligent life form we get a two byte file containing "42"
Re:A-ha! (Score:2)
42 (Score:2, Insightful)
42 is one byte.
Re:42 (Score:2)
*double take*
WOW! Lookit that regular pattern! Who'd have guessed? It really IS the ultimate answer. Everything makes sense now.
Re:A-ha! (Score:2)
Hmm, thoughts anyone?
Re:A-ha! (Score:2)
Not particularly ironic (Score:3, Funny)
I'd assume (Score:3, Interesting)
(Mods: the last line was a joke, intended to point out a particularly simple example of a problem - not a troll)
Excellent... (Score:5, Funny)
uhhh.. huh? (Score:2, Interesting)
gzip gates (Score:2, Funny)
Be Humble (Score:5, Funny)
So that tells me that life contains less data than non-life.
Perhaps sophisticated life (human life?) contains even less data than non-sophisticated life. So the smarter we get, the more predictable we get, and the less data we contain.
Perhaps we will someday get smart enough to be totally compressed to one bit. In the time I thought about this concept, I think my gzip file got even more compressed. Hmm....
Re:Be Humble (Score:5, Insightful)
No, it means that life contains less noise than non-life.
Re:Be Humble (Score:2)
Thank you. You just completely cleared this up for me. I was sitting, puzzling, thinking, "Life contains huge amounts of information, and should therefore not compress as well as non-life." Then I realized information is less entropy, so it actually will compress better. (What a relief. I was almost on the verge of taking this personally. ;)
A useful analogy: an ASCII file, containing information, compresses better than a file of truly random binary data. The truly random (think cryptographic) data h
Re:Be Humble (Score:3)
Re:Be Humble (Score:2)
Apparently you don't have children.
Information theory (Score:2)
Random data may not be meaningful but they are full of information by definition.
Consider the sequence 123123123123. The sequence is highly ordered, and therefore is probably meaningful (at least to someone, somewhere), but it contains very little information. In contrast the random sequence 196390244187 is highly disordered, total
I told you so! (Score:3, Funny)
I was wondering... (Score:2, Funny)
bzip2? (Score:3, Interesting)
After all, they have quite different compression characteristics (on one hand, compression of a megabyte of zeroes is much better in bzip2, OTOH adding the same file on top of itself and then compressing gives much less additional compressed size with gzip than with bzip2 - tested with
Re:bzip2? (Score:2)
- Sam
The fractal geometry of nature? (Score:5, Interesting)
Then again, what do I know? Maybe something more immersed in this field can tell us whether there's a seed of truth to my ramblings
Greetings
--> R
Re:The fractal geometry of nature? (Score:2)
GZip doesn't do fractal compression. It will compress repeating patterns though. (My two arms will be compressed because they are similar, not because they look like little humans.)
I don't think there are many fractal structures in nature. Rocks are different than sand. Humans are different than cells. A field is different than grass, which is different than cells, which is different than molecules.
Joe
Re:The fractal geometry of nature? (Score:3, Insightful)
For starters, how about the branching structure of the airways in your lungs?
Jeff
Re:The fractal geometry of nature? (Score:2)
For starters, how about the branching structure of the airways in your lungs?
Or the branching of blood vessels [fractal.org]. Or bone microstructure [washington.edu]. Or nerve cells [fractal.org].
Re:The fractal geometry of nature? (Score:2)
torso breaks into limbs. limbs break into digits.. looks like a crude fractal to me.
Fractal = better compression? (Score:2)
I just find it strange that I keep reading comments nodding at the assumption that being fractalish mea
Re:Fractal = better compression? (Score:2)
1. GZip doesn't do fractal compression.
2. There just aren't that many visibly fractal structures in life. There are some structures in life that are obviously fractals (as mentioned in some other posts), but that is also true in rocks.
Joe
Thought this would be somewhat obvious... (Score:2, Insightful)
Every one of us is incredibly redundant, and I don't just mean in our posts on slashdot!
Simply consider that you can have a reasonably good duplicate of yourself, with only the DNA contained in a single cell!
You may need most of your parts to be functional but, information-wise, it all comes down to 1 germ cell (say, a spermatozoid) and the apparatus needed to move it into proximity of another compatible germ cell ;)
Re:Thought this would be somewhat obvious... (Score:2, Interesting)
Your DNA is only sufficient to create another state machine with the same rules you had at birth.
It will not re-create your complexity because our dna-state machines are designed to create brains which are 'genetically-memoryless', capable of self modification, and have incredible data collection and storage capacity.
Think of your DNA as the graphics engine for Quake. It is relatively small (space-wise) compared to the textures and levels. Add different data, and you still have a first-person
this might have a few glitches (Score:5, Funny)
This post can't be compressed.
Re:this might have a few glitches (Score:2)
This post can't be #1.
Re:this might have a few glitches (Score:2)
Re:this might have a few glitches (Score:2)
The Mars fossil IS made by life; my wife is not. (Score:5, Funny)
at the comparison page [astrobio.net] attached to the article that lets you run the same test on images that the researchers tried. In a startling discovery that is sure to earn me a Nobel Prize for Physics, Chemistry, Biology and Marital Relations, I was told the following:
"Answer: Image 1 [the Mars image](1.43702451394759 % compression) has a higher complexity measure than image 2[the image of my wife] (0.773501341151519 % compression), and thus image 1 is more probably biogenic."
Not only does this prove that there was once life on Mars, but it also proves that my wife is some sort of robot. Further research will be undertaken pending receipt of my prize money.
Re:The Mars fossil IS made by life; my wife is not (Score:5, Funny)
The problem here is that your wife is wearing clothes. Clothes are man made.
If you send me a picture of your unclothed wife, I'll be happy to, uhm, test this theory.
Ferengi (Score:2)
Here you go (Score:2)
Chicken Before the Egg (Score:2)
Re:The Mars fossil IS made by life; my wife is not (Score:4, Interesting)
Re:The Mars fossil IS made by life; my wife is not (Score:3, Funny)
Re:The Mars fossil IS made by life; my wife is not (Score:2)
The.. (Score:2, Funny)
Mad Scientist: "Fire up the GZip Continuum Transfunctioner!"
Operator: "Okay, Boss"
*Bizzzttt*
Information vs. Meaning (Score:2, Interesting)
Kolmogorov Complexity (Score:5, Interesting)
Roughly, Kolmogorov Complexity is a measure of randomness - the measure is how long a computer program needs to be to reproduce data (pardon an oversimplification).
-Mark
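Kolmogorov complexity itself is not computable, but the output size of any real compressor gives an upper bound on it, which is presumably why gzip works as a stand-in here. A small sketch, assuming zlib as the compressor (`compressed_size` is a name made up for this example):

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data: a crude upper bound, not true K(x)."""
    return len(zlib.compress(data, 9))


rng = random.Random(42)
ordered = b"123" * 4000
# Same length, same alphabet of digits, but no repeating structure.
shuffled = bytes(rng.choice(b"0123456789") for _ in range(len(ordered)))

print(compressed_size(ordered), compressed_size(shuffled))
```

The repeating string should shrink to a tiny fraction of the size of the random digits, even though both inputs are the same length.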
Filtering Images (Score:2)
It was pretty simple... Images over a certain size contained lightning, the others were mostly black, therefore smaller. Once I filtered it that way, manually filtering out the better images was easy.
Operating Principle? Kolmogorov Complexity (Score:3, Informative)
Biological clocks in unicorns... (Score:5, Interesting)
This is the loopiest thing I've heard of since Rosenblatt reported that his Perceptrons could distinguish between music composed by Bach and music composed in imitation of Bach.
Good heavens, any picture that's slightly out of focus will now be declared to be evidence of "biological processes."
I'm guessing that the researchers are not as nutty as they sound and that they've done more than is being reported, but still...
Reminds me of the researchers in the sixties who were publishing analyses of data that supposedly showed "biological clocks." It turned out that they were using smoothing algorithms that, basically, were filters that had a 24-hour peak in the frequency domain--so their analysis was creating the patterns they claimed to be detecting. A debunking article was published in Science in which another researcher used data from a random number table (the "unicorn" data) and showed that the same analysis techniques showed that the unicorn had a biological clock.
Re:Biological clocks in unicorns... (Score:3, Insightful)
So, what do they verify the gzip method
Re:Biological clocks in unicorns... (Score:2)
gzip that and see if you get a positive.
then look at porous rock where the little circles are from air bubbles or somesuch and see if you get negative results.
Featurelessness (Score:2)
gzip isn't perfect, but it will find repetitive byte sequences of any kind, regardless of t
lameness filter? (Score:2)
gzip - the swiss army knife utility (Score:5, Funny)
Re:gzip - the swiss army knife utility (Score:2)
I get the feeling... (Score:2)
Re:I get the feeling... (Score:2)
And thinking about that, I'd like to say: Editors, please do not overact tomorrow! Last year was funny for the first two 4-1 stories, but after that it just got annoying. Please limit yourself to maybe two or three April Fool stories, but make 'em good instead...
Thank you.
Slightly Dodgy (Score:5, Interesting)
The big problem is the use of JPEG source images. Unless you've set the quality to the maximum, the JPEG artifacting (in effect, repeating blocks of image data after transitions) will probably mask any hidden level of complexity in the images - the human brain is a much better tool for pattern recognition than most computer algorithms (especially algorithms not designed for the task!).
Throw high-resolution bitmap files at it, and I'd be more persuaded that there is a genuine effect. Until then, I suspect it's more of a happy coincidence that the files they've thrown at it give results they are excited about.
Jolyon
Re:Slightly Dodgy (Score:3, Interesting)
They used TIFFs not JPEGs (Score:2)
gzip == measure of information content (Score:2, Informative)
I'm not sure if the above is public knowledge, but I have used it as one additional feature for certain pattern recognition tasks for a while.
Compression to measure semantic content (Score:3, Interesting)
Pattern Recognition (Score:3, Interesting)
Each algorithm could be fine tuned for a particular type of pattern.
Is that an elephant or a giraffe?
Does it compress better with the elephant algorithm or the giraffe algorithm?
Separate the chaff (Score:2, Interesting)
That having been said, it sounds good
viruses? (Score:2, Interesting)
Just a thought.
New Meaning (Score:2)
"Honey do I look fat in this?" Put on Gzip glasses. "Of course not dear."
Pretty sloppy, you mean... (Score:3, Insightful)
this can also detect PHB's (Score:4, Funny)
feed a business technology proposal through gzip
Re:this can also detect PHB's (Score:2)
goatse.jpg and tubgirl.jpg? ;) (Score:2)
But seriously, I wonder what weird pics people have uploaded :)
Wow (Score:2)
Does SETI@home use this approach? (Score:2)
Makes sense (Score:2)
Someone pointed out that using JPEGs as source images is tainting the results. That's
Did something like this years ago (Score:2, Informative)
Using CGI, it took pictures at different shutter speeds as the user hit the web page. Working up from the slowest shutter speed, the first JPG over 20K bytes was the right exposure and was shown on the page.
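The selection rule in that post boils down to a few lines. This is a sketch only: the original was a CGI script whose code isn't shown, so `pick_exposure`, `SIZE_THRESHOLD`, and the placeholder frames are all assumptions.

```python
from typing import Optional, Sequence

SIZE_THRESHOLD = 20_000  # "20K bytes" from the post; the exact cutoff is a guess


def pick_exposure(jpegs: Sequence[bytes]) -> Optional[int]:
    """Return the index of the first JPEG larger than the threshold, if any.

    `jpegs` holds the encoded frames in the order they were tried.
    """
    for index, jpeg in enumerate(jpegs):
        if len(jpeg) > SIZE_THRESHOLD:
            return index
    return None


# Placeholder byte strings standing in for three captured frames.
frames = [b"\x00" * 4_000, b"\x00" * 12_000, b"\x00" * 25_000]
print(pick_exposure(frames))
```

The idea is that a badly exposed frame is mostly flat and JPEG squeezes it down, while a frame with real detail stays large.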
Gzip doesn't preserve well... (Score:3, Funny)
The problem with gzip is that it doesn't preserve data very well. Now tar, it preserves fossil data quite well.
This method... (Score:2)
Now, I know I will either be flamed or derided for bringing up the mention of this text. I don't claim to be an expert on it (in fact, the scope and breadth of the reading convinced me that one time through is nowhere near enough - I will probably re-read it several more times
Just remember to ftp gzip files in binary mode (Score:2)
Windows XP is alive! (Score:2, Funny)
Redundancy == Survival (Score:2)
What drives nature? Survival, for one. Survival often depends on having a backup, so it's no surprise that nature tends to adopt redundant systems.
I have 2 hands in which I can hold a spear or club. I have 2 eyes to identify my potential predators. I have 2 ears to hear them coming. I have two...well you get the picture.
All of this is externally visible. As such, the a jpeg or gi
Re:The same image... (Score:3, Informative)
BTW, if you want to be file name independent, you can use This way, gzip doesn't see the file name, and therefore doesn't include it into the
Re:why no bzip2 ? (Score:5, Interesting)
gzip might be preferable because it works more locally. It only keeps track of the last n bytes of data and does substitutions based on patterns seen in those n bytes.
bzip2 uses a block-sorting (Burrows-Wheeler) transform over blocks of up to 900 KB, so the context it exploits is typically much longer than gzip's window and the compression is less local. That's great if you're going for compression, but for this work it might be misleading.
That said, gzip doesn't know about image formats, so I wonder if these guys are getting some false positives on scanline wraps and other non-image data.
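The locality point is easy to check. A small sketch, assuming Python's `zlib` (DEFLATE, which can only refer back 32 KB) and `bz2` (which works on blocks of up to 900 KB) as stand-ins for the command-line tools; `doubled_sizes` is a made-up helper name.

```python
import bz2
import random
import zlib


def doubled_sizes(block_len: int) -> tuple[int, int]:
    """Compress one random block repeated twice; return (zlib, bz2) sizes."""
    rng = random.Random(block_len)
    block = bytes(rng.getrandbits(8) for _ in range(block_len))
    data = block + block
    return len(zlib.compress(data, 9)), len(bz2.compress(data, 9))


near_zlib, near_bz2 = doubled_sizes(16 * 1024)   # repeat starts inside the 32 KB window
far_zlib, far_bz2 = doubled_sizes(100 * 1024)    # repeat starts far outside it

print("16 KB block: ", near_zlib, near_bz2)
print("100 KB block:", far_zlib, far_bz2)
```

With the 16 KB block, zlib should notice the second copy and store it almost for free. With the 100 KB block the repeat is out of its reach, so it gets no benefit, while bz2 can still exploit it.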
hidden markov models (Score:3, Interesting)
Maybe if you could have an image recognition system do the Hard Machine Vision problem of generating a schematic of the picture, and then fed the "leg bone is connected to the hip bone" kinda data into a HMM you could work out which fossils are ancient Cambrian crustaceans and which ones are Trogdor the Burninator.
Bzip2? Bah, new fangled rubbish! (Score:3, Funny)
and were glad of it and some of the old timers could have been confused with non living processes
even without the help of gzip anyway!
Re:and language detection. (Score:3, Informative)
Re:horsefeathers. (Score:2)
I don't think a picture of a human compresses better than a tree though.
Pr0n Model? Ha! (Score:2)
Try it with a pic of an amateur. :)
Re:Simphile Seems to do something similar (Score:2)
Re:What happens if they use.. (Score:2)