
GZipping Life Forms: Deflate Reveals Bare-Bones

Posted by Hemos
from the getting-to-the-core-of-the-matter dept.
An anonymous reader writes "To distinguish images derived from living vs. non-living sources, USC and NASA JPL researchers report today using the standard gzip compression utility. As a measure of overall pattern complexity, they find that the inherent pixel content of biologically generated fossils produces higher image compression ratios [more data redundancy], compared to their non-biological counterparts. The more the file shrinks, the more likely it is that a living process was involved. A test is live online here. This extends the simple, but powerful, uses of gzip to biogenic fossil detectors, in addition to spam cop filters, DNA sequence comparisons, digital camera image crunchers, etc. In nine months, the two Mars rovers will send back the first microscopic-scale images of Mars rocks, which should be amenable to some of these same techniques: thus gzipping is apparently pretty zippy."
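A minimal sketch of the kind of measurement the summary describes. It assumes the image has already been decoded to raw 8-bit grayscale pixels. The compression level, preprocessing and any decision threshold the researchers used are not given here. Python's `zlib` module stands in for the `gzip` command, since both wrap the same deflate algorithm and differ only in the small header and trailer around the data. The function name and the two synthetic images are illustrative assumptions.

```python
import random
import zlib


def compression_ratio(pixels: bytes, level: int = 9) -> float:
    """Return original size / deflate-compressed size.

    Higher values mean more redundancy; the summary reports that
    biogenic images tended toward higher ratios.
    """
    if not pixels:
        raise ValueError("empty image")
    return len(pixels) / len(zlib.compress(pixels, level))


# Two synthetic 256x256 8-bit grayscale "images" (hypothetical test data).
width, height = 256, 256
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(width * height))
# Horizontal banding: every 16 rows share one gray level, so bytes repeat a lot.
bands = bytes((y // 16) * 16 for y in range(height) for _ in range(width))

print(f"noise: {compression_ratio(noise):.2f}")
print(f"bands: {compression_ratio(bands):.2f}")
```

A ratio by itself says nothing about *why* an image is redundant: a flat gray card would score far higher than any fossil.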

  • Makes sense... (Score:4, Insightful)

    by Anonymous Coward on Monday March 31, 2003 @10:33AM (#5631029)
    Lifeforms seem to be built on patterns after all. Patterns are easily compressible.
    • Re:Makes sense... (Score:5, Interesting)

      by jolyonr (560227) on Monday March 31, 2003 @11:27AM (#5631306) Homepage
      Unfortunately it's not that simple: inorganic systems can have as much visual complexity as organic things. For example... um... (looks out of the window here in Toronto)... a snowflake! Fractal complexity, such as that seen in the branches of a tree, is frequently mirrored in the inorganic world. The snowflake is one example; another, less well-known example is manganese dendrites, which look just like fossil plants but are totally inorganic, such as these [vic.gov.au] [Victoria Museum]. The patterns of frost on a frozen windscreen are another example. I can't see how a computer program can distinguish whether such complex patterns are signs of life or not. Still, if it helps NASA get more funding, then who am I to argue! Jolyon
      • It's never that simple - but this method appears to be an easy way to separate the wheat from the chaff. It's not a life-detector, it's more of an indicator that life may be present, pointing out higher-potential samples for further review. False positives are OK - they'll be discovered when a person examines the sample. As long as this weeds out some of the non-live samples, it's of benefit.

    • Re:Makes sense... (Score:2, Insightful)

      by Ted_Green (205549)
      Of course, so do a lot of crystallized structures. Lots of things are built on patterns.

      Anyway, as far as this technique is concerned, this (organic images being more compressible) only holds true for organically created stromatolite structures vs. chemically created stromatolite-like structures.

      They've only done 20 images or so; I'd like to know the comparative compression ratios.
  • Bad pun at the end of the original post notwithstanding, this is pretty cool stuff. Wonder why nobody thought of using compression in this manner before? This has all sorts of potential uses.
    • Wonder why nobody thought of using compression in this manner before? This has all sorts of potential uses.
      Actually, there is a precedent [mcsweeneys.net] for using compression on organics.

      The linked article points out some problems with this approach.
    • Re:Cool (Score:3, Interesting)

      by tijnbraun (226978)
      A similar technique has been used by Italian mathematicians to tell apart texts by different authors using zip. A Nature article can be found here [nature.com]. At the request of a Dutch newspaper, they were able to identify one author (Marek van der Jagt, who had just made his debut) as the same person as an already well-known author (Arnon Grunberg).
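A simplified sketch of the zip-based attribution idea: append the unknown text to each candidate author's reference text and see which concatenation grows the compressed size least. This is an illustration in the spirit of that work, not its exact method; the function names and the two tiny corpora are placeholders.

```python
import zlib


def compressed_len(data: bytes) -> int:
    """Size in bytes of the deflate (zlib) encoding of data."""
    return len(zlib.compress(data, 9))


def attribute(unknown: str, corpora: dict[str, str]) -> str:
    """Return the author whose reference text best 'explains' the unknown text.

    The extra compressed bytes needed for corpus + unknown, beyond the corpus
    alone, estimate how much new material the unknown text adds for that author.
    """
    u = unknown.encode()
    costs = {}
    for author, text in corpora.items():
        ref = text.encode()
        costs[author] = compressed_len(ref + u) - compressed_len(ref)
    return min(costs, key=costs.get)


# Placeholder corpora; real ones would be much longer.
corpora = {
    "author_a": "the cat sat on the mat and purred softly. " * 20,
    "author_b": "the ship sailed across the grey sea at dawn. " * 20,
}
print(attribute("the cat sat on the mat and purred loudly. " * 3, corpora))
```

One caveat: deflate only looks back 32 KB, so with long reference texts only their last 32 KB can be matched against the unknown text; a compressor with a larger window would be needed to use more.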
  • by mr. methane (593577) on Monday March 31, 2003 @10:35AM (#5631038) Journal
    ... therefore I am.

    I'm not sure I should be flattered that the best way to tell a picture of me from a picture of a rock is that I have more redundant image data. :-)
  • A-ha! (Score:5, Funny)

    by grub (11606) <slashdot@grub.net> on Monday March 31, 2003 @10:36AM (#5631045) Homepage Journal

    So when we compress the ultimate, super-duper intelligent life form we get a two byte file containing "42"
    • Well, indeed; in terms of applying image compression, the highest form of life is indeed the super-intelligent shade of the colour blue, just as Adams predicted.
    • 42 (Score:2, Insightful)

      by snarkh (118018)

      42 is one byte.
    • Excellent! Now that we've made that discovery, all we need to figure out is how to decompress it.

      Hmm, thoughts anyone?
  • I'd assume (Score:3, Interesting)

    by Omkar (618823) on Monday March 31, 2003 @10:37AM (#5631049) Homepage Journal
    that this has something to do with patterns and image continuity. If so (enlighten me!), then it would be a decent filtering tool, but reliability would be a major problem. Geological (or whatever) patterns could fool the algorithm. Finally, the most compressible image is a single solid color: is it alive?

    (Mods: the last line was a joke, intended to point out a particularly simple example of a problem - not a troll)
  • by Anonymous Coward on Monday March 31, 2003 @10:37AM (#5631058)
    No more sniffing when I'm checking items in the refrigerator: is it 'alive'? gzip is the answer!
  • uhhh.. huh? (Score:2, Interesting)

    by SamBeckett (96685)
    Doesn't gzip only look for patterns in one dimension? Assuming they are using these for pictures, they are missing the boat on at least one more area of complexity!
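Roughly, yes: gzip sees an image as one long byte string, and deflate can only refer back at most 32 KB (32,768 bytes). Vertical structure is therefore only "visible" when a whole row fits inside that window. The sketch below uses a contrived, hypothetical 40,000-pixel-wide image whose rows are all identical. Stored row by row, the repeats are out of reach; stored column by column, they become adjacent.

```python
import random
import zlib

WIDTH, HEIGHT = 40_000, 10  # rows wider than deflate's 32 KB window

rng = random.Random(1)
row = bytes(rng.getrandbits(8) for _ in range(WIDTH))

# Every row is identical, so the image is extremely redundant vertically.
row_major = row * HEIGHT
# Same pixels read column by column: each column is one byte repeated.
col_major = bytes(b for b in row for _ in range(HEIGHT))

row_size = len(zlib.compress(row_major, 9))
col_size = len(zlib.compress(col_major, 9))
print(f"row-major: {row_size} bytes, column-major: {col_size} bytes")
```

For ordinary image widths each row fits in the window, so gzip does catch some vertical repetition, but only as exact repeated byte runs, not as 2-D shapes.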
  • then we will find out if he truly is the Borg!

  • Be Humble (Score:5, Funny)

    by hugesmile (587771) on Monday March 31, 2003 @10:39AM (#5631067)
    OK, so if I have this right: life is less random and more predictable (more compressible) than non-life.

    So that tells me that life contains less data than non-life.

    Perhaps sophisticated life (human life?) contains even less data than non-sophisticated life. So the smarter we get, the more predictable we get, and the less data we contain.

    Perhaps we will someday get smart enough to be totally compressed to one bit. In the time I thought about this concept, I think my gzip file got even more compressed. Hmm....
    • Re:Be Humble (Score:5, Insightful)

      by javatips (66293) on Monday March 31, 2003 @10:50AM (#5631134) Homepage
      > So that tells me that life contains less data than non-life.

      No, it means that life contains less noise than non-life.

      • Thank you. You just completely cleared this up for me. I was sitting, puzzling, thinking, "Life contains huge amounts of information, and should therefore not compress as well as non-life." Then I realized information is less entropy, so it actually will compress better. (What a relief. I was almost on the verge of taking this personally. ;)

        A useful analogy: an ASCII file, containing information, compresses better than a file of truly random binary data. The truly random (think cryptographic) data h

        • No, you still have it wrong. Information is entropy. More information is more entropy. However, imagine the amount of information in a JPEG of your face, compared with a JPEG of bits from /dev/random. The latter will have more information and thus more entropy. That shouldn't give you an inferiority complex. :-)
        > No, it means that life contains less noise than non-life.

        Apparently you don't have children.
      • Information theory says that random noise is the most "concentrated" form of information possible. Roughly, information == entropy (actually, entropy times number of symbols).

        Random data may not be meaningful but they are full of information by definition.

        Consider the sequence 123123123123. The sequence is highly ordered, and therefore is probably meaningful (at least to someone, somewhere), but it contains very little information. In contrast the random sequence 196390244187 is highly disordered, total
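The two 12-digit sequences are too short to compress meaningfully (zlib's fixed header and checksum would dominate), so this sketch scales the same idea up: a long run of "123" versus the same number of pseudo-random digits, with compressed size as a rough proxy for information content. The seed and length are arbitrary choices.

```python
import random
import zlib

N = 30_000  # digits in each sequence

ordered = ("123" * (N // 3)).encode()
rng = random.Random(42)
disordered = "".join(rng.choice("0123456789") for _ in range(N)).encode()

sizes = {
    "ordered": len(zlib.compress(ordered, 9)),
    "random": len(zlib.compress(disordered, 9)),
}
for name, size in sizes.items():
    print(f"{name}: {N} digits -> {size} bytes compressed")
```

Random decimal digits carry about log2(10), roughly 3.32 bits each, so no lossless compressor can shrink the random sequence much below 3/8 of a byte per digit; the periodic one collapses to a few dozen bytes.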

  • by twoslice (457793) on Monday March 31, 2003 @10:42AM (#5631080)
    The Magic School Bus is true!
  • ... if it could find life forms in my Doom WADs?
  • bzip2? (Score:3, Interesting)

    by maxwell demon (590494) on Monday March 31, 2003 @10:43AM (#5631093) Journal
    Has anyone checked if bzip2 is better or worse in detecting biological products?

    After all, they have quite different compression characteristics: on one hand, bzip2 compresses a megabyte of zeroes much better; OTOH, appending a file to itself and then compressing costs much less additional compressed size with gzip than with bzip2 (tested with /usr/src/linux/kernel/sys.c, 24957 bytes uncompressed).
    • by Kiwi (5214)
      I've noticed that bzip2 compresses better (sometimes much better) for most tarballs of software and other data. However, for a list of prime numbers, gzip actually compresses better than bzip2.

      - Sam
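Both of the bzip2/gzip observations above are easy to re-check with Python's standard `gzip` and `bz2` modules. The sketch below uses a megabyte of zeros and, in place of kernel/sys.c (not available here), 25,000 seeded pseudo-random bytes, a hypothetical stand-in that gzip cannot compress at all on a single copy. Absolute sizes vary with library versions; only the comparisons matter.

```python
import bz2
import gzip
import random

zeros = bytes(1_000_000)
print("1 MB of zeros:", len(gzip.compress(zeros, 9)), "(gzip) vs",
      len(bz2.compress(zeros, 9)), "(bzip2)")

rng = random.Random(7)
sample = bytes(rng.getrandbits(8) for _ in range(25_000))  # stand-in for sys.c


def extra_cost(compress, data: bytes) -> int:
    """Additional compressed bytes needed when data is appended to itself."""
    return len(compress(data + data)) - len(compress(data))


gzip_extra = extra_cost(lambda d: gzip.compress(d, 9), sample)
bzip2_extra = extra_cost(lambda d: bz2.compress(d, 9), sample)
print("cost of second copy:", gzip_extra, "(gzip) vs", bzip2_extra, "(bzip2)")
```

The gzip result follows from deflate's design: a second copy that starts within 32 KB of the first can be encoded as a short list of long back-references. bzip2 has no such references; its block sort only makes the duplicated data cheaper, not nearly free.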
  • by RNG (35225) on Monday March 31, 2003 @10:43AM (#5631095) Homepage
    Although I'm certainly no compression expert, I think this makes sense. Many (most?) natural systems have fractal structures on some level, so it only makes sense for them to compress better (i.e., have more self-similar features) than systems which don't have this feature.

    Then again, what do I know? Maybe someone more immersed in this field can tell us whether there's a seed of truth to my ramblings ...

    Greetings
    --> R
    • Could you give some examples of fractal structures in a human?

      GZip doesn't do fractal compression. It will compress repeating patterns, though. (My two arms will be compressed because they are similar, not because they look like little humans.)

      I don't think there are many fractal structures in nature. Rocks are different than sand. Humans are different than cells. A field is different than grass, which is different than cells, which is different than molecules.

      Joe
    • I don't understand . . . Does an image having a fractal structure really compress better than one without? I can see that it might compress really well if you could detect the underlying algorithm: "Hey, that's region X of the Mandelbrot set", so its Kolmogorov complexity would be pretty low. But does gzip really detect this? As an image, that bit of the Mandelbrot set might be pretty hard to compress.

      I just find it strange that I keep reading comments nodding at the assumption that being fractalish mea
      • Right. I agree with you.

        1. GZip doesn't do fractal compression.

        2. There just aren't that many visibly fractal structures in life. There are some structures in life that are obviously fractal (as mentioned in some other posts), but that is also true of rocks.

        Joe
  • Every one of us is incredibly redundant, and I don't just mean in our posts on slashdot!

    Simply consider that you can have a reasonably good duplicate of yourself, with only the DNA contained in a single cell!

    You may need most of your parts to be functional but, information-wise, it all comes down to 1 germ cell (say, a spermatozoid) and the apparatus needed to move it into proximity of another compatible germ cell ;)

    • Your DNA is only sufficient to create another state machine with the same rules you had at birth.

      It will not re-create your complexity because our DNA state machines are designed to create brains which are 'genetically memoryless', capable of self-modification, and have incredible data collection and storage capacity.

      Think of your DNA as the graphics engine for Quake. It is relatively small (space-wise) compared to the textures and levels. Add different data, and you still have a first-person

  • by jj_johny (626460) on Monday March 31, 2003 @10:45AM (#5631114)
    When I compressed the transcript of The Osbournes, it got incredibly high compression, but I don't think they are intelligent life forms. Or maybe I am really wrong.

    This post can't be compressed.

  • In a true first for extraterrestrial biotic research, I decided to compare two pictures:

    at the comparison page [astrobio.net] attached to the article that lets you run the same test on images that the researchers tried. In a startling discovery that is sure to earn me a Nobel Prize for Physics, Chemistry, Biology and Marital Relations, I was told the following:

    "Answer: Image 1 [the Mars image](1.43702451394759 % compression) has a higher complexity measure than image 2[the image of my wife] (0.773501341151519 % compression), and thus image 1 is more probably biogenic."

    Not only does this prove that there was once life on Mars, but it also proves that my wife is some sort of robot. Further research will be undertaken pending receipt of my prize money.

  • The.. (Score:2, Funny)

    by saqmaster (522261)
    .. thought of being gzipped is quite disturbing.

    Mad Scientist: "Fire up the GZip Continuum Transfunctioner!"
    Operator: "Okay, Boss"

    *Bizzzttt*
  • by 16977 (525687)
    One of the posters brings up an interesting point. Although meaningful data has less information than pure noise, it also has more than a blank signal. When you download pictures, regardless of the "meaning" they have to you, their compression can vary a considerable amount. And you've probably heard the statistic that the English language is 50 percent redundant. That figure may vary a bit too, but the point is that English's meaning to us is independent of its information content. And the probability
  • by MarkWatson (189759) on Monday March 31, 2003 @10:53AM (#5631158) Homepage
    This seems like a "sort of" restatement of Kolmogorov Complexity.

    Roughly, Kolmogorov Complexity is a measure of randomness - the measure is how long a computer program needs to be to reproduce data (pardon an oversimplification).

    -Mark
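One way to make the connection concrete: the compressed file plus a fixed-size decompressor is itself a program that reproduces the data, so compressed length is an upper bound on Kolmogorov complexity, up to an additive constant. The bound can be very loose, as this sketch shows with seeded pseudo-random output. A short function (plus Python's built-in generator, a fixed overhead) regenerates it exactly, yet deflate finds nothing to squeeze. The seed and length are arbitrary.

```python
import random
import zlib

N = 100_000


def generate(seed: int, n: int) -> bytes:
    """A tiny 'program' that reproduces the same n bytes from a seed."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))


data = generate(2003, N)
compressed = zlib.compress(data, 9)

# Deflate finds no usable structure: the output is about as large as the input.
print(len(data), "->", len(compressed))

# Yet the data is fully determined by generate() plus the seed, so its
# Kolmogorov complexity is small compared with len(compressed).
assert generate(2003, N) == data
```

This is also why gzip cannot "see" fractals such as a region of the Mandelbrot set: a short generating rule is invisible to a compressor that only looks for repeated byte runs.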

  • While slightly different, this reminds me of the way I filtered a bunch of images from a video camera. I was taking many frames per second of a thunderstorm and I wanted to find which frames out of thousands contained lightning strikes.

    It was pretty simple... Images over a certain size contained lightning, the others were mostly black, therefore smaller. Once I filtered it that way, manually filtering out the better images was easy.
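The same trick as a hedged sketch: compress each frame and keep the ones whose compressed size is well above typical. The frames here are synthetic raw grayscale buffers (pure black, plus two with a drawn "bolt"), and the cutoff of 3x the median size is an arbitrary assumption; the poster worked from already-compressed image files and picked a size threshold by hand.

```python
import random
import statistics
import zlib


def frames_with_activity(frames: list[bytes], factor: float = 3.0) -> list[int]:
    """Indices of frames whose deflate size exceeds `factor` times the median."""
    sizes = [len(zlib.compress(frame, 6)) for frame in frames]
    cutoff = factor * statistics.median(sizes)
    return [i for i, size in enumerate(sizes) if size > cutoff]


W, H = 160, 120
rng = random.Random(3)


def strike_frame() -> bytes:
    """A black frame with a bright, jagged vertical 'bolt' drawn on it."""
    px = bytearray(W * H)
    x = W // 2
    for y in range(H):
        x = max(2, min(W - 3, x + rng.randrange(-3, 4)))  # wander sideways
        for dx in range(-2, 3):
            px[y * W + x + dx] = rng.randrange(128, 256)
    return bytes(px)


frames = [bytes(W * H) for _ in range(20)]  # all-black frames
frames[5] = strike_frame()
frames[13] = strike_frame()
print(frames_with_activity(frames))
```

Using the median rather than a fixed byte count lets the cutoff adapt if the whole sequence gets brighter or noisier.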

  • by fygment (444210) on Monday March 31, 2003 @10:57AM (#5631179)
    Read about it in _the_ book (http://www.cwi.nl/~paulv/kolmogorov.html) or check out the web site here (http://www.hutter1.de/kolmo.htm). For a more succinct idea of the approach, see these articles by one of the gurus on the topic (http://www.cs.ucsb.edu/~mli/focs.ps and http://www.cwi.nl/~paulv/papers/ecml97.ps).
  • by dpbsmith (263124) on Monday March 31, 2003 @10:59AM (#5631186) Homepage
    zip is a fine thing, but it's not a pattern-recognition program!

    This is the loopiest thing I've heard of since Rosenblatt reported that his Perceptrons could distinguish between music composed by Bach and music composed in imitation of Bach.

    Good heavens, any picture that's slightly out of focus will now be declared to be evidence of "biological processes."

    I'm guessing that the researchers are not as nutty as they sound and that they've done more than is being reported, but still...

    Reminds me of the researchers in the sixties who were publishing analyses of data that supposedly showed "biological clocks." It turned out that they were using smoothing algorithms that were, basically, filters with a 24-hour peak in the frequency domain, so their analysis was creating the patterns they claimed to be detecting. A debunking article was published in Science in which another researcher used data from a random number table (the "unicorn" data) and showed that the same analysis techniques found that the unicorn had a biological clock.
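The "unicorn" point carries straight over to compression: smoothing, including an out-of-focus lens, makes data more compressible whether or not anything biological was involved. A minimal sketch, assuming a 1-D moving average over seeded pseudo-random bytes stands in for blur; the window size and seed are arbitrary.

```python
import random
import zlib

N, K = 100_000, 16  # samples, moving-average window

rng = random.Random(1960)
noise = [rng.getrandbits(8) for _ in range(N)]

# Moving average of K neighbours: the same "unicorn" data, just blurred.
blurred = [sum(noise[i:i + K]) // K for i in range(N - K + 1)]

raw_size = len(zlib.compress(bytes(noise), 9))
blur_size = len(zlib.compress(bytes(blurred), 9))
print(f"raw: {raw_size} bytes, blurred: {blur_size} bytes")
```

Averaging narrows the spread of values around the mean, so deflate's Huffman stage needs fewer bits per sample; the "structure" it rewards was put there by the filter.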

    • Similar thoughts here. From the article:

      So how does one separate the wheat from the chaff, the true stromatolites from the fakes?

      One method is to examine the suspect rock with a microscope, looking for visual evidence of microorganisms. But as researchers who study ancient terrestrial rocks- and one notorious Martian meteorite - have discovered, it isn't all that easy to tell, just by looking at shapes, whether or not a microscopic blob in a rock was once alive.

      So, what do they verify the gzip method

      • IANAXB (xenobiologist) but you could try looking at terrestrial bacteria, freshly killed, which leave behind calcium deposits.

        gzip that and see if you get a positive.

        then look at porous rock where the little circles are from air bubbles or somesuch and see if you get negative results.
    • In some ways this technique is meant to defeat systematic biases like the ones you mention. Compression tools make few assumptions about the data they process, so they serve as a check against more tailored filters which may introduce artifacts, or be defeated in some way. This problem may occur because they look for pre-selected "features" in the data rather than looking at the distribution of the data as a whole.

      gzip isn't perfect, but it will find repetitive byte sequences of any kind, regardless of t
  • Isn't that conclusion the opposite of CmdrTaco's use of compression to weed out "lame" postings? More noise is apparently more valuable discussion, while less noise is somehow considered likely spam? How many good postings have you seen with a line "this has been added to get past the lameness filter"?
  • by kinnell (607819) on Monday March 31, 2003 @11:07AM (#5631227)
    I myself have successfully used gzip for factoring large prime numbers, sorting the men from the boys, unblocking the kitchen sink and cracking safes. I'm currently trying to locate Osama Bin Laden by compressing Al Jazeera footage, but all I come up with are reports of Elvis sightings.
  • ...that this item was posted a day early.
    • Hmm yes, I thought so for a second too...

      And thinking about that, I'd like to say: Editors, please do not overdo it tomorrow! Last year was funny for the first two April 1 stories, but after that it just got annoying. Please limit yourselves to maybe two or three April Fool stories, but make 'em good instead...

      Thank you.
  • Slightly Dodgy (Score:5, Interesting)

    by jolyonr (560227) on Monday March 31, 2003 @11:10AM (#5631238) Homepage
    This whole thing is slightly dodgy, and I begin to wonder whether it was released a day early by mistake.

    The big problem is the use of JPEG source images. Unless the quality was set to maximum, JPEG artifacting (which in effect repeats blocks of image data after transitions) will probably mask any hidden level of complexity in the images. The human brain is a much better tool for pattern recognition than most computer algorithms (especially algorithms not designed for the task!).

    Throw high-resolution bitmap files at it, and I'd be more persuaded that there is a genuine effect. Until then, I suspect it's more of a happy coincidence that the files they've thrown at it give results they are excited about.

    Jolyon
    • Re:Slightly Dodgy (Score:3, Interesting)

      by kris_lang (466170)
      I've seen similar errors made by vision science (note that I did not say "image processing") researchers trying to analyze natural scene statistics and come up with interesting patterns. They created "basis functions" and did principal component analysis on sets of images and came up with a basis set that looks curiously like the base images of the DCT (discrete cosine transform), the underlying calculations of the JPEG image format. This is to be expected when you start with a set of images that are JPEG
    • You didn't read the article carefully enough. The seventh paragraph of the article clearly states they used TIFF images, not JPEG.
  • This is not surprising at all, really. Gzip and other compression utilities can be used to get an upper bound on the real (nonredundant) information content.

    I'm not sure whether the above is common knowledge, but I have used it as one additional feature for certain pattern recognition tasks for a while.
  • by KingRamsis (595828) <(moc.liamg) (ta) (sismargnik)> on Monday March 31, 2003 @11:20AM (#5631279)
    It was an interesting coffee-break discussion with one of my professors. We were arguing about whether there is a neat way to estimate the semantic content of a neural network after training it, and I recall suggesting compressing the weights of all layers: the less compressible they are, the more trained the network.
  • Pattern Recognition (Score:3, Interesting)

    by cyber_rigger (527103) on Monday March 31, 2003 @11:30AM (#5631330) Homepage Journal

    I envision a whole array of compression algorithms.

    Each algorithm could be fine-tuned for a particular type of pattern.

    Is that an elephant or a giraffe?
    Does it compress better with the elephant algorithm or the giraffe algorithm?
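A lightweight way to try this without writing a compressor per class, sketched under assumptions: zlib accepts a preset dictionary (the `zdict` argument of `zlib.compressobj`), so each class can get a "tuned" compressor primed with example data, and a sample goes to whichever class compresses it smallest. The class names and dictionary strings are placeholders, and text stands in for image data.

```python
import zlib


def compressed_size(data: bytes, zdict: bytes) -> int:
    """Deflate size of data when the compressor is primed with zdict."""
    comp = zlib.compressobj(level=9, zdict=zdict)
    return len(comp.compress(data) + comp.flush())


def classify(sample: bytes, dictionaries: dict[str, bytes]) -> str:
    """Pick the class whose preset dictionary compresses the sample best."""
    return min(dictionaries,
               key=lambda name: compressed_size(sample, dictionaries[name]))


dictionaries = {
    "elephant": b"grey wrinkled skin, large flapping ears, long trunk, ivory tusks",
    "giraffe": b"tall spotted coat, very long neck, ossicones, long thin legs",
}
print(classify(b"an elephant with flapping ears and a long trunk", dictionaries))
```

The dictionary only helps when the sample repeats exact byte sequences from it, so for real images this would need a representation in which similar shapes produce similar bytes.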
  • I doubt this is very accurate for marking photos as hits or misses directly. This kind of thing may be useful more for detecting the lack of life rather than the presence of it. If compression rates are low, maybe you don't have to look at this photo so much. If they're high, maybe you want to examine it more closely. If you're dealing with truck loads of data and you're looking for a needle in a haystack, a mechanism for ruling out uninteresting data is invaluable.

    That having been said, it sounds good
  • viruses? (Score:2, Interesting)

    I wonder if viruses (sorry - didn't RTFA) would compress like living life forms or if they would be more similar to nonliving.

    Just a thought.
  • This gives new meaning to the phrase:

    "Honey do I look fat in this?" Put on Gzip glasses. "Of course not dear."
  • by TheSHAD0W (258774) on Monday March 31, 2003 @11:52AM (#5631458) Homepage
    There are other techniques for measuring the level of chaos in a set of data, and they'd probably yield more consistent results than running the data through an algorithm meant for an entirely different purpose.
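The poster doesn't name those techniques; one common candidate (an assumption here, not the researchers' choice) is first-order Shannon entropy of the value histogram. This sketch shows the trade-off against gzip: histogram entropy ignores order entirely, so sorted and shuffled copies of the same bytes score identically, while deflate sizes differ enormously.

```python
import math
import random
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """First-order entropy in bits per byte, from the byte histogram only."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


rng = random.Random(5)
shuffled = bytes(rng.getrandbits(8) for _ in range(50_000))
ordered = bytes(sorted(shuffled))  # same values, same histogram

for name, data in (("shuffled", shuffled), ("sorted", ordered)):
    print(f"{name}: {shannon_entropy(data):.3f} bits/byte, "
          f"{len(zlib.compress(data, 9))} bytes deflated")
```

Which behaviour is "more consistent" depends on the question: spatial arrangement is presumably the point for fossil textures, and that is exactly what a histogram throws away.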
  • by IDigUNIX (544392) on Monday March 31, 2003 @12:05PM (#5631510)
    As an alternative to this hypothesis, consider:
    feed a business technology proposal through gzip
    • A very high compression ratio indicates that the proposal was likely to be written by consultants. As supported by the fact that they usually re-use the same buzz phrases over and over.
    • A moderate compression ratio indicates that the proposal was written by engineers. Typically they use large words, and unique phrases that are already compressed, e.g. SNMP, J2EE, WWW, and so on.
    • A zero to negative compression ratio indicates that the proposal was likely to be written by a PHB, and hence devoid of all indications of intelligent life. As evidenced by most PHBs having a hard time using buzz phrases and keywords in context, so they won't recycle enough words to form a good compression dictionary.
  • Come on, admit it, who's tried to upload the GoatSe man, and TubGirl? ;) Are they organic? Just, I'd say.

    But seriously, I wonder what weird pics people have uploaded :)

  • Sounds like someone at NASA got a little carried away with their new toy [slashdot.org]
  • It would seem that the same approach could be used to distinguish potential intelligent radio signals from those of random or astronomical origin. Though perhaps you would want a pattern to be present resulting in a more compressible file? I think it would depend whether the signal that is picked up is a deliberate simple pattern meant to be a "hello, are you out there?" broadcast by an E.T [amazon.com], or if it is normal communications between E.T.'s not realizing (or not concerned) that they are being overheard.
  • In the most general form possible: life decreases physical entropy, which leads to ordered images, which means a decrease in informational entropy. Therefore, life processes produce images with less information than nonbiologically produced images. Less information takes less space, so the biological images should compress better. This is all in the abstract; in reality your results will depend on what algorithms you use.

    Someone pointed out that using JPEGs as source images is tainting the results. That's

  • I used a technique like this to do a web cam way back in 1997 before web cams were an easy thing to do. I was supporting Silicon Graphics workstations at the time. One of the models came with a digital camera. The cameras did not have automatic exposure.

    Using CGI, each time a user hit the web page it took pictures at different shutter speeds. Working up from the slowest shutter speed, the first JPG over 20K bytes had the right exposure and was shown on the page.
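The exposure trick reduces to a short loop. A sketch, assuming a hypothetical `capture_jpeg(shutter)` callable that returns JPEG bytes (no real camera API is implied); speeds are tried in the order given, and the first image larger than the threshold wins, as in the post.

```python
from typing import Callable, Iterable, Optional


def pick_exposure(capture_jpeg: Callable[[float], bytes],
                  shutter_speeds: Iterable[float],
                  min_bytes: int = 20_000) -> Optional[bytes]:
    """Return the first capture whose JPEG exceeds min_bytes, else None.

    A badly exposed frame tends to be mostly flat (all dark or all blown
    out), which JPEG encodes compactly; a well-exposed one has more detail.
    """
    for speed in shutter_speeds:
        image = capture_jpeg(speed)
        if len(image) > min_bytes:
            return image
    return None


# Usage with a fake camera (hypothetical sizes, for illustration only).
fake_sizes = {1 / 8: 9_000, 1 / 30: 14_000, 1 / 60: 23_000, 1 / 125: 31_000}
best = pick_exposure(lambda s: bytes(fake_sizes[s]),
                     sorted(fake_sizes, reverse=True))  # slowest first
print(len(best))
```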

  • by Anonvmous Coward (589068) on Monday March 31, 2003 @01:39PM (#5631974)
    "This extends the simple, but powerful, uses of gzip to biogenic fossil detectors..."

    The problem with gzip is that it doesn't preserve data very well. Now tar, it preserves fossil data quite well.
  • ...and what seemed like an exhausting list of others was covered in lengthy detail in Stephen Wolfram's "A New Kind Of Science" [wolframscience.com] (IIRC, in the section on perception and analysis - but referred to in myriad ways throughout the volume).

    Now, I know I will either be flamed or derided for mentioning this text. I don't claim to be an expert on it (in fact, the scope and breadth of the reading convinced me that one time through is nowhere near enough - I will probably re-read it several more times

  • Otherwise, you'll just get a worthless puddle of protoplasm when you uncompress people on the other end of the teleport. Also, don't compress humans and insects together, although you might get a better ratio that way.
  • Did a test run with some default images in Windows XP. Windows XP's "Purple Flower.jpg" is apparently more "alive" than Windows XP's "Tulips.jpg", but "Windows XP.jpg" is more alive than both of them!
  • Long ago, engineers learned that nature evolves life to adapt to its environment. We often look to nature for inspiration.

    What drives nature? Survival for one. Survival often depends on having a backup so it's no surprise that nature tends to adapt redundant systems.

    I have 2 hands in which I can hold a spear or club. I have 2 eyes to identify my potential predators. I have 2 ears to hear them coming. I have two...well you get the picture.

    All of this is externally visible. As such, the a jpeg or gi

