
Extreme Complexity of Scientific Data Driving New Math Techniques 107

Posted by Soulskill
from the how-do-you-process-twelve-billion-data-points dept.
An anonymous reader writes "According to Wired, 'Today's big data is noisy, unstructured, and dynamic rather than static. It may also be corrupted or incomplete. ... researchers need new mathematical tools in order to glean useful information from the data sets. "Either you need a more sophisticated way to translate it into vectors, or you need to come up with a more generalized way of analyzing it," [Mathematician Jesse Johnson] said. One such new math tool is described later: "... a mathematician at Stanford University, and his then-postdoc ... were fiddling with a badly mangled image on his computer ... They were trying to find a method for improving fuzzy images, such as the ones generated by MRIs when there is insufficient time to complete a scan. On a hunch, Candes applied an algorithm designed to clean up fuzzy images, expecting to see a slight improvement. What appeared on his computer screen instead was a perfectly rendered image. Candes compares the unlikeliness of the result to being given just the first three digits of a 10-digit bank account number, and correctly guessing the remaining seven digits. But it wasn't a fluke. The same thing happened when he applied the same technique to other incomplete images. The key to the technique's success is a concept known as sparsity, which usually denotes an image's complexity, or lack thereof. It's a mathematical version of Occam's razor: While there may be millions of possible reconstructions for a fuzzy, ill-defined image, the simplest (sparsest) version is probably the best fit. Out of this serendipitous discovery, compressed sensing was born.'"
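For the curious, the sparsity idea in the summary can be sketched in a few lines of Python. This is a toy illustration of compressed sensing via basis pursuit (my own minimal sketch, not Candes's actual code or the algorithm from the article): a sparse signal is recovered exactly from far fewer random measurements than unknowns by minimizing the L1 norm, the standard convex stand-in for sparsity.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

n, m, k = 50, 25, 3                      # unknowns, measurements, nonzeros
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)

A = rng.standard_normal((m, n))          # random measurement matrix
y = A @ x_true                           # m < n: underdetermined system

# Basis pursuit: min ||x||_1  s.t.  A x = y, written as a linear program
# via the standard split x = u - v with u, v >= 0.
c = np.ones(2 * n)
res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=y,
              bounds=[(0, None)] * (2 * n))
x_hat = res.x[:n] - res.x[n:]

print("max reconstruction error:", np.max(np.abs(x_hat - x_true)))
```

With these dimensions the sparsest consistent solution is, with overwhelming probability, the true one, which is the "Occam's razor" point the summary is making.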


  • ...but I don't think I'd want my doctor working from a "fuzzy logic" MRI if I had (God forbid) a BRAIN TUMOR or something...

    • by almitydave (2452422) on Friday October 11, 2013 @06:22PM (#45105337)

      Yeah, my doctor couldn't see enough detail in my head x-ray, so he used Photoshop's "content-aware fill" to fix it, and now apparently I need surgery to remove the 3rd half of my brain. I get to keep the 2 extra eyeballs, though.

      (actually, I really really want to see that applied to medical x-rays)

      • by timeOday (582209)
        Have you ever played with the compression level on jpg? At some point, enough is enough. Now instead of lossy compression, imagine we're talking about how much radiation to shoot into your nads to get a clean xray. There are diminishing returns on image quality for each doubling of the radiation. Are you still so sure you want to turn it up to 11?
        • by icebike (68054)

          Have you ever played with the compression level on jpg? At some point, enough is enough. Now instead of lossy compression, imagine we're talking about how much radiation to shoot into your nads to get a clean xray. There are diminishing returns on image quality for each doubling of the radiation. Are you still so sure you want to turn it up to 11?

          But we're not at "enough" yet; that threshold hasn't been reached.

          When looking at a highly compressed image of a person's face, you still recognize the face, and there is no reason to speculate about a Nuclear Aircraft Carrier floating on the film of tears on the eyeball. Similarly, when looking at an MRI where you were only able to get a partial image, there is no reason to assume a third eye will somehow be missed in any brain scan that carries detail finer than an eye.

          Doctors aren't total

          • by TheLink (130905)

            Is there a comparison between a partial scan with this processing and a full scan?

        • As far as X-rays are concerned, perhaps developing better contrast agents would be a better solution to get pictures sharper where you need it.
    • by lgw (121541) on Friday October 11, 2013 @06:25PM (#45105345) Journal

      Of course it works. "Zoom! Enhance!" If TV hasn't taught me that "enhance" works reliably, then TV has taught me nothing.

      • by Anonymous Coward

        Too true. And don't forget, the zoom is infinite and the enhance takes a muddy, motion-smeared blob and gives you a professional quality portrait. "Look--the thief has dandruff!"

        • by icebike (68054)

          Enhance 224 to 176. Enhance, stop. Move in, stop. Pull out, track right, stop. Center in, pull back. Stop. Track 45 right. Stop. Center and stop. Enhance 34 to 36. Pan right and pull back. Stop. Enhance 34 to 46. Pull back. Wait a minute, go right, stop. Enhance 57 to 19. Track 45 left. Stop. Enhance 15 to 23. Give me a hard copy right there.

      • by jrumney (197329)

        Basically, this is an algorithm that, given a fuzzy image with some people-looking shapes in it, produces an image with stick figures in their place. "Hey, it produced a perfect match," you say, but actually you don't have a non-fuzzy image for comparison, so you don't realise that the one at the back chasing the others is actually Sasquatch, and as any child knows, the stick figure match for Sasquatch needs more jagged lines.

    • by mmell (832646)
      Hey, if that's all they have to work with (given that current imaging technology is not up to the standard found aboard a Federation starship) - unless you'd rather your surgeon used a divining rod? A surgeon should be aware of the difference between a raw and an enhanced image, and I'm pretty sure that some data is better than none.
    • by Artifakt (700173)

      ... but I don't think I want a military intelligence specialist who has been ordered to find weapons of mass destruction in satellite photos working them over with this sort of software either ...

      • by timeOday (582209)
        Let's say the analyst has to search a huge area for mobile launchers, and the imagery comes from a satellite with finite bandwidth. Would you rather he search good-quality images of the whole area, or fantastic-quality images of a tiny fraction of the area?
        • by icebike (68054)

          As long as the minimal resolution was sufficient to allow the software to deduce a launcher in spite of drop-out, clearly the most useful image would cover an area just large enough that the software could deduce the existence of a launcher by running the above-mentioned algorithm. There would be no point in having resolution sufficient to read a license plate when what you are looking for is 30 feet tall and has a known shape.

          The thing is, people using this imaging technique have to know what the

          • "Checked. Still no weapons of mass destruction."
            "Damnit... switch to a lower resolution and try again!"

      • by Arkh89 (2870391)

        That is NOT the way to understand these sets of techniques. Candes, Tao, and Donoho's work is basically about asking: what is the minimum number of measurements I have to make to ensure that the reconstruction of the signal will be sufficient (for a given task), assuming the signal has some known properties?

        Let's say you hear the sound of horseshoes while walking down a street. If I ask you the color of the animal's coat, you probably won't start by saying "red" or "blue". This is be

    • by nashv (1479253)

      Yeah, because the knowledge the doctor is using to diagnose your brain tumour by eye is completely deterministic, right? Because that's how human brains work, huh...

    • by paskie (539112)

      So what do you think the doctor works from now?

    • by cellocgw (617879)

      but I don't think I'd want my doctor working from a "fuzzy logic" MRI if I had (God forbid) a BRAIN TUMOR or something...

      Then I've got bad news for you: NMR imaging and CAT imaging depend on algorithms with names like "Maximum A Posteriori Estimation." They *all* depend on making the best bet as to what the reconstructed image should be. It just turns out (thanks to that thing called mathematical statistics) that the correct solution is overwhelmingly likely. "Fuzzy logic" does not mean what I think you think it means, i.e. "some random drunk posting to /."

    • by RockDoctor (15477)
      As TFS says,

      [images] such as the ones generated by MRIs when there is insufficient time to complete a scan.

      (my emphasis)

      So, you're in one of two situations: you've got an acute problem - suffocation, massive bleeding, something really, really time-critical - and if they don't stop the MRI now and do something else now, then you're dead meat; or, there has been some mechanical or financial issue with the machinery and someone is trying to save money by not re-doing the scan. In the one case, you've got

  • by Anonymous Coward

    Ya know, filter out all the noise and controversy, and get an output which is purely liberal computer-geek rants, all in unified, biased agreement. That would be amazing!

    • by AK Marc (707885)
      How is it that all the conservatives complain it's all liberals, and the liberals claim it's all libertarians? How can it be "all" of both groups at the same time?
      • Given the overall percentage of libertarians (1%?) and the overall percentage of liberals (48%?), clearly it isn't anywhere near "all libertarians". This proves that:

        The liberals are completely wrong.

        That's the only conclusion that can be drawn by anyone who can follow simple logic. People who can follow simple logic knew that already, though.

        I'm KIDDING, you hyper-sensitive liberal weenie who is furiously clicking the "reply" button. Sometimes liberals are right, even Obama. Obama was right when he s

    • by WillKemp (1338605)

      If we applied it to tfa, there wouldn't be anything left.

  • by Anonymous Coward on Friday October 11, 2013 @06:09PM (#45105265)

    For fuck's sake.

    These techniques of dealing with incomplete and unstructured data have existed for decades.

    AI researchers hyping absolutely everything about their field to get some funding is starting to get on my nerves.

    • by WillKemp (1338605)

      Yep. Just more of the usual "big data" bullshit hype. The sort of nonsense you'd expect from Weird mag.

  • by Anonymous Coward

    Make assumptions

  • by ZeroPly (881915) on Friday October 11, 2013 @06:11PM (#45105275)
    "They were trying to find a method for improving fuzzy images, such as the ones generated by MRIs when there is insufficient time to complete a scan. On a hunch, Candes applied an algorithm designed to clean up fuzzy images,[...]"

    Wow! That would be the last thing I thought of in that situation...
    • by Anonymous Coward

      Yup, quite the intuitive leap. I hope the Nobel committee knows how to reach him.

      • Re: (Score:2, Funny)

        by Anonymous Coward

        They were trying to reach him to talk to him. On a hunch, the Nobel committee applied a phone designed to reach people.

        • by Anonymous Coward on Friday October 11, 2013 @11:28PM (#45106767)

          But it's even more amazing than that.

          The Nobel committee only had the first three digits of his phone (the area code), so they applied the same algorithm, and bam! Turns out it works just as well for phone numbers.

          They got him on the first ring too. But that part is just coincidence.

  • by raymorris (2726007) on Friday October 11, 2013 @06:15PM (#45105303)

    While there may be millions of possible reconstructions for a fuzzy, ill-defined image, the simplest (sparsest) version is probably the best fit."

    Of the millions of possibilities, the sparsest is MOST likely. Perhaps it's twice as likely as any other possibility. That still means it's 99.999% likely to be wrong.

    As for the MRI, that fuzzy part is probably noise that can be deleted, except when it's a tumor.


    • by whit3 (318913)

      While there may be millions of possible reconstructions for a fuzzy, ill-defined image, the simplest (sparsest) version is probably the best fit."

      Of the millions of possibilities, the sparsest is MOST likely. Perhaps it's twice as likely as any other possibility. That still means it's 99.999% likely to be wrong

      I interpreted this to be a description of maximum-entropy filtering (i.e. making an output image with least information, consistent with input image with sparse information content overlaid with full-

  • by Vesvvi (1501135) on Friday October 11, 2013 @06:20PM (#45105329)

    I like some of the more subtle details in the title and summary: new math "techniques", "researchers need new mathematical tools", etc.

    I find it hard to believe that our sciences are driving the math fields, as mature and well-developed as the math community is. But it is true that existing knowledge and tools from mathematics drive huge advances in the sciences when they are brought to bear. The sad truth is that scientists just don't play terribly well with others (maybe no one does): interdisciplinary work is rare and difficult, and so we end up re-inventing the wheel over and over again. The reality is that the "wheel" being created by the biologist in order to interpret their data is a poor copy of the one already understood by the physicist across campus.

    What can we do about this? I'm not sure, but I think it's safe to say that our greatest scientific advances in the next few decades will be the result of novel collaborations, and not novel math or (strictly speaking) novel science.

    • by JanneM (7445) on Friday October 11, 2013 @07:42PM (#45105781) Homepage

      I find it hard to believe that our sciences are driving the math fields, as mature and well-developed as the math community is.

      This has actually always been the norm. Physics has long driven mathematics research for instance; many areas of calculus were created/discovered specifically to solve problems in physics.

    • by Anonymous Coward

      I find it hard to believe that our sciences are driving the math fields, as mature and well-developed as the math community is.

      Sir, may I introduce you to the field of partial differential equations? I think you would find it absolutely fascinating!

    • I like some of the more subtle details in the title and summary: new math "techniques", "researchers need new mathematical tools", etc.

      The summary isn't too inaccurate; what they are talking about is compressed sensing https://en.wikipedia.org/wiki/Compressed_sensing [wikipedia.org], i.e., the search for sparse (as in: with few nonzero elements) solutions to underdetermined systems of linear equations. "Sparse" is understood in a suitable basis, so for instance for a sound it could mean few distinct frequencies. The pr

      • by epine (68316)

        The problem in itself is NP-hard, but it turns out that in some cases of interest

        Perfect solutions are often NP-hard in systems where pretty-good solutions are nowhere close to NP-hard in many practical circumstances.

        The declaration of NP-hard is way overrated. We use it mostly because mathematics still can't chew "pretty good" in any rigorous way.
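epine's point about "pretty good" solutions is exactly how practitioners sidestep the NP-hardness in this setting: instead of exact L0 minimization, greedy heuristics succeed in most well-conditioned cases. Here is a minimal sketch (mine, not from TFA) of one standard such heuristic, Orthogonal Matching Pursuit, recovering a sparse signal from an underdetermined system:

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal Matching Pursuit: greedily pick the column most
    correlated with the residual, then re-fit by least squares."""
    n = A.shape[1]
    support, residual = [], y.copy()
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))
        support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(n)
    x[support] = coef
    return x

rng = np.random.default_rng(1)
n, m, k = 100, 40, 4
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=0)           # unit-norm columns
y = A @ x_true

x_hat = omp(A, y, k)
print("max error:", np.max(np.abs(x_hat - x_true)))
```

Finding the exact sparsest solution is NP-hard in general, but with random, incoherent measurement matrices like this one, the greedy pass almost always lands on the true support.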

    • I find it hard to believe that our sciences are driving the math fields, as mature and well-developed as the math community is.

      Reminds me of the anecdote about Prof. Rota when he was asked by a reporter why MIT didn't really have an applied math department. He responded, "We do! It's all of those other departments!"

  • Loading a MRI in to Photoshop and using a sharpening tool- novel concept!

  • The harder you have to dig.
  • What is Big Data? They say it is when your problem grows faster than your resources.

    Yet since the '70s we have had the concept of NP-Hard: again, your problem grows faster than your resources. We have always had "Big Data."

    • by godrik (1287354)

      "Big Data" is actually a pretty clear problem. It is not when your problem grows faster than your resources. It is when you are faced with processing massive amounts of unstructured data flowing into your system. The data might be untrustworthy, forged, or incomplete. You might want to read up on the "Vs of Big Data"; it describes pretty well the type of problem the term encompasses. Obviously not everybody faces such difficult-to-process data.

      Essentially it is a big word you put to describe most modern data analytic

      • My whole research experience has been with large, noisy data processed with approximation algorithms for NP-Hard problems, so perhaps I am making overly broad generalizations about the problems faced by most computer scientists. The hype is still inane given, say, the three-decades-long effort to characterize all protein structures.

        And it's true that there are more, and more open, data repositories than previously, and many fields of science have adopted submission to central, public repositories as a norma

        • by godrik (1287354)

          I am all with you on this. I was mostly designing approximation algorithms for weird-ass scheduling problems in the past. Recently I was hired by a university to conduct research on "Big Data", and the problems faced have little to do with combinatorial optimization anymore.

          The problem is more along the lines of: here are the medical records from 20 hospitals on the East Coast. You've got everything: patient files, X-rays, MRIs, blood tests, cardiograms, doctor notes, nurse notes (as text files or images). Your job i

  • Anyone who manages to invent a new method for analyzing such big data will get a job... you can guess where....
    • Unless your college roommate's father was a former Chinese national. Two degrees from potential espionage? No job.

  • by Fubari (196373) on Friday October 11, 2013 @07:36PM (#45105741)
    This! is the kind of article I joined slashdot to find out about.
    I wish there was a way to mod actual articles +1 or -1 instead of just modding comments; or to at least toss the submitter a karma point or something.
    • by gl4ss (559668)

      go to firehose.

      on the other hand, I would have liked to see actual pictures fixed by his algorithm.

      because without those, the article feels like bullshit. there's even a video in the article. but no "hey here's a csi zoom of this pic x".

      or maybe he has different definition for "perfectly rendered".

      • Turns out, on the professor's web page Emmanuel Candes [stanford.edu], there is a link to Some old talks [stanford.edu] that shows examples of the kind of transforms / cleanup they're talking about (they're lengthy PDFs, but worth skimming if you're curious about the kinds of images). Nothing like real-world pictures; synthetic examples with some shapes (almost like something you could mock up with MS Paint), but the premise is rather interesting.
        And I just saw this link on the Candes web page above: this does have some interesting m
  • by Anonymous Coward

    Could someone point at some pictures he tried to clean up with these algorithms, and how they got cleaned up, and what existing picture manipulation tools use the same algorithms? And saying what the algorithms were would be good too.

    • by anubi (640541)
      Anyone have a pointer to the algorithm? I am suspecting some matrix operator that looks at each pixel and its neighbors.

      To me, there seems to be plenty of information in recorded video, as it contains previous as well as future frames that should carry sufficient information to provide considerable clarification of a present frame. Anyone have info on anyone doing this?
      • To me, there seems to be plenty of information in recorded video, as it contains previous as well as future frames that should carry sufficient information to provide considerable clarification of a present frame. Anyone have info on anyone doing this?

        This is used already in multi-frame superresolution [wikipedia.org]. TFS seems to be talking about compressive sensing, which is a completely different beast. Compressive sensing is based on assuming sparseness to solve an underdetermined system of linear equations. It doesn't always work (as it's not always a valid assumption), but when it does you can get very impressive results. That is to say, if you have some underdetermined system of equations, it'll have infinite possible solutions. This obviously doesn't lend itsel
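To see why "underdetermined" forces an extra assumption, here is a tiny illustration (my own, not from TFA): the minimum-L2-norm solution that the ordinary least-squares machinery returns also satisfies the equations exactly, yet it smears energy across nearly every entry instead of staying sparse, which is why a sparsity prior is needed to single out the right answer.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 8, 20                              # 8 equations, 20 unknowns
A = rng.standard_normal((m, n))

x_sparse = np.zeros(n)
x_sparse[[3, 11]] = [1.0, -2.0]           # the "true", sparse signal
y = A @ x_sparse

# The minimum-L2-norm solution also satisfies A x = y exactly...
x_l2 = np.linalg.pinv(A) @ y
print("fits the data:", np.allclose(A @ x_l2, y))
# ...but it is dense: energy smeared over nearly every entry.
print("nonzero entries:", np.count_nonzero(np.abs(x_l2) > 1e-8))
```

Both vectors reproduce the measurements perfectly; only the sparsity assumption distinguishes the two-spike signal from the dense one.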

  • by key45 (706152) on Friday October 11, 2013 @07:52PM (#45105849) Journal
    4 years ago, Slashdot ran this exact same story http://science.slashdot.org/story/10/03/02/0242224/recovering-data-from-noise [slashdot.org] about Wired running this exact same story: http://www.wired.com/magazine/2010/02/ff_algorithm/all/1 [wired.com]
  • infomercial (Score:5, Insightful)

    by stenvar (2789879) on Friday October 11, 2013 @08:11PM (#45105969)

    The whole article is just a sales job:

    That is the basis of the proprietary technology Carlsson offers through his start-up venture, Ayasdi, which produces a compressed representation of high dimensional data in smaller bits, similar to a map of London’s tube system.

    The first place to look when people make such claims is at their publications; neither Gunnar Carlsson nor Simon DeDeo has significant publications showing that their approach works on real data or standard test sets. The statements in the article that these kinds of approaches are new are also bogus (I don't know whether they are deceptive or ignorant).

    Lastly, from a Stanford math professor, I would expect better citation statistics overall; I don't know what's going on there.

    http://scholar.google.de/citations?user=nCGwiu0AAAAJ&hl=en [google.de]

    http://scholar.google.de/scholar?as_ylo=2009&q=author:%22gunnar+carlsson%22&hl=en&as_sdt=0,5 [google.de]

    • by RandCraw (1047302)

      CompTop: Applied and Computational Algebraic Topology
      http://comptop.stanford.edu/ [stanford.edu]

      You need to do more than "Google: I'm feeling lucky".

      • by stenvar (2789879)

        How does a project web page make up for the lack of relevant peer reviewed publications or lack of citations?

        Where are the published results on real-world data sets? Or do you believe that a lot of verbiage is sufficient?

        • by RandCraw (1047302)

          And you're claiming the work is invalid because you're unimpressed by the lack of pubs of a new research program. At Stanford?

          In short, the Wired article is interesting while your criticism adds nothing. My advice, FWIW: if you must criticize, be specific. Don't gainsay with, "Your work is uninteresting because I'm unconvinced."

          That makes you sound like a Creationist.

    • Re: (Score:3, Informative)

      by Anonymous Coward

      What are you smoking? 1877 citations since 2008 isn't a good citation statistic? More importantly, judging someone's research value by absolute citation statistic is quite silly; he is a full Stanford Professor for his accomplishments, intellect, and personality (I hear he is a good advisor).

      While the article is quite a promotional piece, you don't know much about the field. Gunnar Carlsson and his group have advanced computational topology moreso than any other. He came up with the concept and way to compu

    • by Anonymous Coward

      There's plenty of publications if you know where to look.

      Most of these papers are published in peer review journals: http://comptop.stanford.edu/preprints/
      Robert Ghrist's publications are another good source of material: http://www.math.upenn.edu/~ghrist/preprints.html
      and then there's John Harer: http://fds.duke.edu/db/aas/math/faculty/john.harer/publications.html
      and Vin de Silva: http://pages.pomona.edu/~vds04747/public/publications.html
      and Leo Guibas has done lots of computational topology: http://geometr

    • by lorinc (2470890)

      Seriously? You are judging someone's work solely on the shape of his citation curve on Scholar? Compressed sensing is anything but new, right, but selling these guys as bad because they have only 2k citations (!!!) on Scholar is a bit exaggerated, to say the least.

      • by stenvar (2789879)

        Seriously? You are judging the work of someone solely on the shape of his citation curve on scholar?

        No, I'm judging the work by the absence of relevant peer reviewed, high impact publications, and the lack of experiments on large data sets.

        (My comment on the relatively low h-index was merely an aside.)

  • ... CSI Logic [memecenter.com].

  • Just plain lame. Nice marketing graphics though. What the hell is this crap doing on /.?
  • I applaud the work, seriously. But in some departments of mathematics, statisticians are referred to in the same breath as politicians and liars. I'm not calling the OP a liar, but generalization can lead to incorrect conclusions. Unless it's lupus of course.
  • At the risk of repeating Lord Kelvin's folly: science is almost over, that's the root of the "data" problem. Data is so complex, because we exhausted simple systems, and we are trying to tackle irreducible systems.

    It's a fallacy.

  • If you love trusting these kinds of compression algorithms, I have a Xerox machine [slashdot.org] I'd like to introduce you to...
  • C'mon. If you say something like "On a hunch, Candes applied an algorithm designed to clean up fuzzy images, expecting to see a slight improvement. ... What appeared on his computer screen instead was a perfectly rendered image...", your readers (at least me) expect to see this image and to hear about the hunch algorithm. Anyway, here (many clicks later) is the original article - https://www.simonsfoundation.org/quanta/20131004-the-mathematical-shape-of-things-to-come/ [simonsfoundation.org] (but still no pics)
