Extracting Audio From Visual Information (142 comments)

rtoz writes: Researchers at MIT, Microsoft, and Adobe have developed an algorithm that can reconstruct an audio signal by analyzing minute vibrations of objects depicted in video. In one set of experiments, they were able to recover intelligible speech from the vibrations of a potato-chip bag (video) filmed from 15 feet away through soundproof glass.

  • Not surprising (Score:5, Insightful)

    by Z00L00K ( 682162 ) on Monday August 04, 2014 @09:42AM (#47599079) Homepage Journal

    Spy agencies were already measuring the vibrations of windows and other objects 40 to 50 years ago, so I wonder whether this is just something that has been rediscovered.

  • Re:Not surprising (Score:5, Insightful)

    by Hamsterdan ( 815291 ) on Monday August 04, 2014 @10:00AM (#47599181)

    The countermeasure for laser listening was to install the windows inside a pipe *frame* and play music in the pipes. Using an object inside the building to extract audio defeats that countermeasure. This is 2014, do not expect any privacy, especially from government agencies...

  • by JackieBrown ( 987087 ) on Monday August 04, 2014 @10:07AM (#47599223)

    The hat is a trick!

    The reason they want you to wear foil is so that the sound can bounce off it.

  • Re:Not surprising (Score:5, Insightful)

    by timeOday ( 582209 ) on Monday August 04, 2014 @10:08AM (#47599233)
    Well, even a normal microphone is "just" measuring the linear displacement of a membrane over time, so clearly the important distinction is how you measure it. A laser range-finder is different from a microphone, and a video camera is different from a laser range-finder.
  • by interiot ( 50685 ) on Monday August 04, 2014 @10:10AM (#47599241) Homepage
    30 Hz is far below the Nyquist rate [wikipedia.org] (6800 Hz, going by POTS specs), so no, that wouldn't be possible without some fundamental changes in our understanding of information theory and physics.
  • by sunderland56 ( 621843 ) on Monday August 04, 2014 @10:13AM (#47599267)

    reconstruct from standard 30 fps video

    Dear sir: what you are asking is impossible.

    Sincerely yours,

    Harry Nyquist

  • by fuzzyfuzzyfungus ( 1223518 ) on Monday August 04, 2014 @10:37AM (#47599495) Journal
    Worse than that. If there's a metal foil involved, vibration measurement should be doable with RF as well as light. Only with next-generation reduced-radar-cross-section geometries and RF-absorbent materials can a truly secure tinfoil hat be constructed.

    Unfortunately, walking around with what appears to be a small F-117 attached to your head offers limited visual camouflage potential and may prove counterproductive in your attempts to avoid Their surveillance.
  • by tepples ( 727027 ) <tepples.gmail@com> on Monday August 04, 2014 @11:30AM (#47599961) Homepage Journal
    In theory, if you can find different targets in the frame with resonant frequencies spaced no more than 15 Hz apart, you can read a different 15 Hz band off each target.
  • by Anonymous Coward on Monday August 04, 2014 @11:32AM (#47599989)

    Oh dear. You even linked to Wikipedia (although not to the Wikipedia page "Nyquist Rate"). Does it not occur to you that OP understands those things better than you do?

    To start with you need to understand what the Nyquist rate means. Sampling is like wrapping a signal around a cylinder. Just because parts are overlaid ("aliasing") doesn't mean you can't untangle the original signal. For instance, if a single audio source contains only pure harmonics, so the frequencies are known to be N, 2N, 3N, 4N, and so on, and if you have the range of possible N down to a smallish range (e.g. you know it's a voice) and you know that higher harmonics are always smaller than lower harmonics, then you can, from a massively sub-Nyquist sampling like this, extract both N *and* all the coefficients of all the harmonics. It's just like determining the dimensions of a triangle after it's wrapped around a cylinder. No, the triangle doesn't have to fit within one revolution of the cylinder, that's just the trivial case that obviously works.

    What OP is proposing is that, because different parts of the physical system have different resonances, each part of the image shows a strongly filtered version of the original signal: basically a single frequency. You can measure the size of this signal from an aliased sampling with no problem whatsoever. It just works: an aliased sampling has the same energy as a non-aliased sampling; the samples are just in a different order. Then, if you know different image areas have different responses, you can build up an image of the signal by patchwork. It would be a bloody hard job for a crisp packet in an arbitrary configuration, but if you get to design the object you're looking at, you can make this as sensitive as you like, and even use really crappy cameras to do it.

    Nyquist rate isn't the be-all and end-all people think it is, it's just a limit for *perfect* reconstruction of *arbitrary* signals. The naive approach is to restrict yourself to sub-Nyquist signals and use the easy algorithms everybody knows. The fun stuff (read: the stuff you might get paid for) involves at least flirting with the Nyquist range, or even fully embracing that aliasing is happening and figuring out the consequences from first principles. Once you do this, you can do amazing things that seem impossible to Signal Processing 101 students ... the only problem then is you get SP101 students telling you you're an idiot for thinking that's possible. Oh, well.

    BTW, the standard sampling rate for telephony is 8000 Hz. Pro-tip: if you want to sound like a signal processing expert, know common sample rates.
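
The aliasing arithmetic argued over in this thread can be checked with a small sketch. It assumes an idealized 30 fps camera that samples one pure tone instantaneously, which is a big simplification of any real video pipeline. The function names alias_frequency and sampled_rms are made up for illustration; they are not from the researchers' work or from any library.

```python
import math

def alias_frequency(f_signal, f_sample):
    """Apparent frequency (Hz) of a pure tone at f_signal when sampled at f_sample.

    The tone folds back into the band 0 .. f_sample / 2.
    """
    f = f_signal % f_sample
    return min(f, f_sample - f)

def sampled_rms(f_signal, amplitude, f_sample, n_samples, phase=0.3):
    """RMS of n_samples taken from a sine wave of the given amplitude.

    Assumption: ideal point sampling, no exposure blur, no noise.
    """
    total = 0.0
    for k in range(n_samples):
        x = amplitude * math.sin(2 * math.pi * f_signal * k / f_sample + phase)
        total += x * x
    return math.sqrt(total / n_samples)

# A 440 Hz tone filmed at 30 fps shows up as a 10 Hz wobble.
print(alias_frequency(440, 30))               # 10

# Its strength still comes through: the RMS of the aliased samples
# matches the RMS of the original sine, amplitude / sqrt(2).
print(round(sampled_rms(440, 1.0, 30, 300), 4))  # 0.7071
```

This only shows that the size of a single narrow-band component survives sub-Nyquist sampling. It breaks down when the tone sits exactly on a multiple of half the sample rate (every sample then lands on the same phase), and it says nothing about separating several tones that fold onto the same apparent frequency, which is where knowing each target's resonance would have to do the work.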
