
Scientists Use Camera With Human-Like Vision To Capture 5,400 FPS Video (petapixel.com) 66

An anonymous reader quotes a report from PetaPixel: A team of scientists from the Swiss Federal Institute of Technology in Zurich (ETH Zurich) has figured out how to capture super slow-motion footage using what's called an "Event Camera." That is: a camera that sees the world in a continuous stream of information, the way humans do. Regular cameras work by capturing discrete frames, recapturing the same scene 24 or more times per second and then stitching them together to create a video. Event cameras are different. They capture "pixel-level brightness changes," basically recording each individual light "event" as it happens, without wasting time capturing all the stuff that remains the same frame by frame.

As ETH Zurich explains, the advantages of this type of image capture include "a very high dynamic range, no motion blur, and a latency in the order of microseconds." The downside is that there's no easy way to process the resulting "footage" into something you can display using current algorithms, because they all expect to receive a set of discrete frames. Well, there was no easy way. This is what the folks at ETH Zurich just improved upon, developing a reconstruction model that can interpret the footage to the tune of 5,000+ frames per second. The results are astounding: a 20% increase in reconstructed image quality over any previous model, and the ability to output "high frame rate videos (more than 5,000 frames per second) of high-speed phenomena (e.g. a bullet hitting an object)," even in "challenging lighting conditions" with high dynamic range.
Their findings have been published in a research paper titled "High Speed and High Dynamic Range Video with an Event Camera."
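
A minimal, hedged sketch of the difference the summary describes: given a stack of ordinary frames, emit an "event" only where a pixel's (log) brightness has changed by more than a threshold since that pixel last fired. The threshold, the log scaling and the function name are assumptions for illustration; this is not ETH Zurich's reconstruction model or a real sensor readout.

```python
import numpy as np

def frames_to_events(frames, threshold=0.1):
    """Emit (t, x, y, polarity) events wherever log-brightness changes by more
    than `threshold` since the last event at that pixel. Toy illustration of
    the event-camera idea, not an actual sensor model."""
    log_ref = np.log1p(frames[0].astype(np.float64))   # per-pixel reference level
    events = []
    for t, frame in enumerate(frames[1:], start=1):
        log_now = np.log1p(frame.astype(np.float64))
        diff = log_now - log_ref
        ys, xs = np.where(np.abs(diff) >= threshold)
        for y, x in zip(ys, xs):
            events.append((t, x, y, 1 if diff[y, x] > 0 else -1))
            log_ref[y, x] = log_now[y, x]              # reset reference only where we fired
    return events

# Usage: 50 random 32x32 "frames" -> a sparse stream of change events
events = frames_to_events(np.random.rand(50, 32, 32), threshold=0.2)
print(len(events), "events instead of", 50 * 32 * 32, "pixel samples")
```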

Scientists Use Camera With Human-Like Vision To Capture 5,400 FPS Video

  • by Rei ( 128717 ) on Friday July 12, 2019 @09:08AM (#58913080) Homepage

    ... several times on Slashdot over the years (most recently, here [slashdot.org]). Great to see it actually getting some research! Now I actually have a proper term ("Event Camera") for it. :)

    • by Rei ( 128717 ) on Friday July 12, 2019 @09:26AM (#58913164) Homepage

      I'll note that I do envision this going further (as I've written in the past). A particularly important one would be recording data in absolute polar coordinates, along with a path for the camera over time, so that rotation of the camera does not spam you with new data for each pixel. E.g. if column 1 of your "CCD" corresponds to 5.86 degrees east of due north, and the camera is rotated so that column 2 of the "CCD" now corresponds to 5.86 degrees east of due north, all that changes is that column 2 now stores its data where column 1 was storing it previously. You only want things that actually change in the scene to trigger refresh events. Ideally the metadata would allow multiple camera sources to be incorporated into the same scene, which would aid in stereoscopy or true 3D reconstruction. Additionally, datastreams wouldn't be strictly confined to "RGB", but would allow for whatever spectral windows the camera wants to record.

      As described previously, compression becomes trickier with an event camera. Realistically you want A) to record changes in light intensity as splines, and only trigger a new datapoint when a single spline can no longer accurately represent the light intensity curve that's been observed since your last datapoint; B) trigger new datapoints for adjacent pixels that "nearly need to switch splines", so that you can write out data in blocks, thus reducing per-pixel overhead; and C) bundle as many blocks into a single time writeout as possible, thus reducing per-timestamp overhead. There are tradeoffs, of course - (B) for example creates datapoints sooner than they might be needed, while (C) reduces your temporal resolution (your worst case being "everything written all at once", aka, frames).

      Realistically, CCDs are poorly suited for an event camera, as they read the whole sensor out, row by row, every frame. You really want an event-driven paradigm in the sensor hardware itself: the hardware itself accumulating light-curve splines and triggering write-out events when the deviation of the light curve can no longer be represented with a single spline. Likewise, for compression, you want datapoint write-outs from one pixel to trigger new datapoint write-outs from blocks of their neighbors that are almost ready, and potentially from nearly-ready non-adjacent blocks as well - all at the hardware level. You want your bus to simply listen for events from your sensor, bundle all newly-read data into a compressed format with the required metadata, and write it out.
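
A minimal sketch of the per-pixel spline writeout described above, purely as illustration: each pixel buffers samples and emits a datapoint only when a single segment can no longer fit its light curve within a tolerance. The degree-1 "spline", the tolerance and the class name are assumptions, not anything from the paper or from real sensor hardware.

```python
import numpy as np

class PixelSplineEncoder:
    """Toy version of the per-pixel idea above: buffer samples and emit a
    breakpoint only when one line segment no longer fits the buffered light
    curve within `tol` (a degree-1 "spline", for simplicity)."""
    def __init__(self, tol=0.05):
        self.tol = tol
        self.times, self.values = [], []
        self.breakpoints = []                 # emitted (t, value) datapoints

    def feed(self, t, value):
        self.times.append(t)
        self.values.append(value)
        if len(self.times) < 3:
            return
        # Fit one straight segment to everything since the last breakpoint.
        coeffs = np.polyfit(self.times, self.values, 1)
        resid = np.abs(np.polyval(coeffs, self.times) - np.asarray(self.values))
        if resid.max() > self.tol:            # one segment no longer fits: write out
            self.breakpoints.append((self.times[-2], self.values[-2]))
            self.times, self.values = self.times[-2:], self.values[-2:]

enc = PixelSplineEncoder(tol=0.05)
for t in range(200):
    enc.feed(t, 0.5 if t < 100 else 0.5 + 0.01 * (t - 100))   # flat, then a ramp
print(len(enc.breakpoints), "datapoints emitted for 200 samples")
```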

      • A particularly important one would be recording data in absolute polar coordinates, along with a path for the camera over time, so that rotation of the camera does not spam you with new data for each pixel. E.g. if column 1 of your "CCD" corresponds to 5.86 degrees east of due north, and the camera is rotated so that column 2 of the "CCD" now corresponds to 5.86 degrees east of due north, all that changes is that column 2 now stores its data where column 1 was storing it previously.

        If you're putting that level of smarts into the sensor itself, then maybe consider following the human system again and perform a certain amount of image processing/recognition at that level before passing the data down to the controller? Would probably help with your compression problems while you're at it.

        • by Rei ( 128717 )

          Indeed, a sort of "pass your data to the next pixel over" event (which would be triggered in response to rotation of the camera) is not only possible to implement in hardware, it's what CCDs already do [wikipedia.org] in order to read out. :)

      • by Rei ( 128717 )

        In short, in a way, what's being discussed is rather like a hardware neural net, where you have activation potentials, and a "synapse" from one pixel can trigger "synapses" from those that it's connected to. In the analogy, "frequently synapsing pixels" correspond to those whose light data is undergoing significant, unpredictable change (and some of those adjacent to them), while "seldom synapsing pixels" correspond to those in areas that are seeing little change to what they're recording.

      • You may wish to take a look at DFPA technology [mit.edu]: these have an ADC (and sometimes additional flexible processing) under each pixel, streaming photon counts and timestamps.

        Also related are APD arrays [nasa.gov]: a few dozen pixels [first-sensor.com] that are typically individually queried at GHz rates. They're useful for low-light applications like LIDAR, where they simply record the timestamp of incoming photons; postprocessing then converts that to a ranged image. Using these, you can generate things [mit.edu] like large-scale 3D maps of an area [mit.edu].
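
For context, the post-processing the parent mentions (turning photon timestamps into a ranged image) can be as simple as halving the round-trip time of flight. A toy sketch with made-up array shapes and values:

```python
import numpy as np

C = 299_792_458.0  # speed of light, m/s

def timestamps_to_range(return_times_s):
    """Convert photon round-trip timestamps (seconds after the laser pulse)
    into distances in metres. Illustrative only; real APD/DFPA readouts and
    their calibration are considerably more involved."""
    return 0.5 * C * np.asarray(return_times_s)    # out and back, so halve it

# A 2x2 "pixel" array of return times around 66.7 ns -> roughly 10 m of range
print(timestamps_to_range([[66.7e-9, 67.0e-9], [66.5e-9, 80.0e-9]]))
```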

    • by HiThere ( 15173 )

      IIUC, from the summary, they didn't invent the "event camera"; they invented a way to process its output to make it useful.

    • by Njovich ( 553857 )

      That's great, Rei. If you also invent a time machine, you could patent the event camera before the first commercially available ones hit the market in 2008: https://ieeexplore.ieee.org/document/4444573

    • by tomhath ( 637240 )
      I had a friend who proposed YouTube 10 years before YouTube was invented
  • by Anonymous Coward on Friday July 12, 2019 @09:19AM (#58913130)

    I can see this as a much better way to capture video. Basically every pixel is recorded as if it was an audio stream, and only the deltas are stored, with an extreme sampling rate, like SACD. Then compress away everything where nothing changes, using an algorithm that understands motion compensation based on per-pixel motion quaternions.

    But it would still be samples, quantized in X, Y, time and brightness.
    So not continuous. But the next best thing: a wave function that goes through all the sample points, to interpolate a continuous wave above the Nyquist frequency of human vision. At least in the time/brightness space.
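
A small sketch of that reconstruction idea: fit a smooth curve through one pixel's sparse (timestamp, brightness) samples and resample it at whatever output frame rate you like. The sample values and the choice of a cubic spline (rather than a properly band-limited interpolator) are assumptions for illustration only.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Sparse samples for a single pixel, recorded only when its value changed
# (the "delta" idea above): (timestamp in seconds, brightness).
t_samples = np.array([0.000, 0.013, 0.014, 0.040, 0.041, 0.100])
b_samples = np.array([0.20,  0.20,  0.85,  0.85,  0.30,  0.30])

# A smooth curve through all the sample points lets you evaluate the pixel's
# brightness at any instant, e.g. to resample at an arbitrary frame rate.
curve = CubicSpline(t_samples, b_samples)
frame_times = np.linspace(0.0, 0.1, 501)            # render at 5,000 "fps"
print(curve(frame_times)[:5])
```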

  • This group in Zurich is also applying event cameras to obstacle avoidance in drones: IEEE Spectrum [ieee.org]
  • This is a magnificent breakthrough, and could potentially lead to significant improvements to video capture and photography in general. Since the problems are mostly on the interpretation side (figuring out what the stream of pixel events reconstructs to), this is tractable with current technology.

    Hopefully it really is a usable advance, and won't end up a technical footnote like the Lytro [wikipedia.org].

    • Re:Breakthrough (Score:4, Interesting)

      by Rei ( 128717 ) on Friday July 12, 2019 @09:48AM (#58913288) Homepage

      Light field data is still very cool, even though Lytro's focus on "mediocre cameras that let you refocus your pictures" was rather weak. It'd be particularly nice for 3D scene reconstruction (for example, for self-driving vehicles, drone navigation, etc.). I'd hope that formats for event cameras would be designed to allow optional vector data to be associated with light events.

    • I have a feeling this will stick around. First, think of security cameras: all you need for motion detection is a measurement of the bandwidth used. Second, consider a drone flying along in a line, filming the ground looking for driving cars. If you can make the unchanging state equal to the moving ground, you can find the moving cars by bandwidth the same way. Third, what I just described, prediction of the next frame with deviation, is how H.264 and every video codec going forward works, so it'

      • by Rei ( 128717 )

        Exactly. What I've always envisioned is that each pixel fits a spline to its light curve; a data writeout is only triggered when it can no longer match its light curve to a single spline (or when forced to do a writeout by other pixels for compression reasons). A pixel in one place in the scene may be getting writeouts triggered a thousand times a second because everything there is in chaos, while other parts of the frame sit around for seconds at a time with no writeout.

        And since you're writing out splines, you also ha

        • This reminds me of mp3 decoders that can output 24-bit values even if the original was recorded at 16 bits; the data was stored as sinusoid curves.

          A related idea that comes to mind is storing video as a 3D Fourier transform. I had this idea in 2002 while working on image recognition software, and I thought it would make it easy to balance between temporal and spatial precision automatically depending on the content. Later, I learned that Ogg Tarkin was being developed at the same time using the same idea.
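
As a toy illustration of the 3D-transform idea (not how any real codec works; the keep-fraction and function name are invented): transform the whole (time, height, width) volume at once, keep only the largest coefficients, and invert.

```python
import numpy as np

def compress_video_3dfft(video, keep_fraction=0.02):
    """Transform a (time, height, width) volume with a 3D FFT, zero all but
    the largest coefficients, and invert. A sketch of the idea only."""
    spectrum = np.fft.fftn(video)
    mags = np.abs(spectrum)
    cutoff = np.quantile(mags, 1.0 - keep_fraction)   # keep the top 2% of coefficients
    spectrum[mags < cutoff] = 0
    return np.real(np.fft.ifftn(spectrum))

video = np.random.rand(16, 32, 32)                    # 16 frames of 32x32 noise
approx = compress_video_3dfft(video)
print("mean abs error:", np.abs(video - approx).mean())
```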

  • "a camera that sees the world in a continuous stream of information, the way humans do."

    Umm, no. Unless it's using entirely analogue electronics, it'll have a microcontroller with firmware which ultimately ticks to a clock. It may be a very, very fast clock, but it's still discrete time, not continuous.

  • A stream of only delta events of any material length will be difficult to work with (e.g., to randomly access), and immensely fragile (e.g., once an errant event is sent and/or an event is lost or corrupted in transmission, the image will be corrupted for the foreseeable future). This is the same general reason (rationale) that incremental backup plans include periodic full snapshots, MPEG streams contain periodic I-frames, etc.

    Though I haven't had time to fully tear it apart, I didn't see this directly addressed in t
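
The mitigation the parent alludes to (periodic full snapshots that bound how far a corrupted delta can propagate) looks roughly like this; the interval and the encoding format here are invented for illustration:

```python
def encode_with_keyframes(samples, keyframe_interval=50):
    """Delta-encode a stream, but emit a full value (an "I-frame") every
    `keyframe_interval` samples, so a lost or corrupted delta only damages
    the stream until the next keyframe. Illustrative only."""
    out, prev = [], None
    for i, v in enumerate(samples):
        if prev is None or i % keyframe_interval == 0:
            out.append(("key", v))          # periodic full snapshot
        else:
            out.append(("delta", v - prev))
        prev = v
    return out

def decode(stream):
    value, decoded = None, []
    for kind, payload in stream:
        value = payload if kind == "key" else value + payload
        decoded.append(value)
    return decoded

stream = encode_with_keyframes(list(range(200)))
stream[60] = ("delta", 999)                 # corrupt a single event...
bad = decode(stream)
print(bad[60:63], "... stream recovers at the next keyframe:", bad[100])
```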

  • by Pollux ( 102520 ) <speter AT tedata DOT net DOT eg> on Friday July 12, 2019 @12:45PM (#58914226) Journal

    Regular cameras work by capturing discrete frames, recapturing the same scene 24 or more times per second and then stitching them together to create a video. Event cameras...basically recording each individual light "event" as it happens, without wasting time capturing all the stuff that remains the same frame by frame.

    Wow. That makes a lot of sense. And it's the same exact thought process John Carmack had when he tried to program Super Mario Bros. 3 for the PC [youtube.com].

    For those that don't know the story, the PC had really poor side-scroller games back in the '80s and early '90s, because graphics engines at the time rendered graphics frame by frame. Because CPUs were slow, and because they were handling both the game code AND the video rendering, they couldn't do more than about 6 frames per second. Carmack came up with the novel idea of rendering graphics not frame by frame, but by the changes happening pixel by pixel. Since not every pixel required updating every frame, the CPU could render far more frames per second.

    As proof of concept, Carmack programmed a version of Super Mario Bros. 3 to show Nintendo that they could create a port of the game for the PC and boost their sales. Nintendo had no interest in moving their flagship game to a non-Nintendo platform, and declined Carmack's offer. So he instead founded id Software, and the rest is history. (A more thorough video explaining the history can be found here. [youtube.com])
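
The general "redraw only what changed" idea is easy to sketch as dirty-tile detection; this is an illustration of the concept, not Carmack's actual adaptive tile refresh code, and the tile size and names are assumptions.

```python
import numpy as np

TILE = 8  # redraw in 8x8 blocks rather than repainting whole frames

def dirty_tiles(prev_frame, new_frame, tile=TILE):
    """Return the (row, col) tile coordinates that changed between two frames,
    i.e. the only regions a renderer would actually need to redraw."""
    changed = []
    h, w = new_frame.shape
    for ty in range(0, h, tile):
        for tx in range(0, w, tile):
            if not np.array_equal(prev_frame[ty:ty + tile, tx:tx + tile],
                                  new_frame[ty:ty + tile, tx:tx + tile]):
                changed.append((ty // tile, tx // tile))
    return changed

prev = np.zeros((64, 64), dtype=np.uint8)
new = prev.copy()
new[10:14, 30:34] = 255                     # a small sprite moved
print(dirty_tiles(prev, new))               # only a couple of tiles to redraw
```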

  • ^

    Or 30 FPS. Depending on how retro your opinion is. lol

    • by Anonymous Coward

      > Or 30 FPS. Depending on how retro your opinion is. lol

      Or 24 FPS if you're even older... LOL... er, that means I'm old 8-[

      Anyway, there's more than just being able to see: I get sick after playing a little DOOM (that original DOS version, but IIRC the same occurs on Linux). Quake I is remarkably OK for me. I skipped some great games because of that (for the record, I've had motion sickness for my entire life... it does not bother me much when I drive, though).

      I can play Nexuiz but Xonotic makes me a li

      • by Hillie ( 63573 )

        Damn. This is the best AC comment I've ever read.

        I got a friend who told me that she can't see the difference between 30 and 60 fps, even moving a mouse around...

        I was like Homer slowly backing away lol.
