Graphics / Software / Science

Creating 3D Computer Graphics From 2D HDTV Camera

photon jockey writes: "Everyone knows that holograms are cool! But these three-dimensional images are hard to make and need special conditions to view. A group from The University of Tokyo has taken a step toward 3D displays with this paper in Optics Express. Using an HDTV camera, they effectively capture the light rays passing through a plane from a lit scene and then reconstruct the three-dimensional geometry of the scene. Some pretty movies are available from the same page to show this. The parallax is limited by the size of the CCD and the distance to the object. From the paper: In the field of computer graphics (CG), the use of real images has been attracting attention as a method of attaining image synthesis that results in a more photo-realistic quality. This field of technology is called image-based rendering. The authors have attempted to solve this problem by applying the approach of 3-D display technology, in which light rays are the most primitive elements of visual cues."

  • I agree. Holograms (as three-dimensional images) are VERY different from capture devices. However, this article refers to the construction of a hologram through the use of flat (2D) capture devices, such as standard/HDTV cameras. A more appropriate word would've been "3D modeling..." To be honest, I think they'd be more successful using sonar, or some other type of refractive/reflective slow imaging device--it would be a lot easier to detect distances. Of course, I'm only perusing this article and its references lightly, so I haven't a clue as to the specifics of this particular technology. I do know that by emitting some kind of all-reflective wave (sound works, although it's not the best candidate), it is possible to detect the distance between two points. Using this and all that we've learned about waves and their behaviors, one could simply PING an object a few times from different angles, detect minuscule differences in the wave, and use that information to create a 3-dimensional model. Unfortunately, AFAIK, we don't have the technology to detect these minuscule wave differences and behaviors. Maybe in the year 2525... =P

    Again, sorry for the thread drift,
    treker
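    (An aside on the ping idea above: a minimal time-of-flight sketch in Python. It assumes sound in air at roughly 343 m/s and conveniently skips the hard part treker mentions, actually detecting the tiny differences between echoes.)

        SPEED_OF_SOUND = 343.0  # m/s in air at about 20 C (assumed)

        def distance_from_ping(round_trip_seconds):
            """Classic time-of-flight ranging: the ping travels out and back,
            so the one-way distance is half the round trip."""
            return SPEED_OF_SOUND * round_trip_seconds / 2.0

        # an echo arriving 12 ms after the ping puts the object ~2.06 m away
        print(distance_from_ping(0.012))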
  • I am only dimly remembering this from an optics course at uni, but one way of understanding the process conceptually is that each of the point light sources of which the object is composed generates its own Fresnel zone plate.
    I can't remember the maths of the Fresnel zone plate, but basically, if you interfere a point light source with a parallel light source, take the resulting pattern of light and dark rings, and etch it onto a surface, then light reflected off that surface is diffracted in such a way that it is focused, as if it had passed through a lens with its focal point at the same distance behind the plate as the point source was in front. This creates a virtual image of the original point source. You can view the object as being composed of such light sources, each generating its own zone plate.
    Like I said, I can't remember the maths, and the extrapolation from one point source to a whole object isn't exactly mathematically trivial, but I remember that the reason the Fresnel zone plate worked was very straightforward to grasp conceptually and helped the holography concept to fall into place. I always preferred to have a way to visualise something like that even when the maths clearly works out fine.
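    (For what it's worth, the formula behind the zone plate is the usual zone-boundary radius r_n = sqrt(n * lambda * f), where f is the focal length and lambda the wavelength. A quick Python sketch with purely illustrative numbers:)

        from math import sqrt

        wavelength = 633e-9   # m, red HeNe laser light (illustrative choice)
        focal_len  = 0.5      # m, desired focal length of the zone plate

        # radius of the n-th zone boundary: r_n ~ sqrt(n * lambda * f)
        for n in range(1, 6):
            r = sqrt(n * wavelength * focal_len)
            print(f"zone {n}: r = {r * 1e3:.3f} mm")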
  • While porn will be one of the first commercial applications of the technology, imagine what this will do if they can get it to pick up X-ray wavelengths, to say nothing of the benefits of a commercially viable holographic system.
    This has been another useless post from....
  • I visited NATPE [natpe.org] and talked to some people about HDTV, and it looks like it's starting to happen. Some channels are already broadcasting in Japan.

    But even though the HDTV consortium contains 19 formats, it looks like there is an emerging universal mastering format, 1080/24P [discreet.com]: 1920x1080 pixels, 12 bits per channel (36 bits per pixel), 24 frames per second (you can most likely find this resolution in your display settings). The bandwidth needed for this format is about 220 MB per second. We would need a FireWire cable capable of more than 1760 Mbit/s to be able to have real-time transfer (the current FireWire is capable of 400 Mbit/s). The point is that HDTV is not that far away. I mean, this is uncompressed!

    My computer is ready for it, and so is my display, but so far there are no consumer cams out there, and renting a Sony HDTV cam will cost you about $1,700 a day! Even though many digital still cams can take substantially higher resolution images (I guess it's the bandwidth...), it is going to be interesting to see who will be first with a consumer HDTV cam... Sony? Canon?

    But the thing that I am waiting for most is 12 bits per channel; it's not HDRI (High Dynamic Range Images) [isi.edu], but it's a lot better than 8 bits.

    Now let me rant a bit about resolution:
    It has taken about 15 years to get to this point (which tells you a bit about how serious TV people are about formats). And the really silly part is that an MPEG [cselt.it] image does not have a set resolution: the resolution is just part of the header, as a "please decompress this data to this resolution" statement. So since all DTV (digital TV) will be MPEG, there is no need for a fixed resolution; you could in fact just broadcast the data and let the receiving TV set unpack it at any resolution! This would also make it possible for different programs to be broadcast in different aspect ratios. Everyone seems happy with the fact that "widescreen" is 16:9, but nothing is filmed in 16:9; most films are made in 1.85:1, 2:1, 2.1:1 or even 2.35:1, so we will still get borders on our TV sets, borders that will be broadcast and take up bandwidth! Why can't I buy a TV set with the aspect ratio and resolution I want and then just decode the TV signal the way I want?

    Eskil
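    (A quick back-of-the-envelope check of the figures above; it assumes 1920x1080 at 24 fps with 12 bits for each of three channels, which is where the ~220 MB/s figure, and the need for well over 1760 Mbit/s, comes from:)

        width, height  = 1920, 1080
        fps            = 24
        bits_per_pixel = 3 * 12   # three colour channels at 12 bits each

        bits_per_second = width * height * bits_per_pixel * fps
        print(f"{bits_per_second / 8 / 1e6:.0f} MB/s uncompressed")  # ~224 MB/s
        print(f"{bits_per_second / 1e6:.0f} Mbit/s")                 # ~1792 Mbit/s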
  • Or perhaps a flashlight, a pair of sunglasses and some 'shrooms. Much more fun than holograms.
  • "I'll take 'Optic Sex Press' for for $500, Alex."
  • Must be all that rock solid open source software.
  • All the links at the site pointed to require registration. I think it is not a good idea to link to something that is not much different from a porn pusher or marketing fraud, leading to nothing but a multitude of links that require registration.
  • by Anonymous Coward
    Hi,

    If you want to see a nice application of image based rendering, take a look at Elixir Studios [elixir-studios.co.uk]. They are using a technique called relief texture mapping to produce a game with really photorealistic graphics. The screenshots on that site are absolutely awesome. Forget Quake 3 and Unreal!

    CU,

    AC
  • You laugh, but our web proxy here at work blocks the site.

    Bingo Foo

    ---

  • Integral photography dates from the 19th century. Here's an explanation. [nhk.or.jp] It trades resolution for depth inefficiently, although there's been some work in the UK on compressing integral TV images. [dmu.ac.uk]

    This group is using the technique to extract depth using a single HDTV camera. That makes sense, although the approach is somewhat low-res. Depth extraction from stereo images is commercially available [ptgrey.com], and is an alternative to this approach.

  • This is image-based rendering. In image-based rendering, there is no 3D geometry, in the sense of vertices and polygons. What you have is an image of a subject from a number of angles, and you interpolate to get the angles in between. Remember QuickTime VR? (My graphics wonk credentials include unadvisably early adoption of QTVR.) It's the first reference in this article. Regular SIGGRAPH attendees will be used to having their jaws dropped by new image-based rendering advancements every year. This is definitely a field to watch, but what's presented here looks like an incremental advance more than a breakthrough.

    The article is a bit unclear, but it sounds like the new wrinkle here is that you use a set of microlenses to capture all the light coming through a plane... as if you had a solid wall of cameras. Then you have an output device which can take advantage of all the information in each of those source images (actually source video in this case). The point is a flat device which will give the illusion of 3D (including parallax) without an intervening plane or glasses. If I'm reading this correctly, it sounds like the effect will be very similar to a plane holograph, where you can walk around in front of the display and look at it from different angles, but if you go too far the image breaks, and of course you can't go behind it or rotate the object. So saying you can look at it from "arbitrary angles" is a bit thick.

    Anyway, this is a very separate thing from those systems that take a number of photographs and reconstruct the geometry, or those other systems that use range-finding lasers or the like to actually measure the geometry. In short, no application for Quake here, so it's surprising it got funded in the first place. :) (Um... now watch them come out with image-rendered Quake...)
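    (To make "interpolate to get the angles in between" concrete, the crudest possible version is a weighted blend of the two nearest captured views. Real image-based rendering systems warp per pixel using depth or correspondences, so treat this Python snippet as a toy cross-dissolve only:)

        import numpy as np

        def blend_views(view_a, view_b, alpha):
            """Crude view interpolation: cross-dissolve between the two
            captured images nearest the desired viewing angle.
            alpha = 0 gives view_a, alpha = 1 gives view_b."""
            return (1.0 - alpha) * view_a + alpha * view_b

        # two stand-in "photos" taken a few degrees apart
        a, b = np.zeros((4, 4)), np.ones((4, 4))
        print(blend_views(a, b, 0.25))  # a quarter of the way from a toward b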
  • I work for a company that is currently beta testing a product which is almost ready to be released (March, God willing).

    Synapix (www.synapix.com) is developing software for Windows NT/2000 called Synaflex, which creates a 3D model from a 2D camera path, then allows you to place 3D objects created in Maya and 3D Studio Max into the created scene.

    Truly revolutionary stuff. Unfortunately, this product is aimed strictly at production houses, and no, it's not going to be open source. The people who plunked down the backing money wouldn't be too pleased with us giving the technology away. :)

    BTW...why wasn't the Slashdot downtime mentioned at all? Not newsworthy enough? If Microsoft goes down, we hear about it for weeks, but if Slashdot goes down...nothing?!?! Maybe I'll submit it as a story to Taco or Hemos.
  • Erm, you do understand that HDTV has existed in Japan (and maybe other Asian countries?) for several years now... even back to or before 1995.
    The major difference being that they use an analog system and our system (American, titled DTV) uses a proprietary digital system that is completely different from any standard in ANY other country.

    [sarcasm] Of course, since Hollywood is so smart, our system will have intimate piracy protection systems built in so that those nasty pirates can't steal movies off the air (and thus degrade our capital and king, Hollywood). This incidentally will produce movies and shows that can only be seen on compatible sets, forcing all other countries to fall into stride. After all, they need our protection from those nasty pirates... even the casual pirates that want to watch a show more than their allotted one time. [/sarcasm]
    -since when did 'MTV' stand for Real World Television instead of MUSIC television?
  • Wow. That's really neat stuff. So, anyways... your product would let you seamlessly create a 3D world from a video feed? (At least in theory... I'd assume you'd have to build a world for the camera to match up with...) As a fake director, the possibilities for bringing together computer animation with live action would certainly be much more interesting. What programs will this work with? Even more interesting, will it capture lighting data? And finally, how CPU/RAM/disk-intensive is this?
  • On the other hand, even if you used some kind of radar/sonar, you would still have to include a camera in the package, for capturing the textures.

    --

    "I'm surfin the dead zone
  • Well yes, in one way, but it isn't completely true. Since the screen is larger than the actual moving image, some of the address space is wasted on the borders. And since MPEG writes in tiles, it may have to split the tiles an extra number of times to retain a sharp edge at the border. (MPEG works better on gradient areas than on edges.)
    Eskil
  • There's that scene in Blade Runner where Deckard seems to use a voice-activated computer to extract 3D information from a 2D photograph.
  • No, a sensitive enough X-ray or sonic detection device can detect textures. Of course, the cost is well above the efficiency. Right now, I'll stick with mapping points in a 3D program...using my finger as the scanning device. I'm having too much fun with that as it is.
  • Not useless at all! You're right, this will be incredibly useful to the pornography industry, but only once it becomes mainstream and commercially available to the end user. It may have more meaning in security fields and definitely in education (though I don't imagine its widespread use there for another decade or two... or three). I believe the 'cooler' thing would be holographic projection--3 dimensional real-time physical rendering/display... you know, Star Trek stuff! Eagerly awaiting new technologies, treker
  • Remember the Alamo?
  • I can't agree with that. While the Campanile film is stunning, it seems to require a handcrafted 3D model onto which to wrap the multiple images.

    This article seems to describe a technique for recovering 3D information from a large number of slightly offset pictures.

    The paper shows an illustration in which the camera lens is covered by a large number of circular/fish-eyeish lenses in a grid pattern.

    By integrating the slightly different viewpoints of each one, the 3D information of the scene can be recovered. The Japanese researchers (and this is where I started skimming...) seem to have several heuristics for performing this transformation in real time. The application is that HDTV sets equipped with appropriate decoding circuitry could allow the user to rotate the scene very slightly at will. Of course, this comes at a significant resolution cost... almost so much so that I wonder if it wouldn't be more efficient to send 9 or so exponentially placed pictures in one (reducing resolution by 3) and use simple morphing to simulate free movement between them.

    Just a thought.
  • The answer lies within the mathematics: Transmitted light holograms work in the following manner: Using a special film set on a plane, you expose the film by illuminating an object in front of the plane with laser light (a single wavelength) and illuminating the film itself with additional laser light.

    After developing the film, if you illuminate the film with the laser light from the same angle and of the same wavelength as before, then a 3-dimensional image of the object is created within the plane.

    The cool thing about the reconstructed image is that it is 'true' 3-d. That means, as you move your head from side to side as you are looking at it, you can actually see behind the objects in the image.

    To understand how this works, I struggled through the mathematics of how the light beams pass through the developed film as well as how the original image exposes itself on the film. I don't understand it conceptually, but the mathematics involved nothing more than trig.

    My personal disclaimer: I didn't read the article that the abstract refers to (no PDF viewer), but based upon the abstract, it sounds like they are using a technique similar to what I described.
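    (For the curious, the "nothing more than trig" boils down to the standard textbook recording/reconstruction argument, written here in complex-exponential shorthand rather than trig; this is generic holography, not anything specific to the paper. The recorded intensity is

        I = |R + O|^2 = |R|^2 + |O|^2 + R^{*}O + R O^{*}

    where R is the reference beam and O the object beam at the film plane. Re-illuminating the developed film with R gives a transmitted field proportional to R I; its |R|^2 O term is the original object wave again, up to a constant, which is why a true 3-D virtual image appears behind the plate, and the R^2 O^{*} term gives the conjugate, real image.)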

  • Does this mean I can create an Unreal Tournament map of my house just by filming it?

    ----------------------------
  • As far as I understand it, they take a series of pictures from different POVs. When viewing the image, the pixels are taken from different pictures and are interpolated depending on the POV of the observer. Think of one of those postcards where the image changes when you look at it from a different direction.
    Theoretically you could put a plane of microlenses onto the (TV) screen, transmit a composite of the pictures taken, and the lenses will "select" the right image for your POV. - But then either the resolution is very low, or the bandwidth really has to improve.
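    (A minimal sketch of that "selection" idea in Python/NumPy, nothing to do with the actual paper: interleave N views column by column, so a lens sheet in front of the screen can steer each sub-column toward a different viewing direction.)

        import numpy as np

        def interleave_views(views):
            """Build a lenticular-style composite: under each (vertical)
            lenslet, place one column from each of the N source views.
            The lens sheet then steers each sub-column toward a different
            viewing direction, so the eye "selects" the view matching its POV."""
            n, h, w = len(views), views[0].shape[0], views[0].shape[1]
            composite = np.zeros((h, w * n), dtype=views[0].dtype)
            for i, v in enumerate(views):
                composite[:, i::n] = v   # view i fills every n-th column
            return composite

        # toy example: three 4x4 "views" with constant gray levels
        views = [np.full((4, 4), g, dtype=np.uint8) for g in (50, 128, 200)]
        print(interleave_views(views).shape)   # (4, 12)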
  • by Anonymous Coward
    There are an awful lot of academic groups doing similar work - what I'm familiar with goes back to 1995 or earlier, but I believe work went on for years before that in other subfields. There are an awful lot of similar products showing up at SIGGRAPH and trade shows. This posting started with hype about holographs - display devices - but the actual substance was capture devices. Very, very different.
  • Actually, yes....my house really is crap

    ----------------------------

  • You might have seen it already, but a forerunner of this technique more suited to getting a 3D model of architecture is described at Paul Debevec's Home Page [isi.edu], with the famous Campanile film. It's pretty amazing what can be done - perhaps one day digitised actors can stroll around a digitised building, with various other additions made to it...

    Ford Prefect
  • I have the same problem with the PDF viewer, but I can explain the concepts behind any sort of light-based 3D capture.

    It basically depends on polarization. All light is polarized, meaning that the electric wave and the magnetic wave that make up a photon are orthogonal (at right angles within a plane) to each other. Most light is randomly polarized... that is, it bounces around at random with no structure to it. That's why lasers are commonly used in holography; they provide a polarized constant.

    A traditional hologram is made by bouncing polarized light off an object (possibly from several different angles) and then exposing a piece of film to both the original, highly polarized light, and the light that is reflected off the object. When light is reflected, you change its polarization to be (typically) parallel to the incident of reflection.

    This makes miniature "grooves" in the image... they're virtual grooves, meaning they have no height, but all the same they selectively reflect only light of certain polarizations. Then, by shining the same type of polarized light on the exposed image, different angles of viewing select different polarizations, meaning different angles of viewing on the target object.

    As for how this technology works, from what I can tell they're capturing the color and polarization of all the photons. This, combined with the width of the CCD, allows you to capture 3D information about the subject matter. If you were to add a source of polarized light to this thing, you could probably, through the use of mirrors, capture EVERY angle, just like a traditional hologram.

    As a matter of fact, it doesn't even have to be visible light. Infrared will work fine, though you'll only get a rough gray-scale. But then, you don't need to be shining red/green/blue laser light around everywhere...

    Won't The Matrix people be mad at this! They spent Some Great Value Of Hard Earned Cash (SGVOHEC) to develop bullet-time, and now they can just use what turns out to be existing technology, making that expenditure of SGVOHEC a moot point.

    Oh well, maybe the Super Bowl people will get with it next time so that my super-zooming rotating image of the QB won't jerk around like a 10-year-old computer trying to run Quake...

  • You'd need everyone to install on their client the "tripping over dirty clothes" mod, first.
  • Actually, since HDTV as you pointed out is MPEG, only the changes from frame to frame will be transmitted.

    The upshot of this is that the borders caused by improper aspect ratios will only get transmitted once, since they will presumably stay black until the end of the program or commercial break.

    Just my $.02

  • Hrm.

    I can see the 3D placement along the 2D camera path -- basically, you are deriving how the virtual camera should move and then blue-screening that onto the real footage.

    But: are you able to deduce accurate environment maps? Occlusion of the 3D virtual elements by real-world items? This would seem to require true scene interpretation, while just deducing a virtual camera path from the real-world footage sounds much more doable. Do you also deal with zooming and depth of field?
  • Phil Torr at Microsoft Research did this years ago using, not HDTV cameras, but a single, ordinary camcorder.

    See an example of some raw footage of a canyon [microsoft.com].

    Then see the 3D model of the canyon he recovered from it. [microsoft.com]

  • Although this is definitely a neat technique and has a lot of promise for low-resolution use (yes, broadcast television is low-res), it isn't going to live up to the needs of filmmakers.

    What is more interesting to me is the possibility of using this system to lower the cost of motion capture: even if you don't use the footage in your movie it could be very useful for generating and compositing special effects.
  • Could anybody please moderate up the parent post and/or some of the few other on-topic insightful/interesting comments instead of just rewarding jokes?
  • I can't agree with that. While the Campanile film is stunning, it seems to require a handcrafted 3D model onto which to wrap the multiple images.

    Actually, if you read his thesis [isi.edu], you'll discover that a large proportion of the modelling process was done by the computer. A number of photographs from different angles were used as the source, and lines traced on them by a human operator. The actual three-dimensional structure and proportions were then determined from those by the computer, and relevant, direction-dependent textures recovered and applied.

    Pretty interesting reading - a very different procedure to that described in the posted article, but who cares. :-)

    Ford Prefect
  • Oops... I didn't read the original post to Slashdot carefully enough. I thought the phrase:

    reconstruct the three dimensional geometry of the scene

    meant what it said.

  • I can't remember the name of the game, but a number of years ago (>10) I can remember playing a SEGA racing game that used technology that would have to be along these lines.
    It was a sit-in F1-style racer, with a huge monitor up front with this weird plastic lens covering the monitor, and from the driver's seat the 3D effect was pretty damn good.
    This was around the same time (give or take a year or so) that another 3D game came out that used digitized footage of live actors in a game that was done like Dragon's Lair (view footage, hit joystick or button at the right time to progress to the next scene); however, this game used a circular table with a spherical mirror behind it to project a "hologram" onto the playfield.
    Anyone else remember these two games?
    -- kai


    Verbing Weirds Language.
  • This method (and all 4D light field capture that I know of) doesn't use polarization information.

    You want to render a scene from new viewpoints. If you know how all the light in a cube is moving, you can render the scene in that cube from any viewpoint by putting a virtual "camera" in the space and intersecting light rays with it (through a virtual "aperture", of course). Think of it as a parameterization problem, and you can see that this will become a big table lookup (with interpolation).

    Now, for the most general case, you'd need to parameterize the space in 5D to get the data you need for unique-view rendering (3D position, plus 2D for which direction you're looking), but you can simplify things a bit by assuming that:
    a) There's a box around what you're looking at.
    b) You're outside the box.

    The surface of the box is 2D, and since you're outside of it, you only need data at the surface. At each point on the surface you need 2D of parameterized information (angle and azimuth), so 4D altogether.

    In practice, the parameterization most people use is "two-plane", in that you have two parallel planes, with your object between them (well, that part's optional), and you store the rays by the two sets of 2D coordinates of where they hit the two planes.

    Now, what this paper does is use Integrated Photography, which is just photography with one big lens (on the camera) and a bunch of really tiny lenses in a grid (in front of the camera). The camera, through its big lens, gives one set of 2D coordinates, and each little lens creates another set of coordinates in its own very tiny space (look at figure 2 in the paper for what I'm talking about). So you end up with a sort of 2-level hierarchical parameterization.

    Each of the tiny images, one from each integrated lens, is basically a little picture of the scene from the location of that lens. So you can turn this into a two-plane parameterization: the location of the lens determines where a ray hits the first plane (the integrated lens plane), and the position within the tiny image from that lens determines the direction the ray is going, and therefore the eventual point where it strikes the second plane. You need to know exactly how your integrating lens affects light, which they get by precalibration, from my reading of the paper. (You don't need the two-plane parameterization to render images. The authors of this paper work directly from the lens image. It's just important to realize that they are equivalent in concept, if not in the details. A toy two-plane lookup sketch appears at the end of this comment.)

    This page could be quite useful to see these concepts with images:
    http://graphics.lcs.mit.edu/~aisaksen/projects/drlf/index.html
    BTW, the paper by Isaksen really should have been referenced in the paper this discussion is about, since it mentions integrated lenses in the 4D light field context, and even has an image of a synthetic integrated photo. I think the main contribution of this paper is probably the use of HDTV to do it in real time. That's pretty cool.

    Finally, also on the MIT graphics web page is how to improve your rendering by getting a better approximation to your geometry. See the work under publications titled "Image-Based Visual Hulls". This is another way of tackling the 3D video problem.
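    (To make the two-plane lookup concrete, a toy Python sketch; this is my own illustration under the assumptions above, not the paper's method. Treat the light field as a 4D table indexed by where a ray crosses each plane, and render a virtual-camera ray by nearest-sample lookup.)

        import numpy as np

        # Toy 4D light field: L[s, t, u, v] = radiance of the ray that crosses
        # the first plane at (s, t) and the second plane at (u, v).
        # Random data stands in for what the microlens camera would capture.
        S, T, U, V = 8, 8, 16, 16
        L = np.random.rand(S, T, U, V)

        def render_ray(cam_pos, ray_dir, plane1_z=0.0, plane2_z=1.0):
            """Intersect the ray with the two parallel planes, quantize to the
            nearest stored sample, and return that radiance (real renderers
            interpolate in all four dimensions instead of snapping)."""
            cam_pos = np.asarray(cam_pos, float)
            ray_dir = np.asarray(ray_dir, float)
            s, t = (cam_pos + ray_dir * (plane1_z - cam_pos[2]) / ray_dir[2])[:2]
            u, v = (cam_pos + ray_dir * (plane2_z - cam_pos[2]) / ray_dir[2])[:2]
            # plane coordinates assumed to lie in [0, 1); map to table indices
            si, ti = int(s * S) % S, int(t * T) % T
            ui, vi = int(u * U) % U, int(v * V) % V
            return L[si, ti, ui, vi]

        # one ray from a virtual camera behind the first plane
        print(render_ray(cam_pos=[0.5, 0.5, -1.0], ray_dir=[0.1, 0.0, 1.0]))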

  • First we're squishing cats into jars, and now dogs between panes of glass, my god people, what are we doing to our pets?
  • I don't understand how they can make a 3D model out of a 2D picture. It is hard to understand their page. I'm thinking that they catch the light rays and can tell which light rays arrive at which time, letting it perceive depth. But how do they tell the difference from old light rays? And obviously, I don't know how you'd be able to see how something looks from behind when it doesn't show up on the 2D picture.
  • by Jedi Alec ( 258881 ) on Wednesday February 14, 2001 @03:00AM (#433207)
    Something tells me this will bring a revolution to the kinds of RSI that can be obtained from playing Quake. I wish thee all happy hunting.
  • by OlympicSponsor ( 236309 ) on Wednesday February 14, 2001 @04:02AM (#433208)
    www.opticsexpress.com, huh? Sure, maybe you can tell the boss they are "Optics Express", but we all know it's really "Optic Sex Press". Motto: "Where girls push themselves against your eyeballs"
    --
  • by BRock97 ( 17460 ) on Wednesday February 14, 2001 @03:16AM (#433209) Homepage
    ...pornographic videos were never the same.

    Bryan R.
