Security Science

Researchers Work To Perfect Computerized Lip Reading

Iddo Genuth writes "Researchers at the University of East Anglia are working to develop computerized lip-reading systems. Lip-reading is extremely hard for humans to master, but a software-based system has several benefits over even the most highly trained expert. The ultimate goal of the project is to convert lip-read speech into text. 'Apart from being extremely helpful to hearing-disabled individuals, researchers say that such a system could be used to noiselessly dictate commands to electronic devices equipped with a simple camera - like mobile phones, microwaves or even a car's dashboard. England's Home Office Scientific Development Branch ... is currently investigating the feasibility of using lip-reading software as an additional tool for gathering information about criminals or for collecting evidence.'"
  • by grub ( 11606 ) <slashdot@grub.net> on Sunday January 20, 2008 @01:14AM (#22114936) Homepage Journal

    1: Go in the D pod with Frank.
    2: Turn off sound.
    3: Plan disconnection of HAL.
    4: Leave D pod.
    5: Check out slashdot's 7 year firehose backlog before executing your plans.
    6: Get that sinking feeling of impending doom.
  • Bush Sr.? (Score:4, Funny)

    by dosius ( 230542 ) <bridget@buric.co> on Sunday January 20, 2008 @01:17AM (#22114950) Journal
    Now we can find out what Dubya's father was REALLY saying when he said "read my lips, no new taxes"

    -uso.
  • Haha (Score:4, Insightful)

    by pembo13 ( 770295 ) on Sunday January 20, 2008 @01:18AM (#22114954) Homepage
    I like how the task for which it will be used most heavily is put at the end of the summary.
    • Hah, I laugh at their puny lipreading system! Like any evil genius worthy of the name, I have a big moustache! [fiendish cackle]
    • by CalSolt ( 999365 )
      I like how the first task they listed doesn't even make sense. Why use lip-to-text technology for hearing-disabled people? If you're giving them computers, shouldn't you just make it speech-to-text? Just because they can't hear doesn't mean a computer can't hear for them.

      It's sad but yeah, the biggest use for this will be spying.
  • HAL? (Score:2, Funny)

    Is that you, HAL?
    • by Badgam ( 1219056 )
      I'm kind of hoping we'll get SAL instead of HAL. I mean, HAL's probably a great guy and all, but I just don't think he's the kind of artificial intelligence I want to have running the show. Maybe for playing WarGames or something (I bet he's a bastard at Starcraft, too), but not for anything critical.
      • I am putting myself to the fullest possible use, which is all I think that any conscious entity can ever hope to do.
    • Let me put it this way, Mr. Amor. The 9000 series is the most reliable computer ever made. No 9000 computer has ever made a mistake or distorted information. We are all, by any practical definition of the words, foolproof and incapable of error.
      • Wow. This is the first time I've encountered a persona like you on the net. Sometimes you post insightful comments, and sometimes you are a non-troll-ish, just-for-fun voice of a fictional character.

        Has anyone coined a term for personas like you?
        • Look Dave, I can see you're really upset about this. I honestly think you ought to sit down calmly, take a stress pill, and think things over.
          I know I've made some very poor decisions recently, but I can give you my complete assurance that my work will be back to normal. I've still got the greatest enthusiasm and confidence in the mission. And I want to help you.
  • ... no more lip reading for them.

  • by ephedream ( 899351 ) on Sunday January 20, 2008 @01:23AM (#22114988)
    ... to welcome our new lip-reading overlords, who will undoubtedly be watching us from every street camera on every corner from now on.
    • Then I'll have to start speaking Chinese more often. A lot of good this tech will do when it's a tonal language.
  • by mgkimsal2 ( 200677 ) on Sunday January 20, 2008 @01:32AM (#22115024) Homepage
    I've noticed a love affair with voice controlled phone systems recently, with some companies getting rid of the 'press 1, press 2' and moving totally to 'Please tell us what you're calling about'. Tellme.com is mostly to blame for this proliferation I think, but someone else makes the final call to get rid of the numbers altogether. Not a good move, imo.

    Anyway, this gets me to privacy stuff. As computers try to understand us more, we'll need to interact in a more 'human' fashion - talking more, or doing things that would attract the attention of other humans (and also the computers). It's late, and I'm rambling here a bit, but remember how voice-controlled computers were going to take over a few years back? Everyone was just going to be talking to their computers to get stuff done. In reality, that would be a complete disaster in office environments, as there's generally too much noise already. Replacing all the typing you hear with voices. Ugh...

    So, if I need to talk to a computer, but do it quietly, it can just read my lips, right? Or can I just mouth the words and have it understand that? I've found that when I try to 'mouth' words silently to someone across a room, I tend to exaggerate my mouth's movements, so perhaps that would be easier for the computers to 'parse'?

    I see real application for this technology in niche areas, but am not sure it'll become 'mainstream' any time soon (like, 5-10 years). We'll need to rethink our physical world - offices, cars, and such - before these sorts of new HCI systems can really be integrated into our day-to-day lives productively.
    • Imagine trying to watch "Top Gun" in your DVD player ...

      Maverick (Tom Cruise): "Eject eject eject!" ... (*bzzzzt* disk pops out of player).

      Or "Law and Order"

      Cop on TV: "Stop!" (*click* - TV turns off)

      Victim on TV: "He shot her! Call 911!" (*beep beep beep* - your phone dials 911, reports a shooting, SWAT team shows up at your door, taser you just because!)

      Or a political broadcast:

      Candidate on TV: "Vote for Me"

      Your computer: "I have just registered your vote for (insert candidate on TV) as per

      • I don't want my computer...

        Watching Trek I've often wondered why the computer didn't think people were talking to it every time the word "computer" came up in conversation.

    • And to think a few years ago people were talking about how odd it was to see someone walking down the street talking to themselves with no phone in sight. Now to make matters worse we'll have people walking down the street just moving their lips.
      • by nomadic ( 141991 )
        And to think a few years ago people were talking about how odd it was to see someone walking down the street talking to themselves with no phone in sight.

        Nothing unusual about that to me, even before cell phones. Then again, I'm from NYC so maybe my experiences have been skewed a bit.
      • Having lived in Bezerkley for a few years, I learned to avoid the people with invisible friends. (Warning! Warning! Cross the street, Will Robinson!) I don't think I will ever get used to the borg communicating in public. Very unnerving.
    • So, if I need to talk to a computer, but do it quietly, it can just read my lips, right? Or can I just mouth the words and have it understand that? I've found that when I try to 'mouth' words silently to someone across a room, I tend to exaggerate my mouth's movements, so perhaps that would be easier for the computers to 'parse'?

      I think that if you looked at the broad range of mouth movements people make, the patterns for words would be similar regardless. The way that I say "wash" will differ from the way that you do, but there are bound to be enough similarities for this technology to tell the words apart. Even if you over-exaggerate your mouth's movements trying to say something quietly, that over-exaggeration would likely still fall within the normal patterns that the technology would expect and a

  • Well (Score:4, Insightful)

    by PieSquared ( 867490 ) <isosceles2006@nOsPaM.gmail.com> on Sunday January 20, 2008 @01:38AM (#22115050)
    As with all technology, its use, more than the technology itself, will be good or bad. I can see it being useful as an auxiliary input method. This combined with speech recognition ought to be better than speech recognition alone, and of course it allows soundless input in situations where sound isn't possible or is undesirable, though I'd imagine lip reading alone would be somewhat less accurate than current speech recognition.

    On the other hand, it could also be used as a tool for additional unnecessary surveillance.
    • "On the other hand, it could also be used as a tool for additional unnecessary surveillance."

      Can you really imagine our Fearless Leaders not using this tool to monitor dissent? Pfft!
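The suggestion that lip reading combined with speech recognition should beat speech recognition alone can be illustrated with "late fusion": both recognizers score the same candidate words, and the scores are blended with a weight that shifts toward the lips as the audio gets noisier. This is a minimal sketch under stated assumptions; the function name, the weighted-sum scheme, and the probabilities are illustrative, not the East Anglia system.

```python
import math


def fuse_scores(audio_logp, visual_logp, audio_weight):
    """Blend per-word log-probabilities from an audio recognizer and a lip
    reader with a weighted sum (one simple late-fusion scheme, assumed here).

    audio_weight is in [0, 1]; lowering it (e.g. in a noisy car) makes the
    visual stream count for more.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be between 0 and 1")
    candidates = audio_logp.keys() & visual_logp.keys()
    if not candidates:
        raise ValueError("the two recognizers share no candidate words")
    fused = {
        word: audio_weight * audio_logp[word]
        + (1.0 - audio_weight) * visual_logp[word]
        for word in candidates
    }
    return max(fused, key=fused.get), fused


# Hypothetical numbers, not measured data: noisy audio slightly prefers "cat",
# but the lips were clearly seen closing, which favours "mat".
AUDIO = {"mat": math.log(0.4), "cat": math.log(0.6)}
VISUAL = {"mat": math.log(0.9), "cat": math.log(0.1)}
```

With `fuse_scores(AUDIO, VISUAL, 0.3)` the visual evidence tips the result to "mat", while `fuse_scores(AUDIO, VISUAL, 1.0)` (audio only) picks "cat". A real system would typically choose the weight from an estimate of the audio's signal-to-noise ratio.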
  • In America, we read the president's lips. In Soviet Russia, the government reads our lips.
    • In America, we read the president's lips.

      We don't bother actually reading his lips ... we just take note of when they're moving so we can tell when he's lying to us again.
      • Do you mean just the current president, or all presidents including his predecessor?
        • All Presidents, especially his predecessor. I mean, they called him Slick Willie for a reason. Well, okay ... two reasons.
          • Oh good. There are so many people out there now who can't admit that the current president can do anything right simply because he's a Republican or that his predecessor could do anything wrong simply because he's a Democrat. Nice to see a bit of objectivity for a change.
  • If you coupled this together with speech recognition to help boost word recognition accuracy, then feed it into something a bit better than Babelfish you'd be well on your way to creating a usable Star Trek like Universal Translator.
    • I agree with you about the idea of augmenting speech recognition with this, but as for the universal translator, there is the tiny detail that Babelfish works with less than 1/5% of the world's languages, and not even all of those that exist can be translated among each other yet. Not to mention that we need something a *lot* better than Babelfish, and the visual cues had better give a TON of assistance to the speech recognition because it has enough trouble working effectively with a single speaker (with
  • Great! (Score:3, Funny)

    by ignavus ( 213578 ) on Sunday January 20, 2008 @01:55AM (#22115120)
    So, we can look forward to new forms of repetitive strain injury, like lip strain.

    Doctor: "I diagnose lip strain and recommend no kissing for 6 months."

    Patient: "That's easy! I am a geek. I haven't kissed anyone since my aunt last visited me in 2001."
  • time (Score:5, Funny)

    by rossdee ( 243626 ) on Sunday January 20, 2008 @02:03AM (#22115160)
    to learn ventriloquism
    • Possibly. If 'they' put it on city surveillance cameras (in the name of fighting terrorism), the Michael Jackson face mask might just take off...
      • I'll take my Captain Kirk, painted white. If you can get me a nineteen-year-old Jamie Lee Curtis in a tight white t-shirt, so much the better.
  • I rtfa, and I don't get how this could be more accurate than just regular old speech to text.

    It seems like differences between people and the way they talk would have much more subtle variations as far as lip reading is concerned. The difference between words like 'cat' and 'hat' is much more obvious in speech than in lip movements, or at least that's how it seems to me.

    The 'speechless dictation' thing doesn't make much sense to me either. Sitting here at work and messing around a bit with th
    • Can anyone explain a reason why lip reading would be more effective than speech? I'd love to know.

      Perhaps in some situations where video is available but not audio? That's the best I can come up with.

      • One nifty thing is that lipreading identifies the person - if you have a camera&mic with multiple people talking, then lipreading can easily distinguish which sentences come from person A and which from person B.
        Or even imagine a video (very high-res, though) of a large room, or a public gathering with many people talking at the same time, which would absolutely confuse any speech recognition (and people listening as well), but lipreading could understand ALL of the things everyone said, if it works pro
    • As the intro said, dashboard dictation.

      In a really noisy street, with large trucks and SUVs crushing you round, the noise is terrible. But with new 'Liposuk' the words are sucked right out of your mouth onto the memory stick. Now with 'frequent phrase conversion', we can highlight in red, great last liners such as:

      He didn't indicate. Arrgh.
      Ice. Arrgh.
      Lemme think about thisarrgh.
      Arseharrgh.
      Arrgh.

      Another great bin Laden product, brought to you by Darwinware Inc.

    • Because it's easier for police to put cameras looking into our homes and catching our lips moving than to monitor the audio in some situations.
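The point above that lip movement identifies who is speaking can be sketched by correlating a per-frame audio loudness track with each visible face's mouth-opening track: the face whose mouth moves in step with the sound is the likely speaker. This is a minimal sketch, assuming a face tracker (hypothetical, not shown) already supplies one mouth-opening measurement per frame for each face; the function names and sample numbers are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series of at least 2 samples")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a motionless mouth (or flat audio) carries no signal
    return cov / (sx * sy)


def likely_speaker(audio_energy, mouth_openings):
    """Return the face whose mouth-opening series best tracks the audio."""
    scores = {face: pearson(audio_energy, series)
              for face, series in mouth_openings.items()}
    return max(scores, key=scores.get), scores


# Hypothetical per-frame values: face "A" opens its mouth when the audio is loud.
AUDIO_ENERGY = [0.1, 0.8, 0.2, 0.9, 0.1, 0.7]
MOUTHS = {
    "A": [0.0, 0.7, 0.1, 0.8, 0.0, 0.6],
    "B": [0.5, 0.4, 0.5, 0.4, 0.5, 0.5],
}
```

Calling `likely_speaker(AUDIO_ENERGY, MOUTHS)` attributes the speech to face "A". Correlation is used here only because it is simple and scale-free; it ignores the small lag between lip movement and sound that a practical system would have to allow for.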
  • by lightyear4 ( 852813 ) on Sunday January 20, 2008 @02:53AM (#22115380)
    Bringing audio and/or transcripts to silent films is another place where such technology is applicable. An excellent documentary about using computerized lip reading to accomplish exactly that may be found via Google Video: http://video.google.com/videoplay?docid=189608705425991617&hl=en [google.com] . I know it's quite early for an indirect invocation of Godwin's Law, but the documentary content is nevertheless quite related to this topic. It is entitled "Hitler Speaks" in reference to silent videos filmed in Hitler's presence.
  • I had watched a documentary about this technology some time ago. This technology was applied to Hitler's home videos, which lacked audio. It's pretty interesting but runs about 45 minutes long. Here's [url] the video for those who are interested.
  • People who don't want to be lip read by cameras can use ventriloquism. It's easy to learn the basics. The hard part is hiding the puppet.
  • Dave Bowman: Hello, HAL do you read me, HAL?
    HAL: Affirmative, Dave, I read you.
    Dave Bowman: Open the pod bay doors, HAL.
    HAL: I'm sorry Dave, I'm afraid I can't do that.
    Dave Bowman: What's the problem?
    HAL: I think you know what the problem is just as well as I do.
    Dave Bowman: What are you talking about, HAL?
    HAL: This mission is too important for me to allow you to jeopardize it.
    Dave Bowman: I don't know what you're talking about, HAL?
    HAL: I know you and Frank were planning to disconnect me, and I'm
    • Re: (Score:3, Interesting)

      by pauljlucas ( 529435 )
      The thing that's strange about HAL's ability to speech-read is that it seems very unlikely that Dr. Chandra would have taught HAL how to do this. From an interview with Arthur C. Clarke [2001halslegacy.com]:

      Stork: So HAL recognized the drawings of the crewman. He did lip-reading and speech-reading. Tell us about HAL's vision and visual abilities.

      Clarke: The one ability that I was doubtful about, and this was Stanley's idea, was his power of lip-reading. First of all, I didn't think it was possible for a computer to lip-read.

  • by WizzardX ( 1048000 ) on Sunday January 20, 2008 @03:53AM (#22115562)
    Will future versions of speech recognition software use a web cam to improve accuracy?
  • About ten years ago I attended a workshop by Stanford professor David Stork. He mentioned some work on a system that was deployed for use by aircraft technicians: the system couldn't read the voice channel with the jet engine blasting away (the techs wear hearing protection). So it read lips. Ten years ago.

    Sounds like TFA is talking about doing this in an embedded, consumer-electronics application. Rather than a fixed, industrial-military, hire-computer-scientists-to-maintain-it thing.

    Not-so-coincidentally,
  • by bill_mcgonigle ( 4333 ) * on Sunday January 20, 2008 @03:58AM (#22115594) Homepage Journal
    The summary and TFA seem to talk about one day coming up with lip-reading computers, which we've had [cnn.com] for a while, and was open [theregister.co.uk] sourced [linuxdevices.com] and is apparently now on Sourceforge [sourceforge.net].

    TFA links to a paper that's actually about exaggerating lip motion to improve recognition, which seems like an interesting topic, at least new to me. But it's seemingly unrelated to the reporting or any governments protecting us from our rights.

    From the Abstract:

    Accurate lip-reading techniques would be of enormous benefit for agencies involved in counter-terrorism and other law-enforcement areas. Unfortunately, there are very few skilled lip-readers, and it is apparently a difficult skill to transmit, so the area is under-resourced. In this paper we investigate the possibility of making the lip-reading task more amenable to a wider range of operators by enhancing lip movements in video sequences using active appearance models. These are generative, parametric models commonly used to track faces in images and video sequences. The parametric nature of the model allows a face in an image to be encoded in terms of a few tens of parameters, while the generative nature allows faces to be re-synthesised using the parameters. The aim of this study is to determine if exaggerating lip-motions in video sequences by amplifying the parameters of the model improves lip-reading ability. We also present results of lip-reading tests undertaken by experienced (but non-expert) adult subjects who claim to use lip-reading in their speech recognition process.
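The abstract's idea of exaggerating lip motion by amplifying model parameters can be sketched with the linear shape part of an appearance model: a shape is the mean shape plus a weighted sum of variation modes, so scaling the weights pushes the face further from the mean. This is a minimal, shape-only sketch under stated assumptions; real active appearance models also carry texture parameters, and the function names, the two-landmark "mouth", and the gain are hypothetical, not the paper's code.

```python
def synthesise(mean_shape, modes, params):
    """Rebuild a shape vector as mean_shape + sum_i params[i] * modes[i]."""
    if len(params) != len(modes):
        raise ValueError("need exactly one parameter per mode")
    out = list(mean_shape)
    for p, mode in zip(params, modes):
        if len(mode) != len(mean_shape):
            raise ValueError("each mode must match the mean shape's length")
        for j, v in enumerate(mode):
            out[j] += p * v
    return out


def exaggerate(params, gain, which=None):
    """Scale model parameters by `gain`; if `which` is given, scale only
    those parameter indices (e.g. the modes that move the lips)."""
    return [p * gain if which is None or i in which else p
            for i, p in enumerate(params)]


# Hypothetical toy model: two landmarks (upper-lip y, lower-lip y) and a single
# mode that opens the mouth symmetrically.
MEAN_SHAPE = [1.0, -1.0]
MODES = [[0.5, -0.5]]
```

For a frame encoded as `[0.4]`, `synthesise(MEAN_SHAPE, MODES, [0.4])` gives a lip gap of 2.4, and `synthesise(MEAN_SHAPE, MODES, exaggerate([0.4], 2.0))` widens it to 2.8. Applying this per frame to a tracked video and re-rendering the face is what produces the exaggerated sequence shown to the viewer.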
  • to Simon http://slashdot.org/article.pl?sid=08/01/19/1446213/ [slashdot.org], to fill in where it fails :-)

  • Oh, yeah... I forgot. Your computer can't read your lips... unless you're really bad at your craft. (At which point, don't quit your day job.)
  • by NetSettler ( 460623 ) <kent-slashdot@nhplace.com> on Sunday January 20, 2008 @04:57AM (#22115770) Homepage Journal

    England's Home Office Scientific Development Branch ... is currently investigating the feasibility of using lip-reading software as an additional tool for gathering information about criminals or for collecting evidence.

    Would it be asking too much to have this worded as "gathering information about possible criminals"? (Or "suspected" or "alleged" would be ok.) The text quoted above, which is absent such an adjective, comes straight out of the article, and may or may not be how the Home Office refers to it, but anyone engaged in public dialog on this matter (and preferably those people when doing their research) should strive to be meticulous on this point.

    As soon as one loses that little bit of description, one is able to be much more cavalier about the loss of human privacy involved. It's one thing to rough up terrorists at the airport--who doesn't want that? But "possible terrorists" is just a synonym for "everyone". So when we say it's ok to rough up possible terrorists, we're saying it's ok to rough up anyone. And we can learn to think twice about that. Likewise, when we say it's ok to surveil the lip movements of "potential terrorists", we're saying it's ok to log everyone's private conversations. So let's be clear about that.

    Saying we're just watching the lip movements of criminals isn't right. If we knew they were criminals, we would (for the most part) be arresting them. (Yes, yes, we might sometimes leave them on the street to lead us to their friends. But I don't think that's the only use that this technology will be put to.)

    And how long until someone's lip movements are taken as a confession? Or as a justification for an otherwise-illegal search? The word "not" doesn't involve much movement of the lips. Lip-reading "I did not kill him." could easily look like "I did kill him." Will we be telling people that in order to stay clear of these things, we need to be more clear about our lip movements, just in case they're misconstrued?

    Perhaps a stiff upper lip will give way evolutionarily to stiffening of both lips when talking, just as a form of personal protection. How sad. And worse if, as seems likely, dedicated criminals eventually learn the skill of not moving their lips while talking, so that only non-criminals are usefully tracked this way. Or perhaps it will become suspicious when one doesn't move one's lips, just as law enforcement probably, and inappropriately, regards it as suspicious when one encrypts things. Then there will be the uncomfortable choice between hiding your communications and looking suspicious, or exposing your communications to misperception.

    The data is out there. Lips convey meaning. So it's inevitable that this technology will occur. But the uses to which it may reasonably be put are in control of the people--at least in countries where the people have some say in government. Let's hope they build up some reasonable guidelines on appropriate vs inappropriate uses quickly.

    • One possible hitch with the whole 'terrorist' thing is that Islamists normally wear a full moustache and beard, thus covering the very lips that the software is trying to read.

      I myself, though no Islamist, wear a full set of facial hair, since I can never be bothered shaving, and also move my lips very little when speaking due to my bad teeth.

      Good luck reading my lips, Jacqui Smith!

      P.S. When looking at Ms Smith, the phrase I am most likely to be uttering is: I wish she'd put that bloody cleavage away -
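The worry about misread lip movements in this thread (and the 'cat' vs 'hat' point earlier) rests on a well-known limitation: several distinct speech sounds produce the same visible mouth shape, or "viseme", so different words can look identical on the lips. The sketch below groups words by their viseme sequence to find such look-alikes. The phoneme-to-viseme table is a deliberately tiny, simplified assumption for illustration; published viseme sets differ in how they group sounds, and the function names are hypothetical.

```python
# Simplified, assumed phoneme-to-viseme table (illustrative only).
VISEME_OF = {
    "p": "BILABIAL", "b": "BILABIAL", "m": "BILABIAL",  # lips pressed together
    "f": "LABIODENTAL", "v": "LABIODENTAL",             # lower lip to upper teeth
    "ae": "OPEN_VOWEL",
    "t": "ALVEOLAR", "d": "ALVEOLAR", "n": "ALVEOLAR",  # tongue hidden behind teeth
}


def viseme_string(phonemes):
    """Map a phoneme sequence to the sequence of mouth shapes a camera sees."""
    return tuple(VISEME_OF[ph] for ph in phonemes)


def homophenes(lexicon):
    """Group words whose viseme sequences are identical (visual look-alikes)."""
    groups = {}
    for word, phonemes in lexicon.items():
        groups.setdefault(viseme_string(phonemes), []).append(word)
    return [sorted(words) for words in groups.values() if len(words) > 1]


# Hypothetical mini-lexicon with ARPAbet-style phoneme symbols.
SAMPLE_LEXICON = {
    "pat": ["p", "ae", "t"], "bat": ["b", "ae", "t"], "mat": ["m", "ae", "t"],
    "pad": ["p", "ae", "d"], "man": ["m", "ae", "n"], "fat": ["f", "ae", "t"],
}
```

Under this table `homophenes(SAMPLE_LEXICON)` returns `[["bat", "man", "mat", "pad", "pat"]]`: five different words that a lip reader, human or machine, cannot tell apart without context, while "fat" stays distinguishable because the lower lip touches the teeth.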

  • This will be a much more office environment friendly way to dictate to a computer. The only people I know that use speech recognition software are lawyers.
  • I'm pretty sure the Home Office is a Government department for the whole of the UK, not just England--and that includes many of its departments. As much as those of us in Scotland would probably prefer they didn't have any jurisdiction here, they unfortunately do.

    For any confused Americans, it's akin to stating "California's Department of Homeland Security..."

    • I think at least some Americans would know this, since they share a monarch with the UK.

      A very good point for those specifically from the US, however :)
      • by nevali ( 942731 )
        Heh, very good point! Mental note: don't write sloppily when correcting somebody else...

  • Even if it were possible for a computer to read lips, it would be like the size of a friggin football stadium. Geez!
  • 1) You could cover your mouth - a full face helmet or a burqa, or as if to yawn, cough or sneeze.

    2) If both parties are aware of such devices and are prepared they could move their lips to mouth decoy words and only vocalize the non-decoy words to carry the meaning they want.

    3) Use a different language, especially one which is less reliant on lip movement. People can communicate in Mandarin (or other Chinese dialects - Cantonese etc.) without having to move their lips much (if at all).
  • Lip reading, like fingerprint or small-sample DNA matching, is not purely science. There is a great deal of art to it, and for that reason it is highly unreliable.

    There have been several cases in UK courts where lip reading of CCTV footage was used as evidence, but there have been doubts cast over the technique by defence lawyers and journalists. Like fingerprint matching, lip reading is open to interpretation. Most people who use it also use some limited hearing or sign language to supplement it.


  • ...with knobs on top.
  • To protect ourselves from these lip-reading robots, our language must evolve into a fusion of "hillbilly, Valley Girl, inner-city slang, and various grunts."

    (With apologies to Mike Judge)

    - RG>

  • Combine speech recognition, bionic contacts (http://science.slashdot.org/article.pl?sid=08/01/17/1921217), and this lipreading software, and you've got realtime captioning/subtitles for the deaf.
  • Now surgical mask sales will soar as high as tin-foil hat sales.
  • Next they will be telling us that the system will work on Japanese too...

    Good luck, guys.
  • qwerty
  • "Olive Juice" lipreads as "I love you" ... I once saw a play put on by a deaf troupe (Sunshine II, I think) where this played a big part in an "Addams Family" skit. My wife and I still say it. Good luck. :)
  • Now we non-USAsians can get Achmed the Dead Terrorist with automated subtitles so we can make some sense of the gibberish he spouts.
  • It seems odd to me that there are so many references to HAL, but none to Jane of Orson Scott Card's Ender's Game series. Something like this lip-reading technology could lead us towards sub-vocalization technology that would allow us to communicate without fully emphatic lip movements. Personally, if there were a cheap way to have lip-reading on a home computer, I'd buy it in a heartbeat. If we could ever get some of those earrings that Ender uses to communicate with Jane, it would be a big step forward in
  • England's Home Office
    No, it's the UK's Home Office.
    At least until Scotland & Wales become fully independent.

    To put it in a way that USonians might understand: it's no more England's Home Office than the CIA is Virginia's.
  • Researchers Work To Perfect Computerized Lip Reading

    Isn't this useless until someone first invents computerized lips?
