Speech Recognition in Silicon
Posted by
CmdrTaco
on Tue Sep 14, 2004 10:55 AM
from the spell-my-naughty-words dept.
Ben Sullivan writes "NSF-funded researchers are working to develop a silicon-based approach to speech recognition. "The goal is to create a radically new and efficient silicon chip architecture that only does speech recognition, but does this 100 to 1,000 times more efficiently than a conventional computer." Good use of $1 million?"

Funny... (Score:5, Interesting)
If what they're saying really is true, and knowing how much money is invested in speech recognition research on a yearly basis, then yeah, I would definitely say that this is one million dollars well invested...
Re:Funny... (Score:4, Insightful)
THAT would be awesome!
Re:Funny... (Score:4, Interesting)
How far away are we from having a machine that could identify all of the instruments in a piece of music by "listening" to the music? I say "listening" because there need not physically be a playback-and-listen, the playback could be mathematically modeled by the computer.
Re:Funny... (Score:5, Insightful)
If this is really true what they're saying then people should put tons more money into product X!
Actually, use of speech recognition technology to index video clips for search engines _is_ both a very desirable technology, and something that can be done fairly easily (most professionally produced video, at least, takes great pains to have one speaker at a time and keep noise to a minimum). There's a fair bit of video content accessible via the web right now, and this will only increase (most new digital cameras can take video clips now - remember how quickly still pictures flooded the web when digicams first became available?).
Speech recognition technology has trouble when it's trying to sort out a noisy environment or a degraded communications channel, and has trouble holding useful open-ended conversations (as opposed to task-driven), but it's very capable in most other contexts. After all, the field has been under study for decades.
In summary, your mocking of the parent post is premature.
speech recognition and deaf/hard-of-hearing (Score:5, Insightful)
Making quantum leaps in speech recognition has tremendous potential for the deaf and hard-of-hearing (I am the latter).
Imagine being in a meeting (almost always a problem for hearing impaired people) and having real-time subtitles.
$1 million is a TINY price considering upwards of 20% of the nation has some hearing loss and hearing aids cost on the order of $4000 a pair.
1... million... DOLLARS!!! (Score:5, Interesting)
Let me think for a moment... Hell yeah! If we had low power speech processors, the possibilities would be endless. For one, we'd finally have a Star Trek(TM) interface for our homes!
"Computer, lights!"
"Computer, make coffee!"
"Computer, Earl Grey, hot!"
As silly as it may sound, such an interface would be far more efficient than mashing buttons.
In addition, blind people could be significantly helped by this. Many of them already use speech recognition and synthesis to assist in computer usage. Imagine if their computers could suddenly understand them a thousand times better. They could talk to their computers a bit more naturally, thus saving their vocal cords from undue stress.
Other applications (off the top of my head) are:
- Voice notes on embedded devices (store only text!)
- Helpful Kiosks that can give you directions
- A new use for natural language database queries (e.g. ask the computer what last quarter's net sales were)
- Voice controlled robots ("You missed a corner, vacuum cleaner")
- Data search by voice ("Find me a channel that plays Star Trek")
Any other cool ideas out there?
Re:1... million... DOLLARS!!! (Score:5, Interesting)
Universal language translators. Imagine headphones that let you understand any known language.
Re:1... million... DOLLARS!!! (Score:4, Insightful)
- Voice controlled robots ("You missed a corner, vacuum cleaner")
- Data search by voice ("Find me a channel that plays Star Trek")
Kinda jumping ahead of yourself, aren't you? There are two steps to an operation like these, speech to text, and understanding the text you get out. Speech recognition gives you the first part, but you still have to be able to pull apart the sentence and figure out what it means.
Also, the article didn't say more accurate than software, it said more efficient. You know, uses less power and stuff like that? If the applications you mention (like search via voice) were possible/usable, you could run them today on an upper-end PC no problem.
Re:1... million... DOLLARS!!! (Score:4, Informative)
In fact, converting the speech to text and then trying to analyze the text without sound-level annotations might give bad results, as tonal or emotional content would be lost. You need both simultaneously to really understand what's being said.
Text of article (Score:4, Informative)
From Carnegie Mellon University:
Carnegie Mellon engineering researchers to create speech recognition in silicon
Team to develop new silicon chip
Carnegie Mellon University's Rob A. Rutenbar is leading a national research team to develop a new, efficient silicon chip that may revolutionize the way humans communicate and have a significant impact on America's homeland security.
Rutenbar, a professor of electrical and computer engineering at Carnegie Mellon, working jointly with researchers at the University of California at Berkeley received a $1 million grant from the National Science Foundation to move automatic speech recognition from software into hardware.
''I can ask my cell phone to 'Call Mom,''' says Rutenbar, ''but I can't dictate a detailed email complaint to my travel agent or navigate a complicated Internet database by voice alone.''
The problem is power--or rather, the lack of it. It takes a very powerful desktop computer to recognize arbitrary speech. ''But we can't put a Pentium(TM) in my cell phone, or in a soldier's helmet, or under a rock in a desert,'' explains Rutenbar, ''the batteries wouldn't last 10 minutes.''
Thus, the goal is to create a radically new and efficient silicon chip architecture that only does speech recognition, but does this 100 to 1,000 times more efficiently than a conventional computer.
The research team is uniquely poised to deliver on this ambitious project. Carnegie Mellon researchers pioneered much of today's successful speech recognition technology. This includes the influential 'Sphinx' project, the basis for many of today's commercial speech recognizers.
''We're still not even close to having a voice interface that will let you throw away your keyboard and mouse, but this current research could help us see speech as the primary modality on cell phones and PDAs,'' said Richard Stern, a professor in electrical and computer engineering and the team's senior speech recognition expert. ''To really throw away the keyboard, we have to go to silicon.'' But enhanced conversations between people and consumer products is not the main goal. ''Homeland security applications are the big reason we were chosen for this award,'' says Rutenbar. ''Imagine if an emergency responder could query a critical online database with voice alone, without returning to a vehicle, in a noisy and dangerous environment. The possibilities are endless.''
Researchers plan to unveil speech-recognition chip architecture in two to three years.
First Post (Score:5, Funny)
Carnivore on telephones (Score:5, Insightful)
accuracy (Score:5, Insightful)
100 to 1000 times more efficient worth $1M? meh. maybe.
100 to 1000 times more accurate worth $1M? definitely.
Good use of $1 million? (Score:3, Insightful)
But, of course, cue the armchair blogging fanatics without a formal science education, waxing poetic about the infinite power and glory of x86 hardware running clever open source software. Maybe we could do it in perl!
Save a few kilobytes... (Score:3, Informative)
Natural Language Interpreter (Score:5, Insightful)
Re:Natural Language Interpreter (Score:4, Insightful)
Natural language processing tasks involve parsing strings of tokens and mapping them to commands to be executed. So, from your example, "Pull up the name of employee number 12345", the natural language system must map "Pull up" to "SELECT", "the name" to "name", and "of employee number 12345" to "FROM users WHERE id = 12345". Really, it's largely a problem of context, and your example shows it nicely: mapping "of employee number 12345" to "FROM..." requires the contextual information of where to pull this data from. Surely multiple tables of a database could have an "employee number" field in them. Do you want all of the tuples that match, or just those from a certain table? Now, in the context of looking up a bunch of other employees, maybe I know what table you've been hitting a lot, and can determine what you're asking, but without that context, I have no idea.
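To make the mapping above concrete, here is a minimal Python sketch. It assumes one fixed phrasing and a made-up schema: the table names `users` and `payroll`, the `SCHEMA` dictionary, and the `to_sql` helper are all hypothetical and exist only for illustration. It is not how a real natural language system works, but it shows why the same sentence yields several candidate queries until some context picks a table.

```python
import re

# Hypothetical schema, invented for illustration: each table maps to the
# columns it carries. Both tables have an "id" and a "name" column.
SCHEMA = {
    "users": {"id", "name"},
    "payroll": {"id", "name", "salary"},
}

# One hard-coded phrasing; a real system would need far more than this.
PATTERN = re.compile(
    r"pull up the (?P<field>\w+) of employee number (?P<num>\d+)",
    re.IGNORECASE,
)


def to_sql(utterance, context_table=None):
    """Return every SQL query the utterance could plausibly mean."""
    match = PATTERN.search(utterance)
    if match is None:
        return []
    field = match.group("field").lower()
    num = int(match.group("num"))
    # Without context, every table that has the requested field is a candidate.
    tables = [t for t, cols in sorted(SCHEMA.items()) if field in cols]
    # Context from earlier queries (if any) narrows the candidates to one table.
    if context_table in tables:
        tables = [context_table]
    return [f"SELECT {field} FROM {t} WHERE id = {num}" for t in tables]


if __name__ == "__main__":
    sentence = "Pull up the name of employee number 12345"
    print(to_sql(sentence))                         # ambiguous: two candidates
    print(to_sql(sentence, context_table="users"))  # context resolves it
```

Called without `context_table`, the function returns one query per matching table; passing the table the user has been hitting lately collapses that to a single query, which is the role context plays in the paragraph above.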
In fact, everyday speech has a lot more ambiguity in it than could be handled without keeping large amounts of state, be it contextual or experiential/situational. For example, if I overhear two people in a conversation, and the first thing I hear is: "Yeah, but he's been lying all through his campaign, and I for one don't support him," I have no idea which political candidate the speaker might be talking about. However, if I saw that person wearing a shirt for a political campaign last week, then I have enough context to make a reasonable guess that he's talking about the opponent of the candidate on the shirt.
Speech recognition is a "lower level" than that: it's about matching acoustic information into speech sounds and then using the speech sounds to determine the word that was said. This is a hugely complex task that has a number of unsolved problems (of which these are the 3 that I can think of off the top of my head):
1) "speech sounds" are fuzzy categories, and are not canonical targets.
2) salient "features" of phonemes are disputed and contradictory, and large amounts of redundant or conflicting info are built into the speech signal
3) idiosyncratic speaker-to-speaker differences make the phoneme categories even fuzzier and can complicate the task even for the one speech recognition system that we know works: the human brain.
At any rate, the problems that need to be solved for speech recognition are not the same problems in natural language processing. While there may be some cross over in pattern-matching, the specifics of the problem spaces make it unlikely that you will get much benefit for NLS (natural language systems) from just making the algorithms faster.
Which, in fact, is my main criticism of this article: the algorithms that we have now are piss-poor, and making them faster doesn't intrinsically make them better. Unless there's been some huge advance in the field that I'm unaware of, you'd still have to train an SRS (speech recognition system) on your idiolect, by reading some pre-selected passages to it. This model has lots of problems, most notably that it's tailored to an individual. Imagine if you had to have each person that you spoke with read some canned paragraphs to you the first time you met so that you could interact....
[sorry I don't have sources for all of this; I'm AFB, and I don't have time to dredge up info right now. But, apparently, I have time to write one long-ass entry...]
Only 1million? (Score:3, Insightful)
(I did not read the article as it is slashdotted so I am relying on the summary's statement of 1 million dollars.)
The difficulties of dialect... (Score:5, Insightful)
I don't know how possible it will be to make a program that can recognize all English users. Will someone who speaks Oxford English be recognized as well as a surfer from California? I doubt it.
History.. (Score:5, Interesting)
Initially, doing anything beyond understanding a few words would take special hardware, but after a bit of 'training', highly accurate and fast speech-to-text was quite a possibility with a specially developed DSP.
Then the Pentium-class CPUs came about, and a P90 could just do the whole thing without the DSP.
So, now someone is developing a new dedicated piece of silicon for this... let's see how long it takes for general-purpose computers to catch up.
The issue is not that this is not useful, but that it either has to keep developing, or offer a somewhat longer-lasting price/performance ratio or much better features for a long time to come.
Yay! Boo! Uh... Oh bugger.... (Score:5, Interesting)
From the blog: ''Homeland security applications are the big reason we were chosen for this award,'' says Rutenbar. ''Imagine if an emergency responder could query a critical online database with voice alone, without returning to a vehicle, in a noisy and dangerous environment. The possibilities are endless.''
Like some slight tweaking in order to deploy massive voiceprint-recognition silicon arrays for amazingly efficient automatic realtime conversation transcription and identity determination, attached to Echelon [agitprop.org.au].
So cool... so potentially evil... head begins to hurt... tinfoil hat burning....
Pretty Ambitious, Harder than it sounds (Score:5, Interesting)
My Master's research was on implementing machine learning in hardware, specifically support vector machines.
Now, they have much more money than I did, and probably this will be a collaboration involving many graduate students, but converting complex algorithms from software to hardware is no easy task.
It is just easier to do things in software, that's why it has evolved. The modular layers of abstraction allow a Computer Scientist working in machine learning or speech recognition to not have to worry about how the underlying hardware works.
Working in hardware, you come face to face with a lot of these issues, particularly since you want an architecture on a chip, whereas in a conventional desktop/server system resources such as lots of RAM and hard drive space are available, and their interconnections have been built and refined over decades.
Throw in concerns about small form factor and low power consumption, and quite fast a lot of unexpected hurdles pop up.
My master's research goal was to produce a data mining/machine learning machine, or at the very least a data mining/machine learning co-processor. In retrospect, that was a very ambitious goal that would require many years of work, probably in collaboration with other graduate students.
What I ended up doing was just Support Vector Machines in digital hardware. Now granted, there is another aspect to my research that I'm not mentioning here, namely that I didn't use normal floating-point mathematical architectures, but a different, innovative logarithm-based mathematical architecture. That in itself was a significant undertaking.
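The comment doesn't say how that logarithmic architecture worked, so the following is only a generic Python sketch of why log-domain arithmetic appeals to hardware designers, not a reconstruction of the poster's design. It assumes a simple (sign, log2 of magnitude) encoding; the helper names `to_lns`, `from_lns`, `lns_mul`, and `lns_div` are made up for illustration.

```python
import math


def to_lns(x):
    """Encode a nonzero real number as (sign, log2 of its magnitude)."""
    if x == 0:
        # log2(0) is undefined, so zero needs a special case in a log system.
        raise ValueError("zero needs a special encoding in a log number system")
    return (1 if x > 0 else -1, math.log2(abs(x)))


def from_lns(value):
    """Decode (sign, exponent) back to an ordinary float."""
    sign, exponent = value
    return sign * 2.0 ** exponent


def lns_mul(a, b):
    """Multiply in the log domain: multiply the signs, add the logarithms."""
    return (a[0] * b[0], a[1] + b[1])


def lns_div(a, b):
    """Divide in the log domain: multiply the signs, subtract the logarithms."""
    return (a[0] * b[0], a[1] - b[1])


if __name__ == "__main__":
    a, b = to_lns(8.0), to_lns(-4.0)
    print(from_lns(lns_mul(a, b)))  # product of 8 and -4
    print(from_lns(lns_div(a, b)))  # quotient of 8 and -4
```

Multiplication and division reduce to an add or a subtract, which is cheap in silicon. The trade-off is that addition and subtraction of the encoded values no longer map to a single simple operation and typically need a lookup table or approximation, which this sketch leaves out.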
In any case, this sounds like a great project. I just wonder how much they can do in their (in an academic sense) very small time frame of 2-3 years, even though a lot of preliminary work has probably already been done just to apply for the grant.
In any case, it is great to see something like this, something to keep in mind in case I ever go back for a Ph.D.
NSA: Imagine a Beowulf cluster of these ... (Score:4, Funny)
Re:A measly $1 million? (Score:4, Funny)
Re:Good use of $1 million? (Score:5, Insightful)
I'm sure that when America and Russia were engaged in the space race there were people saying "Hey! This money could be better spent on disaster relief!". And where are we now? Only a few short decades later we have satellites that tell us where hurricanes are going, so that we can evacuate areas, and people who would otherwise die survive. We have global, reliable telecommunications satellites so that disaster relief agencies in third world countries can inform people of what supplies are required, and people who would otherwise die survive.
Without the massive investment in jet airline technology that could otherwise have been spent "saving the starving", we would not be able to travel to disaster areas within hours of an incident. And so the list goes on.
If you personally want to see more money invested in agencies that provide disaster relief, or reliable shelter or clean water then you only have to donate to the right charities, and encourage others to do the same. It doesn't take many people to donate out of their pockets to provide $1 million. You can start here [savethechildren.org].