Jeopardy-Playing Supercomputer Beats Humans
An anonymous reader writes "Ok, this was just a practice round. But in a short demonstration today IBM's Jeopardy-playing supercomputer, a whiz by the name of Watson, thoroughly bested two talented human contestants. IBM has been working on this artificial intelligence project for years to prove that a computer can be programmed to understand conversational speech and wordplay. In today's demo, Watson seems to have proved the point: it started out on a roll in the category 'Chicks Dig Me,' about women and archaeology. The real man versus machine face-off (in which the same contestants compete for a $1 million prize) will be taped tomorrow, and aired in February."
Soon, no more call centers (Score:2, Flamebait)
Probably already smarter than the average call center employee.
Re:Soon, no more call centers (Score:4, Funny)
They will give the AI a heavy Indian accent, because it's what callers expect.
Re: (Score:2)
I didn't rtfa, but from the summary I'm saying "huh? So what?"
All this thing needs to beat humans at Jeopardy is a huge database of phrases and a fast search engine. Sounds trivial to me.
Maybe I should RTFA...
Re:Soon, no more call centers (Score:5, Interesting)
maybe you should RTFA.
the unique challenges it poses to its contestants: the breadth of topics; the puns, metaphors, and slang in the questions; the speed it takes to buzz and answer.
Speech processing that can deal with the context heavy language of Jeopardy is a pretty big test and I think means we're just a little bit closer to a general purpose natural language speech recognition system.
Re: (Score:2)
Based on the previous articles, if the machine had to do either of the things you're talking about, it would simply lose. It is basically just fast enough to compete at Jeopardy and to have a chance it needs the advantage of having text fed to it and having a couple of seconds to chew on it before it can answer. And it is still possible for even normal humans to flat-out beat it to the punch on some questions.
Re: (Score:3)
You misunderstand in two ways. First, I wasn't hypothesizing. I was actually going from an earlier article about how the machine works. Second, this is a real computer, not a magical computer. It is doing something very hard and it takes many seconds to process a question and come up with an answer. When I say it is possible for a human to flat-out beat the computer, I mean that Alex can read the question, the lights can flash to let a human buzz in, and a human can have the correct answer ready while the c
Re: (Score:2)
maybe you should RTFA.
the unique challenges it poses to its contestants: the breadth of topics; the puns, metaphors, and slang in the questions; the speed it takes to buzz and answer.
Speech processing that can deal with the context heavy language of Jeopardy is a pretty big test and I think means we're just a little bit closer to a general purpose natural language speech recognition system.
But that's just it - maybe we THINK puns and metaphors and slang are more complex than they really are. Keep in mind it's not having to worry about inflections in Alex's voice or anything complex with audio, it's just reading the text.
With that in mind, how hard is it to apply Meta-tags to data with their relevance? For example, they use "Chicks Dig Me". It will likely check Chicks as prominently baby chickens, and secondarily the female association, and so on and so forth. Then "Dig" will likely be the act o
Re:Soon, no more call centers (Score:4, Insightful)
Think for a moment about all the stuff you just hand-waved away, and you will begin to appreciate the problem. Let's take your "chicks dig me" example. You correctly identified that "chick" could mean baby chickens or females (it can also mean any young bird or a small child). "Dig" has a bunch of meanings, both as a noun and a verb. "Me" you just brushed off as "no relevance", but there are two problems with that: first, how is a computer supposed to know the word has no relevance, and secondly, it is VERY relevant to the category. Because of the wordplay, "me" is not referring to a person, it is referring to a LOCATION that was EXCAVATED by a FEMALE, so your answer had better either be an archeological site or a female archeologist. And it just keeps getting harder from there.
Re: (Score:2)
It doesn't get access to a search engine, though. It needs to use pre-compiled repositories of information, all indexed in a way that makes it possible to identify the right "Response" (remember the questions are answers) in the right amount of time.
That being said, a computer is almost certainly so much better at "hitting the buzzer" in the allowed window than a human, that it possesses a significant advantage from the start.
Re: (Score:2)
Hitting the buzzer quickly is only an advantage if you are correct. If you are incorrect, or don't have any response, you lose money. Therefore, the computer (like the humans) must first determine what the response should be, and how confident it is that the response is correct, before buzzing in. This can actually be a disadvantage to the computer, because a human may get a category that he considers himself an expert in and buzz in immediately for all the answers in that category, and then use the rema
Re: (Score:2)
I suspect that the computer has a dead-on confidence rating by the time Alex is finishing his last syllable, so the decision of whether or not to ring in is clear. Just like a human, but the computer has the advantage of knowing that as soon as the buzzer is "open for buzzing" it can ring in within a millisecond instead of the tens or hundreds it takes a human to register sensory information and command their thumb to press.
You can really tell the difference between a player that's good or bad at buzzing.
Re: (Score:3)
Engadget has a video of the match that includes the visualization of the response... In several cases it did not have an answer by the time the other contestant buzzed in and in some cases it was close to a tie.
In an interview it was stated it takes about 3 seconds on average to answer a question, which is actually kinda long when some of the humans will be able to predict the answer as soon as they have read or heard only part of the clue.
Re: (Score:3)
This is very exciting news, but for now, forget about replacing call center people: this machine is composed of 2800 Power7 cores, which makes it very expensive compared to the typical call center person. But this accomplishment is a major step in the AI field and opens it up to many, many exciting applications in a future that is now not too far off.
Re: (Score:2)
There are some questions where you need to understand the question at a deeper level, like the before-and-after categories (i.e. Abraham Lincoln Towncar).
You must be thinking of Wheel of Fortune... While it's not impossible to see a response like that in Jeopardy, it is much more commonly seen in the following time slot.
Re: (Score:3)
In a test tourney, Watson hit the bullseye on a question about clothing a young girl might wear on an operatic ship. The answer, pinafore, is also found in the title of the Gilbert & Sullivan opera H.M.S. Pinafore. And the computer was also successful with a before-and-after Jeopardy question about a candy bar and a Supreme Court justice, Baby Ruth Bader Ginsburg. But earlier in its care
Re: (Score:2)
when asked, "What does a grasshopper eat?", it responded, "Kosher."
So it has a sense of humor as well! :D All it has to do is add, "I'll be here all week. Try the fish."
Re: (Score:2, Insightful)
Call center employees aren't allowed to be smart. They have scripts that they must follow. They're reduced to a very simple algorithm, executed by human beings only because there are still people who prefer talking to other people over interacting with a machine, and because speech recognition software is still not ready to deal with what some people call speech.
Re:Soon, no more call centers (Score:4, Interesting)
I worked at a Comcast call center for a while in '09. No scripts at all. When I started work and asked for something to follow, to figure out what I was expected to say, I was told with a smile "we don't do any scripts here, good luck!".
The only guideline I had was "Get their name and phone number, don't trust the system to give you an accurate phone number."
Re: (Score:2)
and because speech recognition software is still not ready to deal with what some people call speech.
That's the least of SR's problems; I have trouble understanding anybody from the NE seaboard. I played around with Win 7's speech recognition when I had that netbook, and I was impressed -- when the room was quiet. If a car went by outside it messed up, let alone having a radio or a TV on.
Its biggest problem is its lack of humans' ease with which we pick out a single voice in a roomful of conversational babb
Verizon, Fedex already there (Score:2)
Have you called Verizon or Fedex services lately? They are both English language processing, menu-driven systems. Of course, the most used response from their system is 'Sorry, I didn't quite get that...'
Of course, the fact that Verizon now even has a system for you to pay your bill without even speaking to a human is pretty impressive. Too bad their billing system itself is still in the dark ages.
Re: (Score:2)
Yes, but the menu system itself is only operating on a set of predetermined responses. Such as "Pay my bill" and "Contact a customer service rep". They aren't really needing to understand "I can't dial any numbers with my touchscreen phone."
Re: (Score:2)
I've heard that if you start cursing like a sailor, the voice recognition system will determine you're getting frustrated and send you to a human.
It seems to work with my credit card bank, but not my health insurance, so, YMMV.
Re: (Score:2)
Gah. Interactive voice menus are probably singlehandedly the worst innovation in menu systems ever.
The strange thing is, it's actually a fairly easy problem. When you're trying to machine-recognize something, where that something is from a well-enumerated set, it's a lot easier than trying to analyze somethin
Re:Verizon, Fedex already there (Score:5, Funny)
But tell me it's not entertaining to listen to someone else dealing with an interactive voice menu. It's a great one-sided conversation:
"Yes"
"Yes"
"Billing"
"I said BILLING!"
"Question about charge."
"Question about charge"
"Problem with charge"
"Jesus Christ. YOU CHARGED ME FOR SOMETHING I DIDN'T BUY!"
"No."
"No."
"NO NO NO NO! YOU FUCKING MACHINE! GIVE ME A HUMAN BEING!" (Followed by insane button mashing)
It's like an old Bob Newhart routine, if Bob Newhart had Tourette's Syndrome and anger management issues.
Re: (Score:3)
Hold music fades...
Me: OK, I have a problem with my application.
Them: I can see your application.
M: Can you see how my name is misspelled.
T: Yes.
M: Can you fix it, it's just two transposed letters
T: I'll need you to send in your original birth certificate with a copy of your credit card, with passport, drivers license or other photo ID in addition to a urine sample, proof of noble lineage, papal bull, letter from you
Re: (Score:2)
You forgot it's also probably a LOT more expensive than the average call center employee.
"The pod bay doors cannot be opened." (Score:5, Funny)
"What is the mission is too important for me to allow you to jeopardize it?"
Re: (Score:2)
Not gonna lie, I think this post was the highlight of my day. Let that speak for both how funny it was and how pathetic my day has been.
What? They didn't even videotape the demo? (Score:2)
Re: (Score:3)
(Shitty) video at http://www.zdnet.com/blog/btl/ibms-watson-wins-jeopardy-practice-round-can-humans-hang/43601?tag=content;selector-blogs [zdnet.com]
Scary Precedent (Score:2)
It was a short taping. At the start of the show, when the host was introducing everyone, the techs hooked up a pair of speakers so the computer could vocalize its responses. The first thing the computer asked the host was, 'Do you want to play a game?', and then they pulled the plug on the computer.
A Rising Tide (Score:4, Insightful)
Best quote from the article:
Truly. Although it sounds threatening to some, the practical applications of the natural language parsing technology will ultimately benefit everyone.
Until, that is, you dial your bank's customer service number from a noisy restaurant, and try to talk to Watson to ask him why your Visa was denied.
(Rutter's quote was a nifty Skynet allusion, but its syntax was mangled by the reporter/editor, so it comes in second best.)
Re: (Score:2)
Jennings says it’s worth noting that humans built the thing. Whoever wins, we win.
I'll inform Skynet. It will want to know that if it wins we all win.
Re: (Score:2)
Category: "Women moaning"
Answer: "When a man does this with his hand."
*duck*
Re: (Score:2)
Ken Jennings is a software engineer. What else WOULD he say?
So what this is saying... (Score:4, Funny)
HUGE amount of secrecy surrounding this (Score:5, Interesting)
So my neighbor works at the IBM facility where this is taking place, but in a completely unrelated function (it's a huge complex with a lot of people). He said that everyone is taking a forced day off on Friday when they will be taping the actual show. There's only going to be a small amount of the very top IBM brass there (supposedly even the head of this facility won't be allowed in). And that this is a HUGE secrecy issue (I'm guessing so that the results aren't leaked before the broadcast date).
My neighbor works with semiconductors and so works with a lot of dangerous chemicals and stuff. According to him, they've all been told to make sure that all their hazardous materials have been safely stored, and that (I have trouble believing this) even the IBM emergency response/hazmat teams have been told that they aren't allowed onsite and not to respond to any alarms that may be issued. That's a fairly dangerous decision if true, I'm doubtful but my neighbor stands by his statement.
Anyhoo, this is a pretty big deal apparently. More so from the Jeopardy people's end I'd guess since I don't think IBM has anything related to this project that they'd be that paranoid about keeping secret.
Re: (Score:2)
True. I used to be a volunteer firefighter (try it, it's fun and most places they need you), and I was surprised at how much access firefighters have when fighting fires and saving people, and I presume those access privileges extend to hazmat events. Firefighters don't need warrants - if your house is on fire they can barge in without knocking, break doors & windows, and rip out parts of your house. Of course, normally they are trying to save as much as possible. They do perform triage on your stuf
Re:HUGE amount of secrecy surrounding this (Score:4, Interesting)
Sounds like a lame Shadowrun mission.
GM: You discover that a big hush-hush project is underway Friday.
Street Sam: *rolls dice* 15 successes
GM: A little tidbit from the 'net. Emergency teams have been told not to respond to any alarms.
Street Sam: Excellent. A cakewalk. I could do this in my sleep.
[John]
When do they get the question? (Score:5, Interesting)
Re: (Score:3)
In the article, they mention that the computer gets the question as text.
Well that's cheating. With all the work that went into the natural language processing here, would it have been so hard to slap an OCR module in there?
Re: (Score:2)
TV is all about ratings so the producers are going to make darn sure the results / appearance are exactly what they, and, presumably, IBM, are seeking.
What many people don't realize is that reality shows (Operation Repo comes to mind), news, and even documentaries are all considered "entertainment" - heavily edited, dramatizations, staged scenes, and some outright fiction tossed in.
Personally, the Jeopardy supercomputer challenge doesn't impress me in the age of low cost mass storage.
With that said, a more intere
Re: (Score:2)
First of all, ever since the quiz show scandals of the 1950s there are laws regulating game shows, so your implication of 'faking it' is unfounded.
Second, the age of low cost mass storage is exactly why something like this is needed. Sure, we have tons and tons of data available, but how do you make sense out of it?
Third, I have never once seen Google give an answer to anything. It is great at giving you places (thousands or millions of them) where you might FIND the answer, if you worded your search correct
WHAT? SPEAK UP (Score:2)
Re: (Score:2)
Don't the humans get to read the question on screen too? It would be easier to OCR it and ignore the audio.
Re: (Score:2)
One would presume they type the question in as it is being read or slightly before and when Trebek stops talking, they hit the Enter key (or Execute key or Engage key or whatever key they have).
Of course, they could always type the question as it is being read and as it is being done, the processing takes place. That is the same thing that humans do. As the question is read your brain is already processing.
Also, unless I'm mistaken, one doesn't have to wait for the entire question to be read. You can jum
Re: (Score:3)
Also, unless I'm mistaken, one doesn't have to wait for the entire question to be read. You can jump in early if you think you know the answer.
You're mistaken. The clickers to ring in are shut off until Trebek is finished reading out loud. (Jeopardy was probably the first quiz game to do it this way.)
Re: (Score:2)
Thanks. I couldn't remember how it was done since I haven't watched the show in a very long time (many, many years). I thought at one time it could be done because I have recollections of people jumping in and trying to answer the question, getting it wrong, and Trebek telling the remaining two people the rest of the question.
Obviously not.
It's even more involved than that. (Score:2)
If you click in too early, your clicker is disabled for a certain span of time because of your "false start". This keeps somebody from rapid-firing the clicker as soon as they think they know the answer.
If nobody else clicks in, you can answer, but the competitors get the first crack.
Re: (Score:2)
One would presume they type the question in as it is being read or slightly before and when Trebek stops talking, t\
Trrreeeebbbbeeek!!!!'
Re: (Score:2)
And furthermore, if you jump the gun there is a delay until your button is re-enabled. That is the real reason for the wild mashing - you jumped the gun, and now your button is disabled, so just keep hitting it til it comes back on.
Re: (Score:3)
Also, unless I'm mistaken, one doesn't have to wait for the entire question to be read. You can jump in early if you think you know the answer.
You are mistaken. You cannot buzz in until Trebek has finished reading the question. One of the reasons that Ken Jennings was so successful is because he had the timing down perfectly such that he was reliably the first one to buzz in when he knew the answer.
Re: (Score:2)
Yeah I saw that too. And wtf? I thought that understanding spoken language was part of the game. If it's just understanding question syntax, that's not so impressive.
Re:When do they get the question? (Score:5, Informative)
The humans also get the question as text, at the exact same moment Watson does. That's the way it's always worked on Jeopardy!. They see the question as text the same way you do when you watch the show on TV. The best competitors read well ahead of Trebek and have an answer ready the instant they're allowed to buzz in, which is after Trebek finishes reading the question. Watson has the exact same advantages and disadvantages as any contestant, except that he can read the text basically instantly.
Re: (Score:2)
The computer gets the "answer" at the same time it appears on screen for the humans to be able to read it.
Re: (Score:3)
In the article, they mention that the computer gets the question as text. Does anyone know exactly when the computer receives the question?
Well, remember this is Jeopardy, so the contestants receive the 'answer,' and must supply the 'question.'
And in the interest of even-handedness, I suspect that Watson is provided the text version of the 'answer' at the same time that the text of the 'answer' is revealed to the human contestants.
When I watch Jeopardy, I seldom wait to hear Trebek read the 'answer' aloud before I start figuring out the 'question.' It's available on the screen as text, and I can read much faster than Trebek speaks. (And I as
Re:When do they get the question? (Score:4, Interesting)
In the article, they mention that the computer gets the question as text. Does anyone know exactly when the computer receives the question? Does it receive when the human host starts talking or when the human host completes the question? If it is when the host starts speaking, the computer is getting at least several second head-start on the humans.
It shouldn't matter much, because Jeopardy's rules lock the buzzers out until Alex has finished reading the question - and that lock-out period is determined by a human producer, who sits at a table off-camera, listening to Alex, and who has a button of her own that enables the buzzers.
Should a contestant try to "buzz in" before the producer pushes that button, his/her/its buzzer is locked out for three seconds - and any attempt to buzz in before that penalty period expires locks your buzzer out for an additional three seconds.
So, yeah, Watson may get to begin parsing the whole question a little early, but, typically, the human contestants get to begin working on it while Alex is still in the process of reading it, too - they just have to anticipate how it will end.
Given the speed of silicon vs. wetware, I agree that it will make a difference - but the real question is whether Watson has to determine that the buzzer is enabled by use of a light sensor (human contestants are notified by a ring of lights around the game board - which home viewers never get to see), or whether it gets notified electronically when the enable switch is activated. I say that, because, at least in my own experience, the ability properly to time the use of your buzzer is an enormous factor in whether you'll do well as a contestant or not.
When I was in the contestant pool in 1991, during the taping of the episode before I was chosen to compete, a four-time winner who was just a monster on the buzzer went up against two newbies. One of them, a little man from New Jersey, obviously became more and more frustrated as the game progressed, when he was unable to buzz in against the Monster, who completely dominated the game (the Monster was a word processor from New Mexico who played videogames as a hobby, so it wasn't surprising that his timing with the buzzer was extra-super good to begin with - and he'd had the non-trivial advantage of four previous games in which to hone his timing). Twice during Double Jeopardy, the New Mexico Monster declined to buzz in, which permitted the little man from New Jersey to do so. Both times, the question was a difficult and obscure one, and both times the little man from New Jersey failed to supply the correct answer, so, when Final Jeopardy came around, there was an empty podium where the little man from New Jersey had been, and, predictably, the Monster became a five-time champion.
Boy, was I glad I didn't have to go up against HIM.
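For anyone who wants to play with the lockout rule described above, here is a rough Python sketch. It is only an illustration, not the show's actual hardware logic: the `Buzzer` class and its method names are made up, and it assumes that pressing during a penalty extends the existing lockout by another three seconds (the comment above does not say whether the penalty extends or restarts).

```python
from dataclasses import dataclass

PENALTY_SECONDS = 3.0  # lockout length quoted in the comment above


@dataclass
class Buzzer:
    """Hypothetical model of one contestant's buzzer.

    All times are seconds measured from the moment the clue is revealed.
    """
    locked_until: float = 0.0

    def press(self, now: float, enabled_at: float) -> bool:
        """Return True if this press rings in, False if it is penalized."""
        if now < self.locked_until:
            # Assumption: pressing while penalized adds three more seconds.
            self.locked_until += PENALTY_SECONDS
            return False
        if now < enabled_at:
            # Jumped the gun: the producer has not enabled the buzzers yet.
            self.locked_until = now + PENALTY_SECONDS
            return False
        return True


if __name__ == "__main__":
    buzzer = Buzzer()
    # The producer enables the buzzers 5.0 s after the clue is revealed.
    for t in (4.9, 5.1, 11.0):
        print(t, buzzer.press(now=t, enabled_at=5.0))
```

In this model a contestant who presses a tenth of a second early cannot ring in again until well after a well-timed opponent has already buzzed, which is why timing matters so much.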
Word play (Score:3)
The ability to handle Jeopardy's style of word play is very impressive. I have to wonder if Watson can handle it in all the varieties that it is used on the show and whether the categories are cherry picked to match its abilities. Ideally the writers won't know that their answers are going to be used for the big game and the categories will be picked at random from a pool (minus audio and video clues).
Re: (Score:2)
That's an interesting point, and I believe it came up in prior threads on this topic.
Jeopardy's 'answers' generally include a clue to help the player intuitively confirm that his/her response is accurate. Does Watson's algorithm use these, or does it just 'brute force' the lookup in its vast memory?
Mashup time? (Score:2)
For some reason I keep on thinking this calls for a remix of "I Lost on Jeopardy", but now with AutoTune.
Re: (Score:2)
I hate you - now I have that stuck in my head!
Baby! OOOOOOooOOoooooh!
Buzz in Times (Score:2)
Re: (Score:3)
Watson only buzzes in when he is confident that he knows the answer, which is apparently about 50% of the time. Of the 50% that he does buzz in, he answers correctly 80-90% of the time. If the way they measure the confidence is accurate (and doesn't produce a lot of false negatives) it's likely that Watson would end the game in the red if he just buzzed in every time before he was sure of the answer. And according to the article, Watson has to physically push the button on the buzzer to buzz in. That probabl
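The trade-off in the comment above can be written as a small expected-value calculation. This is a sketch, not how Watson actually decides: it assumes the usual scoring where a right response gains the clue's value and a wrong one loses it, and the function names and the 0.5 threshold are hypothetical. Under that assumption the expected gain is (2p - 1) times the clue value, so buzzing only pays when confidence p is above one half.

```python
def expected_gain(confidence: float, clue_value: int) -> float:
    """Expected change in score from buzzing in.

    Assumes a correct response gains clue_value and an incorrect one
    loses clue_value, which simplifies to (2 * confidence - 1) * clue_value.
    """
    return confidence * clue_value - (1.0 - confidence) * clue_value


def should_buzz(confidence: float, threshold: float = 0.5) -> bool:
    """Buzz only when confidence clears the (hypothetical) threshold."""
    return confidence > threshold


if __name__ == "__main__":
    for p in (0.3, 0.5, 0.85):
        print(p, should_buzz(p), expected_gain(p, 400))
```

At 85 percent confidence on a $400 clue the expected gain is about $280, while at 30 percent it is negative, which is the "end the game in the red" scenario.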
Re: (Score:2)
That may be a disadvantage for the computer, actually. Good players are known to buzz in immediately on their strong categories and take the few seconds allowed to come up with the response. On the other hand, Watson could do the exact same thing, and presumably do so faster than any human.
Re: (Score:2)
TFA sez that Watson must trigger a mechanical switch to buzz in. But yea, this is a trivial engineering problem, and Watson still has an advantage in reaction time.
I know this is /., but read TFA, it will answer most of those questions.
Re: (Score:2)
Human players are allowed to do that, and if they fail to come up with the answer after buzzing, they get penalized. Same thing could be true for the machine.
It's all fun and games, until (Score:2)
Watson becomes self-aware on August 29th, and the IBM engineers panic and try to pull the plug...
Rough... (Score:2)
Sean Connery, that is not in the R's (Score:2)
Imagine that! (Score:2)
A supercomputer is faster/better at recalling facts from its database than humans can from memory? Who woulda thunk it?
Have you ever actually watched Jeopardy? (Score:2)
Jeopardy is not some retarded trivia show like "Who Wants to Be a Millionaire." Many of the questions involve puns, wordplay, rhymes, etc. that cannot be answered with a Google search.
Voice (Score:2)
FTA: "Watson spoke in a stilted computerized voice–and was almost never wrong."
I'm still hoping they'll sneak a Scottish accent in there at the last minute. And maybe a joke about a mallard.
The answer is ... "42" (Score:2)
.
Small sample size (Score:2)
In today’s exhibition of about 15 questions, Watson tallied $4,400, compared to $3,400 for Jennings and $1,200 for Rutter.
15 questions is not really enough to say anything. Maybe the humans aren't used to buzzing in this context (I assume Watson's buzzer skills are basically static), they weren't "warmed up", or the categories favored Watson's strengths and their weaknesses. (Or maybe not.) Interestingly, if these 15 questions corresponded to 3 first round categories, every question was answered, from the money totals. I also find this interesting:
[Watson] buzzes in about half the time, and answers 85 to 95 percent of those questions correctly.
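The money-total reasoning above is quick to check. The sketch below assumes the practice round used the standard first-round clue values of $200 through $1,000 per category, which the quoted article does not state; the scores are the ones quoted above.

```python
# Assumed first-round clue values for one category (not stated in the article).
CLUE_VALUES = [200, 400, 600, 800, 1000]

# Final scores quoted from the article.
scores = {"Watson": 4400, "Jennings": 3400, "Rutter": 1200}

# Three full categories is 15 clues.
board_total = 3 * sum(CLUE_VALUES)
scores_total = sum(scores.values())

print(board_total, scores_total, board_total == scores_total)
```

Both totals come to $9,000, which is consistent with all 15 clues being answered correctly and nobody losing money on a wrong response.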
is this that surprising? (Score:2)
Re: (Score:2)
Just buy some of those "how to win the lottery" books, and program in those rules. You should be able to win every single one. Once you determine the winning number, make sure you buy hundreds of winning tickets! Mortgage your house so you can buy as many as possible!
Re: (Score:3)
Really?
Quiz shows are designed with a targeted IQ in mind.
The people who write and select the questions know the percentages of how many people in their player population will know the answer.
Winning the game is dependent on the random distribution of the selected set of questions falling within the percentage of things you know.
Which means that it's not just a skill or talent game. You also have to be asked the right questions, and since you don't control that, it's luck.
Re: (Score:2)
You missed GLaDOS.