Translation Software That Learns by Reading
redcone writes "New Scientist is reporting that translation software that develops an understanding of languages by scanning through thousands of previously translated documents has been released by U.S. researchers. According to the article, 'The translated documents used to teach the translation algorithms can be electronic, on paper, or even audio files. The system is not only faster than other methods, but also better suited to tackling less common languages and the unusual vocabulary found in specialised or technical texts.'"
High school Spanish (Score:3, Funny)
Scanning Audio Files (Score:3, Interesting)
It says it can scan through audio files as an input source. I wonder if this causes it to "learn" the auditory signatures (and thus only know the translation when given audio input), or if it relies on speech-to-text to convert it to text first?
If it does the latter, then based on the quality of current speech-to-text software, this probably wouldn't do much good in a total immersion classroom situation...
Sure would have helped with my German
Re:High school Spanish (Score:4, Informative)
Until I see this new process in the works, however, there is nothing that will make me believe it's better than finding another human who can *understand* what you are saying and the context you are implying.
efnet spanish (Score:5, Funny)
q w3n0! 3so si está 1337!
Re:efnet spanish (Score:2)
Re:efnet spanish (Score:3, Funny)
Re:High school Spanish (Score:2, Interesting)
Heh. Then there is nothing that will make you believe, etc., etc.
Certainly you can't do good translation without understanding syntax (which influences meaning and underlies word order) and context (to disambiguate synonyms and phrases with multiple interpretations). Machines aren't especi
Re:High school Spanish (Score:3, Insightful)
Re:High school Spanish (Score:2)
Re:High school Spanish (Score:2)
The teacher actually became suspicious when certain words were in entirely the wrong context, and about the bizarre syntax (totally English word orders, etc.), but I was never busted. Quite funny, really, since my best friend in class was using it too! Still, I don't think most people realised the Internet existed ba
Re:High school Spanish (Score:2)
My feeling was that it usually worked out okay. Those
Dealing with Disruptive Technology (Score:3, Insightful)
What is more important: the knowledge gained through rigorous study, or the ability to accomplish what the studying provides through a machine?
Being technically oriented, I have to say the machine. But I am not being disresp
technical texts (Score:4, Funny)
Re:technical texts (Score:2, Funny)
1. If we screw up, it's not our fault ... and you owe us your firstborn.
2. If you screw up, well, you're screwed.
Re:technical texts (Score:2)
Re:technical texts (Score:2)
translate to American please (Score:3, Funny)
Thanks.
Re:translate to American please (Score:5, Funny)
r3Ð(0n3 wr173$ "N3w $(13n71$7 1$ r3p0r71n9 7h47 7r4n$£4710n $07w4r3 7h47 Ð3v3£0p$ 4n nÐ3r$74nÐ1n9 0 £4n9493$ b¥ $(4nn1n9 7hr09h 7h0$4nÐ$ 0 pr3v10$£¥ 7r4n$£473Ð Ð0(m3n7$ h4$ b33n r3£34$3Ð b¥
And translation #2:
REDCONE WRIETS NU SCEINTIST IS R3PORTNG TAHT TRANSLATION R TAHT D3V3LOPS AN UNDERSTANDNG OF LANGUAEGS BY SCANNG THROUGH THOUSANDS OF PREVIOUSLY TRANSLAETD DOCUMENTS HAS B3N REL3AESD BY US!!!! OMG R3S3ARCHARS!!1!1!! LOL ACORDNG 2 DA ARTICL3 TEH TRANSLAETD DOCUMENTS US3D 2 T3ACH TEH TRANSLATION ALGORITHMS CAN B 3LECTRONIC ON PAEPR OR 3V3N AUDIO FIELS!!1111 TEH SYSTEM IS NOT ONLY FASTER THAN OTH3R M3THODS BUT ALSO BT3R SUIETD 2 TAKLNG LAS COMON LANGUAEGS AND TEH UNUSUAL VOCABULARY FOUND IN SPACIALIESD OR TECHNICAL TEXTS!1!! WTF
Re:translate to American please (Score:3, Funny)
Now reorder the phrases in every sentence so that the object phrase starts the sentence, change every sentence which contains the word "because" so that the word "because" and the words following it start the sentence. Make sure that every infinitive verb has the adverb between the word "to" and the verb. Change every occurrence of "which" to "that". Find every word more than 3 syllables long and inject sever
Re:translate to American please (Score:3, Funny)
That redcone fella did say something about some rag reporting some computer thingymebob that lets me understand what all those japs are saying. The city rag reckons it's real fast.
Yay! (Score:4, Funny)
Harry Potter and the Bible (Score:5, Interesting)
This could be great if it were open-sourced. It'd be nice to translate email, instant messages, websites, technical docs, and lots of other stuff we're currently using the fish (Babel Fish) for. The fish is nice but not that efficient to add to other programs, and its translations aren't usually that great.
Re:Harry Potter and the Bible (Score:5, Funny)
Or did JK Rowling suddenly become pious?
Re:Harry Potter and the Bible (Score:2)
Oh fuck. (Score:3, Insightful)
Damn, I love this place. Seriously, dammit. Here we have a post on a tech/IT site titled "Harry Potter and the Bible" modded +4 Interesting at the time of this posting.
My head totally hurts. Clod.
Turing test (Score:3, Insightful)
Re:Turing test (Score:2)
Wow! Does a much better job... (Score:5, Funny)
Not hard wares that sticks an comprehension of talks by scanning on thousands of fish translated papers has been vomited by US scientists.
Many existing translation not hard wares uses palm rules for botching words and phrases. But the new software, snarked by Kevin Knight and Daniel Marcu at the Information Sciences[...]
Read More... [newscientist.com]
Neural Nets and Machine Learning (Score:2, Interesting)
Neural Nets? I assumed it was Bayes. (Score:2)
That's great.... (Score:4, Funny)
OUTPUT: w007! (Score:2)
Re:That's great.... (Score:2)
That sounds like a good approach (Score:4, Insightful)
But if you give computers a bunch of human stuff to read, you expose the dictionaries to language as it is actually used, not just as the dictionary has it. Then when odd language usage falls upon us like it's raining cats and dogs, they will have a database of similar usage to draw upon. Hey, it's an uphill climb, but this is a good avenue to try. Cheerio, computers, and a top o' the mornin' to ya.
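To make the "database of similar usage" idea concrete, here is a minimal Python sketch of phrase lookup: the translator prefers the longest phrase it has seen, so an idiom is translated as a unit rather than word by word. The phrase table (`PHRASE_TABLE`) and its Spanish entries are hypothetical toy data, and the greedy left-to-right matching is a simplification; a real statistical system would learn phrase pairs with probabilities from parallel text and search over competing segmentations.

```python
from typing import Dict, Tuple

# Hypothetical toy phrase table; a real system would learn entries (and
# probabilities) from previously translated documents.
PHRASE_TABLE: Dict[Tuple[str, ...], str] = {
    ("it's",): "está",
    ("raining",): "lloviendo",
    ("cats",): "gatos",
    ("and",): "y",
    ("dogs",): "perros",
    ("raining", "cats", "and", "dogs"): "lloviendo a cántaros",
}
MAX_PHRASE_LEN = max(len(k) for k in PHRASE_TABLE)

def translate(sentence: str) -> str:
    """Greedy left-to-right translation that prefers the longest known phrase."""
    words = sentence.lower().split()
    out = []
    i = 0
    while i < len(words):
        # Try the longest span first so idioms win over word-by-word lookups.
        for n in range(min(MAX_PHRASE_LEN, len(words) - i), 0, -1):
            span = tuple(words[i:i + n])
            if span in PHRASE_TABLE:
                out.append(PHRASE_TABLE[span])
                i += n
                break
        else:
            # Unknown word: pass it through untranslated.
            out.append(words[i])
            i += 1
    return " ".join(out)

def translate_word_by_word(sentence: str) -> str:
    """Baseline that only ever looks up single words."""
    return " ".join(PHRASE_TABLE.get((w,), w) for w in sentence.lower().split())
```

Here `translate("It's raining cats and dogs")` returns `está lloviendo a cántaros`, while `translate_word_by_word` on the same input gives the literal `está lloviendo gatos y perros`.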
Easy (Score:3, Funny)
Re:Easy (Score:2)
LOL! And here's the same message translated from Cat to Dog and back into English:
"Bark!"
Re:Easy (Score:2)
Re:Easy (Score:2)
English: I think computers are great. -> Dog: Good doggie.
English: Aren't you paying attention to what I'm saying, you stupid mutt!? -> Dog: Good doggie.
Re:Easy (Score:2)
"Bluh bluh bluh bluh bluh! Bluh bluh bluh bluh sit!"
Re:That sounds like a good approach (Score:2)
Re:That sounds like a good approach (Score:2)
But, if you give a bundle human material to computers, in order to read, set the directories language, like them are really used, not out, even there the directory them have. If odd language consumption falls after us, how it rains cats and dogs, has it a data base of similar consumption to draw to on. He, is it a rising ascent, but this is a good avenue, to of of Cheerio to try from of compute
Re:That sounds like a good approach (Score:3, Funny)
If it is possible and cuz of the translation of the software of the
wealth (until the necessity to the danger) this person whom it causes,
this member of the quality of the well-educated way and, in me who I
consult that it examines it, of its type of the search of the thing
the truth that the lheo requests to neces
Re:That sounds like a good approach (Score:3, Funny)
"The moon's a harsh mistress" converges quickly to
"With the love seriously the moon"
Whereas the text on top of the search
"Sometimes it's fast, sometimes it's slow. Sometimes it doesn't work at all."
takes a long time to converge to
"To the times during the hour, this comes during the period from digiu
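The "converges to" behaviour above is a fixed-point iteration: keep feeding the output back in until it stops changing. A minimal Python sketch, assuming a hypothetical `round_trip` function that stands in for one English-to-X-to-English pass (the `DRIFT` table is invented to mimic words drifting on each pass). Because a real translator might bounce between outputs forever rather than settle, the loop also checks for cycles and caps the number of passes.

```python
from typing import Callable, List, Tuple

# Hypothetical per-pass word drift, standing in for a lossy round trip
# through another language and back.
DRIFT = {"harsh": "hard", "hard": "serious", "mistress": "love"}

def toy_round_trip(text: str) -> str:
    """One simulated round trip: each word drifts at most one step."""
    return " ".join(DRIFT.get(w, w) for w in text.split())

def iterate_round_trips(
    text: str, round_trip: Callable[[str], str], max_iters: int = 50
) -> Tuple[List[str], str]:
    """Apply round_trip until a fixed point, a cycle, or max_iters passes.

    Returns (history, status), where status is "fixed", "cycle" or "limit".
    """
    history = [text]
    seen = {text}
    for _ in range(max_iters):
        nxt = round_trip(history[-1])
        if nxt == history[-1]:
            return history, "fixed"  # output no longer changes
        history.append(nxt)
        if nxt in seen:
            return history, "cycle"  # revisited an earlier output
        seen.add(nxt)
    return history, "limit"
```

With the toy drift table, `iterate_round_trips("the moon's a harsh mistress", toy_round_trip)` settles after two changing passes on `the moon's a serious love`.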
Re:That sounds like a good approach (Score:2)
Lexicon: descriptive. Attempts to include as many words/uses as possible.
By doing it based on existing documents you end up with a lexicon.
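Building a descriptive lexicon from existing documents can be sketched as a simple concordance: record every word form that occurs, how often, and a few of the contexts it occurs in. This is a minimal Python sketch; the regex tokenizer, the `window` and `max_examples` parameters, and the sample sentences are assumptions for illustration, and a real lexicon would also need lemmatisation and sense grouping.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

def build_lexicon(
    documents: Iterable[str], window: int = 2, max_examples: int = 3
) -> Dict[str, Tuple[int, List[str]]]:
    """Map each word to (frequency, up to max_examples keyword-in-context strings)."""
    counts: Counter = Counter()
    examples: Dict[str, List[str]] = defaultdict(list)
    for doc in documents:
        # Simplistic tokenizer: runs of lowercase letters and apostrophes.
        tokens = re.findall(r"[a-z']+", doc.lower())
        for i, tok in enumerate(tokens):
            counts[tok] += 1
            if len(examples[tok]) < max_examples:
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                examples[tok].append(" ".join(left + [f"[{tok}]"] + right))
    return {w: (counts[w], examples[w]) for w in counts}

lexicon = build_lexicon(["Tighten the bolt.", "Don't bolt your food."])
```

For these two sentences, `bolt` is recorded twice, once in a noun context (`tighten the [bolt]`) and once in a verb context (`don't [bolt] your food`): a record of how the word is actually used, not how a dictionary says it should be.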
Philosophical caveat (Score:5, Insightful)
I would say that humans able to translate between languages generally understand both languages, but saying that a statistical, probabilistic model based on correlations understands a language might be a stretch.
Further reading: Searle's Chinese Room argument: http://en.wikipedia.org/wiki/Chinese_room
This is akin to asking, Does your tax software understand the tax code? Does Photoshop understand the principles of image manipulation?
Are these silly questions to ask?
Further reading: Dennett on intentionality (http://en.wikipedia.org/wiki/Dennett, but the entry is pretty sparse).
RD
Re:Philosophical caveat (Score:4, Interesting)
I think that software that can learn can be said to understand a problem just as much as a human can. The difference between understanding and just doing is having the ability to learn from new data and to change your actions as required.
Re:Philosophical caveat (Score:5, Insightful)
Mom baked for three hours.
The pie baked for three hours.
"Mom" and "The pie" are the subjects. The verb and entire predicate are identical. Understanding the language disambiguates these sentences, but the ambiguity is part of what defines humor.
A man walked into a bar. Ouch!
A man wanted to win a pun contest in the local newspaper, so he entered 10 times in order to increase the chances that one of his entries would win. Unfortunately, no pun in ten did.
You can translate that 50 ways from Sunday but without understanding the language - understanding what makes those statements interesting - the machine will lose all their meaning.
Re:Philosophical caveat (Score:2)
Re:Philosophical caveat (Score:3, Informative)
The sentence "the pie was baked for three hours" differs in meaning, because it implies that someone was there, actively baking the pie.
Re:Philosophical caveat (Score:3, Insightful)
That's exactly why I like my anime fansubbed instead of sanitized.
Oh my god. (Score:2)
(Intentionality is a useful concept. Don't get me wrong. It's the bowels of philosophy that kill me, in the same way that the bowels of Crit Lit kill me.)
Chinese room (Score:2)
With this system that gradually creates its own system of output from comparing various inputs, how is it really behaving any differently than an infant learning to speak?
Re:Philosophical caveat (Score:2)
Re:Philosophical caveat (Score:3, Insightful)
Note also that such statistical approaches are nothing new; it's just that computers are finally getting powerful enough that people can use them.
None of that has anything to do with Searle. Searle wouldn't admit that the system understands language even if it knew things about the real worl
Re:Philosophical caveat (Score:3, Insightful)
His argument essentially boils down to: "The computer doesn't understand because all it does is manipulate symbols. Even if it does exactly the same steps as a human, the human understood and the computer was just being a mimic. Giving the computer a body wouldn't make it any less of a mimic".
The
Re:Philosophical caveat (Score:3, Insightful)
In his scenario, Searle claims that neither the people moving the Chinese tokens nor the book of instructions telling them what to do "understands" what is being said. That is obviously true, but it misses the point. That's like saying that the neurons in your head don't understand what you are saying, and so neither do you.
The workers in the Chinese Room argument are just hardware. They're akin to neurons in the brain, or chips in a computer. They're blind
Google definitely would buy into this... (Score:5, Interesting)
Re:Google definitely would buy into this... (Score:3, Interesting)
Some of the
Translating specialised texts ... (Score:5, Insightful)
The main reason (I think) is that tech documents have specialised vocabulary and idioms, but these are far fewer than the idioms one has to master in order to understand the editorial page in a newspaper.
With a rudimentary knowledge of Russian and French, I have found it much easier to read an engineering textbook or paper in these languages than to read any nontechnical text. (This is not necessarily the case with other languages. Any document in Japanese for instance is an entirely different ballgame
Re:Translating specialised texts ... (Score:4, Informative)
Of course that is true, for a human translator. Your knowledge of the technical field itself is a resource you can use to aid in your translation of technical texts. For machines, it's usually necessary to use a translator specifically geared to the subject matter. For instance, you would definitely want to use a different machine translator for a newspaper article as opposed to a biomedical research journal.
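The "translator geared to the subject matter" point implies a routing step: decide which domain a document belongs to, then hand it to that domain's model. A minimal Python sketch, assuming hypothetical hand-written term lists in `DOMAIN_TERMS`; a real system would more likely score documents with per-domain language models trained on in-domain text.

```python
import re
from typing import Dict, Set

# Hypothetical domain vocabularies, invented for illustration.
DOMAIN_TERMS: Dict[str, Set[str]] = {
    "news": {"minister", "election", "said", "government", "police"},
    "biomedical": {"protein", "cell", "patients", "gene", "clinical"},
}

def pick_domain(text: str, default: str = "news") -> str:
    """Return the domain whose term list overlaps the text most (ties go to the first listed)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {domain: len(words & terms) for domain, terms in DOMAIN_TERMS.items()}
    best = max(scores, key=scores.get)
    # Fall back to the default model when no domain term appears at all.
    return best if scores[best] > 0 else default
```

The returned label would then select which (hypothetical) domain-specific translator to call.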
This new approach is supposed to mitigate these problems. If they can do a good job of it, they may be able to bring machine translation to areas where previously human translators have been required or greatly preferred.
Re:Translating specialised texts ... (Score:3, Insightful)
There's more to help you than just the specialized vocabulary. It's good that "crankshaft" is unambiguous, but it also helps to know in advance that "bolt" will be a noun and not a verb.
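Knowing in advance that "bolt" will be a noun amounts to a domain-specific prior over part-of-speech tags. A minimal Python sketch, assuming two tiny hypothetical tagged samples (`TECHNICAL` and `GENERAL`, invented for illustration) in place of real tagged corpora:

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Tagged = List[Tuple[str, str]]

# Hypothetical tagged tokens from "tighten the bolt" / "a bolt snapped".
TECHNICAL: Tagged = [("tighten", "VERB"), ("the", "DET"), ("bolt", "NOUN"),
                     ("a", "DET"), ("bolt", "NOUN"), ("snapped", "VERB")]
# Hypothetical tagged tokens from "horses bolt" / "the bolt" / "they bolt".
GENERAL: Tagged = [("horses", "NOUN"), ("bolt", "VERB"), ("the", "DET"),
                   ("bolt", "NOUN"), ("they", "PRON"), ("bolt", "VERB")]

def tag_distribution(corpus: Tagged, word: str) -> Dict[str, float]:
    """Relative frequency of each tag for `word` in a tagged corpus."""
    counts = Counter(tag for w, tag in corpus if w == word)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()} if total else {}

def most_likely_tag(corpus: Tagged, word: str) -> Optional[str]:
    """Most frequent tag for `word`, or None if the word never occurs."""
    dist = tag_distribution(corpus, word)
    return max(dist, key=dist.get) if dist else None
```

In the technical sample `bolt` is always a noun; in the general one it is a verb two times out of three, so the same word gets a different default reading depending on the domain.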
Also, to be blunt, nobody expects technical prose to so
Only as smart as . . . (Score:2)
Huzzah! (Score:2, Funny)
DadaDodo (Score:4, Informative)
Mission statements (Score:3, Insightful)
Microsoft Research already does this (Score:5, Informative)
Re:Microsoft Research already does this (Score:2)
Have YOU read "that" KB, AC? Or are you just blowing smoke out of your ass?
Arabic to English (Score:5, Interesting)
After a quick web search, all I was able to find was this site [sakhr.com], which has a pretty sketchy TOS agreement.
Re:Arabic to English (Score:2)
I'm not saying you should trust CNN more or less than Aljazeera, but they both have agendas. Put on your tinfoil anti-bias hat before reading either translation.
However, a good translation program could be a great help in situations like this.
Re:Arabic to English (Score:3, Informative)
kuro5hin... (Score:2)
Dragon Naturally Speaking (Score:4, Interesting)
It would be interesting to see the results of analysing large sections of languages, but the only immediate use I can fathom for this would be for cryptography or information compression algorithms. However, the results could probably be used to provide insight into how languages evolve or how memes spread from language to language.
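On the compression angle: the simplest statistic such an analysis yields is the entropy of the word-frequency distribution, i.e. the average number of bits per word an ideal code would need if words were drawn independently with those frequencies. A minimal Python sketch; this is only a first-order estimate, since practical compressors also exploit context, and the whitespace tokenisation is an assumption:

```python
import math
from collections import Counter
from typing import Sequence

def unigram_entropy(tokens: Sequence[str]) -> float:
    """Shannon entropy, in bits per token, of the empirical token distribution."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

text = "the cat sat on the mat"
bits = unigram_entropy(text.split())
```

Lower entropy means a more predictable, and therefore more compressible, word stream.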
Or the brief explanation in the article did not make it clear enough how this differs from the previous state of the art, e.g. Dragon.
Time flies like an arrow... (Score:5, Funny)
When an automated translator can handle that one without bursting into flames, I'll start to believe.
Re:Time flies like an arrow... (Score:2)
Or are you expecting a computer to solve a problem even a human can't handle?
Re:Time flies like an arrow... (Score:2)
I can handle this problem easily, but would be impressed if you found a computer program that would do the same.
See, this sentence is grammatically correct in two ways, but one of them is not logical.
A computer would likely produce this:
TIME (flies) (as) AN ARROW and FRUIT (flies) (as) A BANANA.
when you really want:
TIME (flies) (as) AN ARROW and FRUIT FLIES (favor) A BA
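The point about one sentence being grammatical in two ways can be reproduced with a toy grammar. A minimal Python sketch that enumerates every tag sequence allowed by a small hypothetical lexicon (`LEXICON`) and keeps those matching one hypothetical sentence pattern (`PATTERN`: nouns, a verb, an optional preposition, then determiner plus noun). The pattern deliberately ignores the imperative "time the flies" reading. Both sentences still come out with two readings each, so grammar alone cannot choose; that takes world knowledge or statistics about which reading is usual.

```python
import itertools
import re
from typing import List, Tuple

# Hypothetical toy lexicon: the tags each word is allowed to take.
LEXICON = {
    "time": ["NOUN", "VERB"],
    "fruit": ["NOUN"],
    "flies": ["NOUN", "VERB"],
    "like": ["VERB", "PREP"],
    "a": ["DET"],
    "an": ["DET"],
    "arrow": ["NOUN"],
    "banana": ["NOUN"],
}
# Toy sentence pattern: NOUN+ VERB (PREP)? DET NOUN, over space-separated tags.
PATTERN = re.compile(r"(NOUN )+VERB (PREP )?DET NOUN")

def readings(sentence: str) -> List[List[Tuple[str, str]]]:
    """Return every (word, tag) sequence that the toy pattern accepts."""
    words = sentence.lower().split()
    result = []
    for tags in itertools.product(*(LEXICON[w] for w in words)):
        if PATTERN.fullmatch(" ".join(tags)):
            result.append(list(zip(words, tags)))
    return result

for reading in readings("fruit flies like a banana"):
    print(reading)
```

The two readings printed are "fruit flies | like | a banana" (NOUN NOUN VERB DET NOUN) and "fruit | flies | like a banana" (NOUN VERB PREP DET NOUN).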
Re:Time flies like an arrow... (Score:2)
We have this thing in English called parallelism. It is expected that when you use it, you do so in a way that aids the reader's understanding.
Furthermore, a more grammatically correct (and far more readable) way to say that would be "fruit flies like bananas", or better yet, given the ambiguity you point out, "fruit flies enjoy bananas".
I would love for you to provide a better example though -
Re:Time flies like an arrow... (Score:2)
Re:Time flies like an arrow... (Score:2)
Re:Time flies like an arrow... (Score:2)
I would love for you to provide a better example though - and although they doubtlessly exist, I've no doubt they can be accommodated.
No doubt, modifying English to accommodate a computer is easily done. However, that isn't the trick we're going for. We're trying for a computer program that can parse any English, like the original post in this thread.
Re:Time flies like an arrow... (Score:2)
Actually, at this point a statistical system based on an automatically collected corpus is
Re:Time flies like an arrow... (Score:2)
Re:Time flies like an arrow... (Score:2, Insightful)
"Time flies like an arrow, fruit flies like a banana" is a joke. Translating it into other languages would neither be funny or especially meaningful, as the whole po
Reading Everything (Score:2, Funny)
How is that news? Research was done 10 years ago. (Score:4, Interesting)
years ago by IBM: The Mathematics of Statistical Machine Translation [upenn.edu]. And even free software has been available for a while, see
http://www.fjoch.com/GIZA++.html [fjoch.com].
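The IBM work cited above builds translation models from word-alignment probabilities estimated with expectation-maximisation, and IBM Model 1 is the simplest of them. Below is a minimal textbook-style Python sketch of Model 1 EM training. It is not GIZA++'s actual implementation and not necessarily what the system in the article does, and the three-sentence German-English corpus is a toy example:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[List[str], List[str]]  # (foreign tokens, English tokens)

def train_ibm_model1(pairs: List[Pair], iterations: int = 10) -> Dict[Tuple[str, str], float]:
    """Estimate t(f | e) with EM; a NULL word on the English side absorbs unaligned words."""
    f_vocab = {f for fs, _ in pairs for f in fs}
    # Uniform initialisation: every foreign word equally likely for every English word.
    t: Dict[Tuple[str, str], float] = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count: Dict[Tuple[str, str], float] = defaultdict(float)
        total: Dict[str, float] = defaultdict(float)
        # E-step: spread each foreign word's unit of count over the English words
        # in its sentence, in proportion to the current t(f | e).
        for fs, es in pairs:
            es_null = ["NULL"] + es
            for f in fs:
                norm = sum(t[(f, e)] for e in es_null)
                for e in es_null:
                    c = t[(f, e)] / norm
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalise the expected counts for each English word.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return dict(t)

corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
t = train_ibm_model1(corpus)
```

After ten iterations the probabilities should favour `haus` for `house`, `buch` for `book`, `das` for `the` and `ein` for `a`, even though no dictionary was supplied; the pairings emerge only from which words keep turning up together.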
It's only a matter of time before... (Score:4, Funny)
No samples? (Score:4, Interesting)
Without even the simplest of examples or samples we have only their word on how well this works.
DOOMED (Score:3, Interesting)
Too bad about the times it needs to think (Score:4, Insightful)
A friend of mine was trying to translate an English novel into German a while back. She had to work out a replacement for a sentence where the word 'therapist' was construed as 'the rapist'. Hell of a job, and she's a professional translator.
Automatic translation looks pretty good for technical documents, news and anything completely literal. When you get writing with double meanings, humour and plays on words it gets way harder - often to the point where there is no correct translation.
I can't imagine... (Score:2)
As a person who speaks both English and Japanese, I can't believe that anyone could ever come up with an algorithm to translate between these languages. So much of it is based on context and nuance, not to mention that each language has words that simply do not exist in the other, so the only way to really understand it and make an attempt to translate is
Tests (Score:3, Interesting)
Re:Tests (Score:3, Insightful)
Comprehensible output? (Score:2)
Can it run faster than you? (Score:2, Insightful)
This is evolution, not revolution (Score:2)
EBMT (example-based machine translation) never really worked very well (it needed millions of translations before it would start to yield anything useful, and even then it needed hand-holding), but perhaps these new researchers have taken it to the next step.
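For readers unfamiliar with EBMT: its first step is retrieving the stored example most similar to the new input, which is then adapted. A minimal Python sketch of only that retrieval step, using the standard library's `difflib.SequenceMatcher` word-level similarity as a stand-in for whatever matching a real system uses; the translation memory (`MEMORY`) and the 0.6 cutoff are hypothetical:

```python
import difflib
from typing import List, Optional, Tuple

# Hypothetical translation memory of previously translated sentence pairs.
MEMORY: List[Tuple[str, str]] = [
    ("where is the train station", "wo ist der bahnhof"),
    ("where is the hotel", "wo ist das hotel"),
    ("i would like a coffee", "ich hätte gern einen kaffee"),
]

def closest_example(
    sentence: str, memory: List[Tuple[str, str]] = MEMORY, cutoff: float = 0.6
) -> Optional[Tuple[str, str, float]]:
    """Return (source, target, similarity) for the best stored match, or None below cutoff."""
    words = sentence.lower().split()
    scored = [
        # Compare word sequences, not characters, so whole-word overlap counts.
        (difflib.SequenceMatcher(None, words, src.split()).ratio(), src, tgt)
        for src, tgt in memory
    ]
    score, src, tgt = max(scored)
    return (src, tgt, score) if score >= cutoff else None
```

`closest_example("Where is the bus station")` retrieves the train-station pair with a similarity of 0.8; a full EBMT system would then adapt the part of the stored translation that differs.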
Johnny Five Alive (Score:3, Funny)
The first such system was built in 1993. (Score:3, Interesting)
The article (press release?) is totally misleading. Kevin Knight and Daniel Marcu are building on at least 15 years of active research on statistical machine translation. On the other hand, they are really very good at it.
Re:Universal Translator (Score:3, Informative)
so how can they grade you in school? (Score:4, Insightful)
Sometimes brute force, i.e. lookup tables of 100,000,000 translated versions, can be better. So much for logic, eh?
Here's what really makes people think... (Score:2)
Re:Hum... translate what politics say (Score:2)
-AD