Software Science

Translation Software That Learns by Reading 308

redcone writes "New Scientist is reporting that translation software that develops an understanding of languages by scanning through thousands of previously translated documents has been released by U.S. researchers. According to the article "The translated documents used to teach the translation algorithms can be electronic, on paper, or even audio files. The system is not only faster than other methods, but also better suited to tackling less common languages and the unusual vocabulary found in specialised or technical texts.""
This discussion has been archived. No new comments can be posted.

  • by KaSkA101 ( 692931 ) on Wednesday February 23, 2005 @08:56PM (#11761859) Homepage
    Why didn't I have this software during High School Spanish?
    • Scanning Audio Files (Score:3, Interesting)

      by BobPaul ( 710574 ) *
      Why didn't I have this software during High School Spanish?

      It says it can scan through audio files as an input source. I wonder if this causes it to "learn" the auditory signatures (and thus it only knows the translation when given audio input), or if it relies on speech-to-text software to convert the audio to text first?

      If it does the latter, then based on the quality of current speech-to-text software, this probably wouldn't do much good in a total immersion classroom situation...

      Sure would have helped with my German
    • by xtrvd ( 762313 ) on Wednesday February 23, 2005 @09:25PM (#11762050)
      Fortunately I had the next best thing in High School Spanish. The trick is simply going to the #spain channel on efnet and talking nice to some people. You'd be amazed at how often my teacher would fail my fellow students because they attempted to use the primitive babelfish.altavista.com to do their work for them; she could easily spot the syntax errors and misspelled English words which were never translated.

      Until I see this new process in the works, however, there is nothing that will make me believe it's better than finding another human who can *understand* what you are saying and the context to which you are implying.
      • by Garabito ( 720521 ) on Wednesday February 23, 2005 @10:41PM (#11762513)
        k apr3ndist3 3sp4ni0l en IRC?
        q w3n0! 3so si está 1337!
      • Until I see this new process in the works, however, there is nothing that will make me believe it's better than finding another human who can *understand* what you are saying and the context to which you are implying.

        Heh. Then there is nothing that will make you believe, etc., etc.

        Certainly you can't do good translation without understanding syntax (which influences meaning and underlies word order) and context (to disambiguate synonyms and phrases with multiple interpretations). Machines aren't especi
      • Until I see this new process in the works, however, there is nothing that will make me believe it's better than finding another human who can *understand* what you are saying and the context to which you are implying. "Better" is an ambiguous term. For what these researchers made the program for, it is better than humans for one reason: speed. Sure they want the translations to be reliable, but more importantly is that a computer can do in a few days what would take a human a month, for this application at
    • good point. Is this not like babblefish?
    • I used to cheat on my French homework when Altavista's Babelfish came out (must have been about 1996/7?). It was great, could type in my homework, hit the button.. and bam, homework was all done.

      The teacher actually became suspicious when certain words were in entirely the wrong context, and about the bizarre syntax (totally English word orders, etc), but I was never busted. Quite funny really, since my best friend in class was using it too! Still, I don't think most people realised the Internet existed ba
      • Well, I was teaching in university around that time, and I used to get some of those machine translations. It was extremely obvious what was going on. Sometimes I would confront the student about it, but usually I just made the effort to teach the student to correct the worst of the problems. Of course that did nothing for the computer that was doing the translations, and the next time around it would be just as bad--so I'd take out double on the grade.

        My feeling was that it usually worked out okay. Those

        • This message brings up some excellent points about dealing with disruptive technology. A teacher whose job it is to get students to master material in a certain subject realizes that there is a machine that provides the same function that previously could only be gained by hard study.
          What is more important: the knowledge gained through rigorous study, or the ability to accomplish what the studying provides through a machine?
          Being technical oriented, I have to say the machine. But I am not being disresp
  • by Olaserov ( 785074 ) on Wednesday February 23, 2005 @08:57PM (#11761866) Homepage
    I wonder if we could train it to translate a EULA ;)
  • by Anonymous Coward on Wednesday February 23, 2005 @08:58PM (#11761872)
    Can someone translate that article from British English to American English please.

    Thanks.
    • by Grey Ninja ( 739021 ) on Wednesday February 23, 2005 @09:26PM (#11762057) Homepage Journal
      Here's a couple of suggestions for you:

      r3Ð(0n3 wr173$ "N3w $(13n71$7 1$ r3p0r71n9 7h47 7r4n$£4710n $07w4r3 7h47 Ð3v3£0p$ 4n nÐ3r$74nÐ1n9 0 £4n9493$ b¥ $(4nn1n9 7hr09h 7h0$4nÐ$ 0 pr3v10$£¥ 7r4n$£473Ð Ð0(m3n7$ h4$ b33n r3£34$3Ð b¥ .$. r3$34r(h3r$. 4((0rÐ1n9 70 7h3 4r71(£3 "7h3 7r4n$£473Ð Ð0(m3n7$ $3Ð 70 734(h 7h3 7r4n$£4710n 4£90r17hm$ (4n b3 3£3(7r0n1(, 0n p4p3r, 0r 3v3n 4Ð10 1£3$. 7h3 $¥$73m 1$ n07 0n£¥ 4$73r 7h4n 07h3r m37h0Ð$, b7 4£$0 b3773r $173Ð 70 74(|{£1n9 £3$$ (0mm0n £4n9493$ 4nÐ 7h3 n$4£ v0(4b£4r¥ 0nÐ 1n $p3(14£1$3Ð 0r 73(hn1(4£ 73x7$.""

      And translation #2:

      REDCONE WRIETS NU SCEINTIST IS R3PORTNG TAHT TRANSLATION R TAHT D3V3LOPS AN UNDERSTANDNG OF LANGUAEGS BY SCANNG THROUGH THOUSANDS OF PREVIOUSLY TRANSLAETD DOCUMENTS HAS B3N REL3AESD BY US!!!! OMG R3S3ARCHARS!!1!1!! LOL ACORDNG 2 DA ARTICL3 TEH TRANSLAETD DOCUMENTS US3D 2 T3ACH TEH TRANSLATION ALGORITHMS CAN B 3LECTRONIC ON PAEPR OR 3V3N AUDIO FIELS!!1111 TEH SYSTEM IS NOT ONLY FASTER THAN OTH3R M3THODS BUT ALSO BT3R SUIETD 2 TAKLNG LAS COMON LANGUAEGS AND TEH UNUSUAL VOCABULARY FOUND IN SPACIALIESD OR TECHNICAL TEXTS!1!! WTF
    • No trouble at all. First take the original text, computer translate it into German, and then back into English.

      Now reorder the phrases in every sentence so that the object phrase starts the sentence, change every sentence which contains the word because so that the word because and the words following it start the sentence. Make sure that every infinitive verb has the adverb between the word to and the verb. Change every occurrence of which to that Find every word more than 3 syllables long and inject sever

    • I can translate it to Australian:

      That redcone fella did say something about some rag reporting some computer thingymebob that lets me understand what all those japs are saying. The city rag reckons it's real fast.
  • Yay! (Score:4, Funny)

    by gardyloo ( 512791 ) on Wednesday February 23, 2005 @08:58PM (#11761874)
    Hope for slashdot. I've always wondered if we only have artificially intelligent editors...
  • by MikeFM ( 12491 ) on Wednesday February 23, 2005 @08:59PM (#11761881) Homepage Journal
    I remember hearing about this a couple years ago. They were using translations of Harry Potter and the Bible to teach this software to translate. It seems to work well. I wonder what it'd make of different translations of technical documentation. That'd probably be even more interesting than what it'd make out of 'quidditch'.

    This could be great if it were open-sourced. It'd be nice to translate email, instant messages, websites, technical docs, and lots of other stuff we're currently using the fish for. The fish is nice but not that efficient to add to other programs, and its translations aren't usually that great.
    • by obeythefist ( 719316 ) on Wednesday February 23, 2005 @09:45PM (#11762167) Journal
      I never read that one. I thought the next book title was going to be "Harry Potter and the Half-Blood Prince".

      Or did JK Rowling suddenly become pious?
    • Oh fuck. (Score:3, Insightful)

      Something in my head just popped.

      Damn, i love this place. Seriously, dammit. Here we have post on a tech/it site titled "Harry Potter and the Bible " modded +4 Interesting at the time of this posting ... that is actually interesting. And even i find it interesting and the fact that you are most likely of age and know what is and how to spell "quidditch" is quite frightening. i'm sad to say i knew it too (they took my Ko0lBadge away a long time ago).

      My head totally hurts. Clod.
  • Turing test (Score:3, Insightful)

    by OneArmedMan ( 606657 ) on Wednesday February 23, 2005 @08:59PM (#11761882)
    I wonder if something similar to this could be used for AI, for say Turing Tests?
    • Sit it in the middle of a major chat hub, and have it learn what to say in response to the last two inputs? Sounds good to me. Where can we find a chat hub?

  • by bigtallmofo ( 695287 ) on Wednesday February 23, 2005 @09:01PM (#11761896)
    Teach Software translating on scanning up

    Not hard wares that sticks an comprehension of talks by scanning on thousands of fish translated papers has been vomited by US scientists.

    Many existing translation not hard wares uses palm rules for botching words and phrases. But the new software, snarked by Kevin Knight and Daniel Marcu at the Information Sciences[...]

    Read More... [newscientist.com]
  • In one way or another this is similar to training neural nets to recognize images, or spam filters to mark junkmail. Great way to put number-crunching power of computers to direct work.
  • by Frodo Crockett ( 861942 ) on Wednesday February 23, 2005 @09:02PM (#11761905)
    ...bu7 (4n 17 unÐ3r$74nÐ £337?
  • by FunWithHeadlines ( 644929 ) on Wednesday February 23, 2005 @09:03PM (#11761915) Homepage
    I wish them luck (cuz they'll need it), but if anything is going to produce translation software that really works it will have to include learning elements of this nature. It's one thing to get dictionary translations. That's been around for decades, with its laughable results. Humans speak in metaphor and simile and slang and contractions and abbreviations of thought all the time. We're the cat's meow of language (try that, computer!).

    But if you give computers a bunch of human stuff to read, you expose the dictionaries to language as it is actually used, not just as the dictionary has it. Then when odd language usage falls upon us like it's raining cats and dogs, they will have a database of similar usage to draw upon. Hey, it's an uphill climb, but this is a good avenue to try. Cheerio, computers, and a top o' the mornin' to ya.

    • Easy (Score:3, Funny)

      by beldraen ( 94534 )
      English->Cat: Meow!
      • " English->Cat: Meow!"

        LOL! And here's the same message translated from Cat to Dog and back into English:

        "Bark!"

        • English: Did you hear about the new translation software? -> Dog: Woof! -> English: I want some food now.
          • English: Isn't this new software amazing? -> Dog: Good doggie.

            English: I think computers are great. -> Dog: Good doggie.

            English: Aren't you paying attention to what I'm saying, you stupid mutt!? -> Dog: Good doggie.

            • Reminds me of the The Simpsons where Bart is trying to train Santa's Little Helper

              "Bluh bluh bluh bluh bluh! Bluh bluh bluh bluh sit!"
    • I wish them luck (cuz they need it), but, if all translation software will produce, which really work must include it acquisition elements of this nature. It is a thing to receive for from of dictionary translations to. That is around for decades, with his laughable results. Humans speak simile and jargon and contractions and abbreviations of the thought the whole time in the metaphor and. We are meow the cat of the language (attempt that, computer!). But, if you give a bundle human material to computers, i
      • Ooh, good idea! OK, here is my second paragraph as translated by Systran into German and then back into English:

        But, if you give a bundle human material to computers, in order to read, set the directories language, like them are really used, not out, even there the directory them have. If odd language consumption falls after us, how it rains cats and dogs, has it a data base of similar consumption to draw to on. He, is it a rising ascent, but this is a good avenue, to of of Cheerio to try from of compute

        • Bah, one language to other and back is too little. You have to do the complete thing, of course. As you can do here [tashian.com]. I've done it to your text for you (of course including the east asian languages):

          If it is possible and cuz of the translation of the software of the
          wealth (until the necessity to the danger) this person whom it causes,
          this member of the quality of the well-educated way and, in me who I
          consult that it examines it, of its type of the search of the thing
          the truth that the lheo requests to neces
    • Dictionary: Prescriptive; includes words that have become generally accepted.

      Lexicon: Descriptive.. attempts to include as many words/uses as possible.

      By doing it based on existing documents you end up with a lexicon.
  • by Raindance ( 680694 ) * <`johnsonmx' `at' `gmail.com'> on Wednesday February 23, 2005 @09:03PM (#11761916) Homepage Journal
    As a caveat, we should be wary of saying the system "understands" a language.

    I would say generally that humans able to translate between languages generally understand both languages, but whether a statistical, probabilistic model based on correlations understands a language might be a stretch.

    Further reading: Searle's Chinese Room argument- http://en.wikipedia.org/wiki/Chinese_room

    This is akin to asking, Does your tax software understand the tax code? Does Photoshop understand the principles of image manipulation?

    Are these silly questions to ask?

    Further reading: Dennett on intentionality (http://en.wikipedia.org/wiki/Dennett but the entry is pretty sparse).

    RD
    • by MikeFM ( 12491 ) on Wednesday February 23, 2005 @09:08PM (#11761964) Homepage Journal
      Does anybody understand the tax code? Why should software be any different?

      I think that software that can learn can be said to understand a problem just as much as a human can. The difference between understanding and just doing is having the ability to learn from new data and to change your actions as required.
    • by back_pages ( 600753 ) <back_pages AT cox DOT net> on Wednesday February 23, 2005 @09:47PM (#11762178) Journal
      Great example of this:

      Mom baked for three hours.
      The pie baked for three hours.

      "Mom" and "The pie" are the subjects. The verb and entire predicate are identical. Understanding the language disambiguates these sentences, but the ambiguity is part of what defines humor.

      A man walked into a bar. Ouch!

      A man wanted to win a pun contest in the local newspaper, so he entered 10 times in order to increase the chances that one of his entries would win. Unfortunately, no pun in ten did.

      You can translate that 50 ways from Sunday but without understanding the language - understanding what makes those statements interesting - the machine will lose all their meaning.

      • Mom baked for three hours.
        The pie baked for three hours.

        "Mom" and "The pie" are the subjects. The verb and entire predicate are identical. Understanding the language disambiguates these sentences, but the ambiguity is part of what defines humor.
        ... except for the fact that "The pie baked for three hours" isn't good grammar. More properly, it's "the pie was baked for three hours". That clarifies the statement because "was baked" is a passive construction.
        • except for the fact that "The pie baked for three hours" isn't good grammar.
          Why ? You could say, after all, "The pie baked normally for three hours in the oven, then it started to burn". It's acceptable grammar, but it's a confusing sentence on the semantic level.

          The sentence "the pie was baked for three hours" differs in meaning, because it implies that someone was there, actively baking the pie.


      • that's exactly why i like my anime fansubbed instead of sanitized.
    • You're giving me awful flashbacks to my Philosophy of Mind and Language class. If I never hear about Intentionality, keyholes, or NS semantics, I'll be happy. Searles, parse my code.

      (Intentionality is a useful useful concept. Don't get me wrong. It is the bowels of philosophy that kills me, in the same way that the bowels of Crit Lit kills me.)

    • The Chinese Room argument is an illustration of a normal 'dumb' computer program that is coded by a human, not artificial intelligence that learns and figures out its own rules of how to behave.

      With this system that gradually creates its own system of output from comparing various inputs, how is it really behaving any differently than an infant learning to speak?
    • I see that you're trying to cheat on your taxes. Would you like me to help?
    • You are right: this software does not understand language; it works out statistical correspondences, but it has no understanding of the physical correlates of words. That also means that it has intrinsic limitations.

      Note also that such statistical approaches are nothing new, it's just that computers are finally getting powerful enough that people can use them.

      None of that has anything to do with Searle. Searle wouldn't admit that the system understands language even if it knew things about the real worl
    • NO! NO! NO! Not the Searle argument again. That guy is an absolute nutter and should be banned! Actually, on second thoughts, as long as I never have to hear his drivel again, I don't mind what happens to him.

      His argument essentially boils down to: "The computer doesn't understand because all it does is manipulate symbols. Even if it does exactly the same steps as a human, the human understood and the computer was just being a mimic. Giving the computer a body wouldn't make it any less of a mimic".

      The
    • Searle's Chinese Room argument is hogwash.

      In his scenario, Searle claims that neither the people moving the Chinese tokens, nor the book of instructions telling them what to do "understands" what is being said. That is obviously true, but it misses the point. That's like saying that the neurons in your head don't understand what you are saying, and so neither do you.

      The workers in the Chinese Room argument are just hardware. They're akin to neurons in the brain, or chips in a computer. They're blind
  • by egyber ( 788117 ) on Wednesday February 23, 2005 @09:03PM (#11761919)
    Don't remember exactly where I read this, but Google apparently has long believed that there is enough data on the internet alone to be able to intelligently translate... What these guys claim to have done is, it would seem, the missing piece of the puzzle for Google. I wouldn't be surprised if Google gets in on this.
    • I tried that in about 1997. It did work pretty well but the biggest problem was the limitation of having copies of the same document in different languages. There are quite a few but they were dwarfed by the amount of single-language documents. Also the fact is that most text on the Internet is written the way that I write - badly. This can lead to translations that are written the way real people write which can be good for conversational bots but which is probably bad for translation software.

      Some of the
  • by rkmath ( 26375 ) on Wednesday February 23, 2005 @09:03PM (#11761921)
    The article (and the text of the original posting) makes it seem like translating a specialized technical text is somehow harder than translating, say, a newspaper article. As someone experienced in translating technical (science/engineering) documents, I can say that any tech document is far _easier_ to translate after an initial learning curve.

    The main reason (I think) is that: tech documents have specialised vocabulary and idioms, but these are much fewer than the idioms one has to master in order to understand the editorial page in a newspaper.

    With a rudimentary knowledge of Russian and French, I have found it much easier to read an engineering textbook or paper in these languages, than reading any nontechnical text. (This is not necessarily the case with other languages. Any document in Japanese for instance is an entirely different ballgame ...)

    • by Anonymous Coward on Wednesday February 23, 2005 @09:08PM (#11761967)
      The article (and the text of the original posting) makes it seem like translating a specialized technical text is somehow harder than translating, say, a newspaper article. As someone experienced in translating technical (science/engineering) documents, I can say that any tech document is far _easier_ to translate after an initial learning curve.


      Of course that is true, for a human translator. Your knowledge of the technical field itself is a resource you can use to aid in your translation of technical texts. For machines, it's usually necessary to use a translator specifically geared to the subject matter. For instance, you would definitely want to use a different machine translator for a newspaper article as opposed to a biomedical research journal.

      This new approach is supposed to mitigate these problems. If they can do a good job of it, they may be able to bring machine translation to areas where previously human translators have been required or greatly preferred.
    • Absolutely true. One of Beryllium Sphere's partners is a computational linguist. For quite a while her bread and butter was building representations of knowledge about heavy equipment maintenance to support automatic translation of Caterpillar Tractor technical manuals.

      There's more to help you than just the specialized vocabulary. It's good that "crankshaft" is unambiguous but it also helps to know in advance that "bolt" will be a noun and not a verb.

      Also, to be blunt, nobody expects technical prose to so
  • the texts it has worked on. If all you give this programme is a steady diet of weather reports to translate and learn from, it will make everything else sound like a weather report. Most contexts employ similar words with a significant contextual meaning to them. 'Pea-soup' means very different things in weather reports and cooking recipes.
  • Huzzah! (Score:2, Funny)

    by Tzarius ( 688342 )
    Now my Bayesian mail filter can translate spam to english before it's read!
  • DadaDodo (Score:4, Informative)

    by Tripax ( 162140 ) on Wednesday February 23, 2005 @09:06PM (#11761945)
    This reminds me of Jamie Zawinski's hack DadaDodo [jwz.org], which used probability trees to create new texts from old texts by examining the probability that any given word follows the previous word or string of words. I always thought his program was cool, in that his description of it involved Markov Chains and William S. Burroughs.
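The word-follows-word idea described in this comment can be sketched in a few lines. This is not DadaDodo's actual code, only a minimal Markov chain text generator written for illustration; the function names `build_chain` and `generate` and the sample sentence are assumptions.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each run of `order` words to the list of words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, length=12, seed=None):
    """Walk the chain, picking each next word in proportion to how often it followed."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))   # random starting state
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:               # dead end: state only seen at the end of the text
            break
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)


corpus = "time flies like an arrow and fruit flies like a banana"
chain = build_chain(corpus)
print(generate(chain, seed=0))
```

Because repeated followers are stored as repeated list entries, `rng.choice` samples them with the observed frequencies. A larger `order` makes the output more faithful to the source text and less surprising.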
  • by drdink ( 77 ) * <smkelly+slashdot@zombie.org> on Wednesday February 23, 2005 @09:07PM (#11761959) Homepage
    I did a presentation for an AI class a while ago and discovered that Microsoft already does this with their MSR-MT [microsoft.com] project. Apparently the Spanish entries in their Knowledge Base were translated by this as well.
  • Arabic to English (Score:5, Interesting)

    by Caseyscrib ( 728790 ) on Wednesday February 23, 2005 @09:12PM (#11761986)
    I'd like to see an Arabic-to-English translator. I was interested in reading news from the Middle East, because I don't particularly trust our media to translate it properly. A good example of this is Bin Laden's transcript [kuro5hin.org].

    After a quick web search, all I was able to find was this site [sakhr.com], which has a pretty sketchy TOS agreement.

    • You trust the Aljazeera version more than the CNN version?

      I'm not saying you should trust CNN more or less than Aljazeera, but they both have agendas. Put on your tinfoil anti-bias hat before reading either translation.

      However, a good program that could translate could be a great help when in situations like this.
      • I never said I trusted either source. But when you can read Arabic propaganda and contrast it with your own media's propaganda, it helps you to understand what the underlying causes for war are. It is also key to recognizing the true aggressor, because in every war both governments play the "good guys" role to their citizens. Direct translation helps you to understand the culture of your enemy. Things as simple as webpage advertisements, editorials, personals, etc, are lost in translation by CNN and the
    • kuro5hin is biased too, in the left-wing technocommunist direction.

  • by headkase ( 533448 ) on Wednesday February 23, 2005 @09:12PM (#11761987)
    Using statistical methods to predict the next item in a sequence is still not true hard AI, though. This technique is used in the voice recognition software "Dragon NaturallySpeaking", in effect creating pattern chains. What Dragon did on the character level, this software appears to do on the word level. This is still not true AI, however, as the statistics only map to probabilistic sequences rather than abstractly to the concepts. What would really impress me is if they came up with a mapping algorithm that instead of using probability used a function like mini-max fitness testing on a neural-network substrate.
    It would be interesting to see the results of analysing large sections of languages however, but the only immediate use I can fathom for this would be for cryptography or information compression algorithms. However the results could probably be used to provide insight into how languages evolve or how memes spread from language to language.
    Or the brief explanation in the article did not make it clear enough how this differs from what was previously state-of-the-art, e.g. Dragon.
  • by Secret Agent 99 ( 855215 ) on Wednesday February 23, 2005 @09:17PM (#11762020)
    ...and fruit flies like a banana.

    When an automated translator can handle that one without bursting into flames, I'll start to believe.
    • If it has seen it translated by a human somewhere else before and remembers it... why should it have a problem?

      Or are you expecting a computer to solve a problem even a human can't handle?
      • Or are you expecting a computer to solve a problem even a human can't handle?

        I can handle this problem easily, but would be impressed if you found a computer program that would do the same.

        Time flies like an arrow and fruit flies like a banana.

        See, this sentence is grammatically correct in two ways, but one of them is not logical.

        a computer would likely produce this:
        TIME (flies) (as) AN ARROW and FRUIT (flies) (as) A BANANA.

        when you really want:
        TIME (flies) (as) AN ARROW and FRUIT FLIES (favor) A BA

        • Well until I read your explanation the sentence didn't seem logical to me. I assure you, I am not an automaton.

          We have this thing in English called parallelism. It is expected that when you use it you do so in a way that aids the reader's understanding.

          Furthermore, a more grammatically correct way (and far more readable) to say that would be "fruit flies like bananas", or better yet, given the ambiguity you point out, "fruit flies enjoy bananas".

          I would love for you to provide a better example though -
          • The sentence is correct in English, and the words have counterparts in the other language (I would assume). It's possible to translate it. A human translator will probably do it correctly. Whether or not it's the least confusing way to say something is unimportant. If you're limiting what you can say for the ease of computer-based translation, then we might as well all learn some intermediate language with no ambiguity that a computer can translate trivially, 'cause that theoretical translator ain't goo
            • English is context-sensitive. Inherent ambiguity in the language is dealt with through context. Translating something that is intentionally awkward and misleading is a terrible test. As another poster pointed out, while the sentence is grammatically passable by the written rules of English, no English speaker would say that. For one, if the two concepts are not connected it is a run-on sentence. For another, fruit flies in this context are a broad category, whereas A banana is a singular entity. A broa
          • I agree, the sentence is deceptive, but not necessarily wrong. This is true for many witty remarks.

            I would love for you to provide a better example though - and although they doubtlessly exist, I've no doubt they can be accommodated for.

            No doubt, modifying English to accommodate a computer is easily done. However, that isn't the trick we're going for. We're trying for a computer program that can parse any English, like the original post in this thread.
        • Actually, a statistical system will get this right. It will find that the pattern "fruit flies" is common as a noun phrase, while "time flies" is rare as a noun phrase. It will also find that "time flies" is a common complete sentence, suggesting that "like an arrow" is an adjunct to it. "fruit flies" is rare as a complete sentence, suggesting that "like a banana" is not an adjunct, and must be a verb and direct object.

          Actually, at this point a statistical system based on an automatically collected corpus is
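The counting argument in this comment can be made concrete with a toy frequency table. The sketch below assumes a handful of hand-labelled observations of how "flies" was tagged after a given word; the counts are invented for illustration and the function names are hypothetical, so this shows the idea rather than any real system.

```python
from collections import Counter, defaultdict

# Toy observations: (previous word, tag that "flies" had in that sentence).
# These counts are made up; a real system would collect them from a large corpus.
observations = [
    ("fruit", "NOUN"), ("fruit", "NOUN"), ("fruit", "NOUN"), ("fruit", "VERB"),
    ("time", "VERB"), ("time", "VERB"), ("time", "VERB"), ("time", "NOUN"),
]

counts = defaultdict(Counter)
for previous_word, tag in observations:
    counts[previous_word][tag] += 1


def tag_probabilities(previous_word):
    """Relative frequency of each tag for "flies" given the word before it."""
    tag_counts = counts[previous_word]
    total = sum(tag_counts.values())
    return {tag: n / total for tag, n in tag_counts.items()}


def most_likely_tag(previous_word):
    """Pick the tag seen most often after this word (the word must have been observed)."""
    return counts[previous_word].most_common(1)[0][0]


for word in ("time", "fruit"):
    print(word, "flies ->", most_likely_tag(word), tag_probabilities(word))
```

With these counts, "flies" comes out as a verb after "time" and as a noun after "fruit", which is the reading the joke depends on.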
    • I just babelfished it into German. The answer seems to be correct.
    • "Time flies like an arrow" is a simile, and is idiomatic. There are a finite set of idioms, and they should be fine as "memorized" exceptions in a speech system (they are often memorized exceptions in humans). Most language is rule based, but I think many underestimate the number of idioms that humans encounter and have difficulty "parsing."

      "Time flies like an arrow, fruit flies like a banana" is a joke. Translating it into other languages would neither be funny or especially meaningful, as the whole po
  • by Anonymous Coward
    I hope they don't read everything. Next thing you know translations could end up L1k3 th1s f0R 4l1 y0u K|\|0\/\/.
  • by Anonymous Coward on Wednesday February 23, 2005 @09:43PM (#11762154)
    The basic approach was developed over 10 years ago by IBM: The Mathematics of Statistical Machine Translation [upenn.edu]. And even free software has been available for a while, see http://www.fjoch.com/GIZA++.html [fjoch.com].
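For readers who want a feel for how word translations can be learned from sentence pairs alone, here is a minimal sketch in the spirit of the simplest model from that line of work (commonly called IBM Model 1), trained with expectation-maximization. It is not GIZA++ and not the researchers' system; the three-sentence corpus and the names `train_model1` and `best_translation` are assumptions, and the usual NULL source word is left out for brevity.

```python
from collections import defaultdict
from itertools import product


def train_model1(sentence_pairs, iterations=10):
    """Estimate t[(target_word, source_word)] = P(target_word | source_word) with EM."""
    pairs = [(s.split(), t.split()) for s, t in sentence_pairs]
    source_vocab = {w for s, _ in pairs for w in s}
    target_vocab = {w for _, t in pairs for w in t}
    # Start from a uniform table: every target word is equally likely for every source word.
    t = {(tw, sw): 1.0 / len(target_vocab)
         for tw, sw in product(target_vocab, source_vocab)}
    for _ in range(iterations):
        count = defaultdict(float)   # expected number of times sw translated to tw
        total = defaultdict(float)   # expected number of times sw translated to anything
        for source, target in pairs:
            for tw in target:
                norm = sum(t[(tw, sw)] for sw in source)
                for sw in source:
                    fraction = t[(tw, sw)] / norm
                    count[(tw, sw)] += fraction
                    total[sw] += fraction
        # Re-estimate the table from the expected counts.
        for tw, sw in t:
            t[(tw, sw)] = count[(tw, sw)] / total[sw]
    return t


def best_translation(t, source_word):
    """Return the target word with the highest probability for this source word."""
    candidates = {tw: p for (tw, sw), p in t.items() if sw == source_word}
    return max(candidates, key=candidates.get)


corpus = [("the house", "das Haus"), ("the book", "das Buch"), ("a book", "ein Buch")]
table = train_model1(corpus)
for word in ("the", "house", "book", "a"):
    print(word, "->", best_translation(table, word))
```

No dictionary is supplied: "the" ends up paired with "das" only because the two co-occur in more sentence pairs than any competing pairing, and that in turn frees "Haus" and "Buch" to be explained by "house" and "book".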
  • by gkwok ( 773963 ) on Wednesday February 23, 2005 @09:47PM (#11762176) Homepage
    Skynet begins to learn at a geometric rate. It becomes self-aware at 2:14am Eastern Time....
  • No samples? (Score:4, Interesting)

    by Guspaz ( 556486 ) on Wednesday February 23, 2005 @09:47PM (#11762177)
    Sounds interesting, but I couldn't find a single sample translation on their site; ie a block of text in language A (Say, french), and language B (Say, english). Translated from A to B by their software.

    Without even the simplest of examples or samples we have only their word on how well this works.
  • DOOMED (Score:3, Interesting)

    by FoXDie ( 853291 ) on Wednesday February 23, 2005 @09:48PM (#11762181) Homepage
    Recently robots have been made that can Run, Wield shotguns, and Recognize faces. Now they can read. [DOOMED I SAY]
  • by Timbotronic ( 717458 ) on Wednesday February 23, 2005 @09:54PM (#11762221)
    I like the approach they've taken, but machine translation can only ever go so far.

    A friend of mine was trying to translate an English novel into German a while back. She had to work out a replacement for a sentence where the word 'therapist' was construed as 'the rapist'. Hell of a job and she's a professional translator.

    Automatic translation looks pretty good for technical documents, news and anything completely literal. When you get writing with double meanings, humour and plays on words it gets way harder - often to the point where there is no correct translation.

  • While I am all for coming closer to a universal translator of sorts, I can't imagine any software outside of AI being able to pull it off.

    As a person who speaks both English and Japanese, I can't believe that anyone could ever come up with an algorithm to translate between these languages. So much of it is context and nuance based, not to mention that there are words in the languages that simply do not exist in the other language so the only way to really understand it and make an attempt to translate is

  • Tests (Score:3, Interesting)

    by headkase ( 533448 ) on Wednesday February 23, 2005 @10:06PM (#11762287)
    The biggest test of the translator is converting from one language to another and then back again multiple times. If the content doesn't get corrupted then it works as advertised.
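The round-trip test proposed here is easy to automate. The sketch below assumes two translator functions are available; since no real translation API is given in the thread, `forward` and `backward` are hypothetical stand-ins built from a tiny word-for-word lookup that silently drops unknown words. Similarity to the original is measured with `difflib.SequenceMatcher` from the Python standard library.

```python
from difflib import SequenceMatcher


def round_trip(text, forward, backward, rounds=3):
    """Translate there and back `rounds` times, scoring each result against the original."""
    scores = []
    current = text
    for _ in range(rounds):
        current = backward(forward(current))
        matcher = SequenceMatcher(None, text.split(), current.split())
        scores.append(matcher.ratio())   # 1.0 means the word sequence is unchanged
    return current, scores


# Hypothetical stand-in translators: word-for-word lookup, unknown words are lost.
EN_TO_FR = {"the": "le", "cat": "chat", "sleeps": "dort"}
FR_TO_EN = {fr: en for en, fr in EN_TO_FR.items()}


def forward(sentence):
    return " ".join(EN_TO_FR[w] for w in sentence.split() if w in EN_TO_FR)


def backward(sentence):
    return " ".join(FR_TO_EN[w] for w in sentence.split() if w in FR_TO_EN)


final, scores = round_trip("the cat sleeps soundly", forward, backward)
print(final, scores)
```

A score that stays at 1.0 shows only that the two directions are consistent with each other. A system could survive the round trip and still produce a poor intermediate translation, so this is a necessary check rather than a sufficient one.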
  • Is this anything like the digestion for understanding (and subsequent output from) applied to christmas music [slashdot.org]? If so, they'll need a lot of work...
  • The critical issue is not whether this system will produce translations comparable to those done by a translator fluent in both languages -- it won't. However, it may do as well or better than translations by someone barely competent in one of the languages (or who is essentially just doing dictionary-based translation). English speakers have lots of examples of nearly incomprehensible technical translations from Chinese and Korean, and the Chinese and Koreans would probably have comparable examples of bad
  • The general technique of feeding translations into a machine translation system and letting it derive its own rules started many years ago at CMU's Language Technologies Institute [cmu.edu]. It was called Example-Based Machine Translation, or EBMT [cmu.edu].

    EBMT never really worked very well (it needed millions of translations before it'd start to yield anything useful, and even then it needed hand-holding), but perhaps these new researchers have taken it to the next step.

  • by Anne Thwacks ( 531696 ) on Thursday February 24, 2005 @04:20AM (#11764658)
    Input...Need more Input
  • by Dulimano ( 686806 ) on Thursday February 24, 2005 @08:02AM (#11765350)
    This is news of '93, when Brown et al. at IBM built their famous statistical machine translation system. It does exactly what is described in the article. I myself work on such a system (for Hungarian-to-English translation).

    The article (press release?) is totally misleading. Kevin Knight and Daniel Marcu are building on at least 15 years of active research on statistical machine translation. On the other hand, they are really very good at it.
