Microsoft AI Software Science Technology

Microsoft Announces Breakthrough In Chinese-To-English Machine Translation (techcrunch.com) 72

A team of Microsoft researchers announced on Wednesday they've created the first machine translation system that's capable of translating news articles from Chinese to English with the same accuracy as a person. "The company says it's tested the system repeatedly on a sample of around 2,000 sentences from various online newspapers, comparing the result to a person's translation in the process -- and even hiring outside bilingual language consultants to further verify the machine's accuracy," reports TechCrunch. From the report: The sample set, called newstest2017, was released just last fall at the research conference WMT17. Deep neural networks, a method of training A.I. systems, allowed the researchers to create more fluent and natural-sounding translations that take into account broader context than the prior approaches, called statistical machine translation. Microsoft's researchers also added their own training methods to the system to improve its accuracy -- things they equate to how people go over their own work time and again to make sure it's right.

The researchers said they used methods including dual learning, for fact-checking translations; deliberation networks, to repeat translations and refine them; and new techniques like joint training, to iteratively boost English-to-Chinese and Chinese-to-English translation systems, and agreement regularization, which can generate translations by reading sentences both left-to-right and right-to-left. Zhou said the techniques used to achieve the milestone won't be limited to machine translations. The researchers caution the system has not yet been tested on real-time news stories, and there are other challenges that still lie ahead before the technology could be commercialized into Microsoft's products.
You can play around with the new translation system here.
This discussion has been archived. No new comments can be posted.

  • Cause that is all it takes.

    • Not sure what you mean by this. That was their test set, not their training (nor dev) set. I don't think that's an unreasonable amount of data for a test set.

      • by Potor ( 658520 )
        Surely everyone on /. knows that this is an (apocryphal) joke [wired.com].
        • Oh. Yes, I've heard that joke, but I didn't get the connection... thanks for pointing it out.

      • Re: (Score:3, Interesting)

        by Excelcia ( 906188 )

        It's not reasonable as a test set if it's chosen to be stuff that's easy to translate. I just tried some Chinese assembly instructions, and it's terrible. And I don't mean the technical stuff, I mean the introduction:

        Original:
        (SNIP - the Chinese characters won't work here. Alas for Unicode.)

        Microsoft:
        Structure assembly, according to a certain order, the relevance of parts to subcontract; a total of eight subcontracts, from A1 to A8, the same package of parts are related, after assembly will constitute a

  • by Anonymous Coward

    For years I've managed to deal with co-workers in Asia who don't speak English and simply use an online translator to craft the emails they send to me. Sure, it can be a bit awkward sometimes, but as long as I keep my response simple everything works out well. On the rare occasion when it doesn't work they simply add someone to the conversation who speaks both English and their native language.

  • by Anonymous Coward

    I can assure you Chinese to English translation to the same accuracy of THIS (me) person is nothing to brag about!

  • by fahrbot-bot ( 874524 ) on Wednesday March 14, 2018 @05:32PM (#56262033)

    Can it translate a Chinese Reporter's "eye-roll"? 'Cause one apparently broke China's Internet [nytimes.com]

    With a fellow reporter’s fawning question to a Chinese official pushing past the 30-second mark, Liang Xiangyi, of the financial news site Yicai, began scoffing to herself. Then she turned to scrutinize the questioner in disbelief.

    Looking her up and down, Ms. Liang rolled her eyes with such concentrated disgust, it seemed only natural that her entire head followed her eyes backward as she looked away in revulsion.

    Captured by China’s national news broadcaster, CCTV, the moment spread quickly across Chinese social media.

    ...

    On Chinese social media, GIFs and other online riffs inspired by Ms. Liang’s epic eye roll quickly proliferated, and by evening they were being deleted by government censors. Ms. Liang’s name became the most-censored term on Weibo, the microblogging platform. On Taobao, the freewheeling online marketplace, vendors began selling T-shirts and cellphone cases bearing her image.

  • by Anonymous Coward

    https://www.google.com/search?tbm=isch&q=translation+server+error+chinese+sign&oq=translation+server

  • and even hiring outside bilingual language consultants to further verify the machine's accuracy

    Only natives or bilinguals (if they really are) can verify the translation's accuracy -- until you get your neural networks trained up to that point, of course.

    • Most so-called bilinguals aren't.

    • Re: (Score:2, Insightful)

      by rtb61 ( 674572 )

      Actually bi-cultural bi-linguals. There are differences in culture which drive the different expressions and translations. Auto-translation is of course a very important tool in global human discourse. The problem, well, the less informed, the less educated, those with far less understanding, will be readily able to communicate with each other across the language barrier, think say American Rednecks and Chinese Rednecks, screaming at each other about how their armies can destroy each other and flooding other part

    • It's pretty obvious to an English native speaker when a translation is gibberish. A native English-only speaker can't really affirm accuracy, as you stated, but could certainly tell when something is blatantly wrong. They could also at least judge the quality of the final translation's English.

      Generally speaking, most translation programs do really horribly at translating idioms, or context-sensitive but otherwise ambiguous phrases. I'd think this is a perfect application for deep learning algorithms to

      • It seems to me that most translations are far too literal, and this is the challenge: translating idioms and other particular expressions from the vernacular.

        It's interesting seeing where the focus is going with translations. I've worked for years with teams in Germany, Russia and China, and even speak a little German myself. Google translate's Chinese to English beats Russian to English hands down, yet Slavic languages are common across a swathe of Europe and surely closer to

        • If Chinese is well written it is fairly formulaic and rigid, following an STPVO format. They are talking about news articles, not TV dramas. I've tested a few phrases and usually one of the two seems to be an okay translation, where the other may be far off. I tried phrases like, "When I took my dog for a walk today I found a new Japanese restaurant. The prices seemed reasonable and affordable. The taste was pretty authentic." Which it seemed to do well with. The phrase literally translated from Chinese wou

  • by swell ( 195815 ) <jabberwock@poetic.com> on Wednesday March 14, 2018 @05:38PM (#56262071)

    TFS is missing the important test of accuracy: translate Chinese > English, then back to Chinese. Will any Chinese person be able to understand it? Go back and forth twice for a more serious test. If you can't get access to Microsoft's software you can easily try this test with existing software. The results can be comical if your business doesn't depend on accuracy.

    • The summary is also faulty, it provides a link and says you can test the tool there, but following the link it says no, it is not the same tool at all, it is a worse tool that is also slower.

      I guess we can assume that whatever translation tool the editors are using to write the stories, it was unable to round-trip this story!

      • That's good because the tool found in that link has trouble with idioms and also has trouble distinguishing want/will/can. Almost every sentence I tried had problems, sometimes simple problems that shouldn't even be hard.
    • Years ago I tried this out of curiosity. It would typically only take a single round trip to start being funny. After several round trips, you could barely tell what the original topic was. It's like a computerized version of the game "telephone."

    • That won't work.

      Translating Asian languages into English is in general super easy. Translating English or German into an Asian language is much harder.

    • I think you'll find that's exactly what they're doing. Here are some sample tests ...

      http://matrix.statmt.org/matri... [statmt.org]

      So you read the article and the linked papers and came to this conclusion? Or did you just skip all of that and go straight to trolling?

    • I think you'll find that's exactly how they're testing it... Here's some sample test data...

      http://matrix.statmt.org/matri... [statmt.org]

      It's not perfect but it actually doesn't look too bad.

      I'm just wondering how you reached this conclusion? Did you read the article and the linked papers? Or did you just skip all of that and go straight to trolling?

    • I think you'll find that's exactly how they're testing it... Here's some sample test data...

      http://matrix.statmt.org/matri... [statmt.org]

      It's not perfect, but it actually looks pretty good.

      I'm just wondering how you reached this conclusion? Did you read the article and the linked papers? Or did you just skip all of that and go straight to trolling?

  • by Zorro ( 15797 )

    Othig to see here..........

  • by marciot ( 598356 ) on Wednesday March 14, 2018 @05:46PM (#56262109)

    I heard a story about an engineering company who used automatic translation to send documents back and forth with their international collaborators. At one point, their engineers were perplexed by the frequent mention of an âoewater goatâ in their correspondence.

    After digging through their source documents, they learned that the water goats were in fact hydraulic rams.

    • by wonkey_monkey ( 2592601 ) on Wednesday March 14, 2018 @06:19PM (#56262305) Homepage

      by the frequent mention of an âoewater goatâ in their correspondence.

      I'm still perplexed by the frequent mention of "âoe" and "â" on Slashdot.

    • by AmiMoJo ( 196126 )

      Not sure how useful translating back and forth is as a test. I use Chinese/English translation extensively and never need to do it. Any business trying to write a document collaboratively, especially a legal one, won't use machine translation anyway.

      I tested Microsoft's effort. It's not bad. Baidu is also quite good, and Google is okay. Sometimes it helps to try the same phrase in a couple of different ones. Microsoft and Baidu seem to give more natural translations, but Google is better at correcting error

  • ...does it censor the letter 'N' as a real Chinese would do it?

  • by gman003 ( 1693318 ) on Wednesday March 14, 2018 @06:40PM (#56262397)

    I read the MS blog and skimmed the actual paper. It gives a decent overview of the system design but has basically no details on the linguistics side of things. They just hired a bunch of people to do manual translation, both for training and for testing, but the only details of the results are a single table summarizing what categories of errors occurred.

    A lot of relevant information was missing. To start with, saying "Chinese language" is like saying "European language" - there isn't one unified "Chinese", but rather a variety of languages, topolects and dialects, with some level of mutual intelligibility, but it varies considerably. Not all variants use the same writing system - most use Hanzi, but there's the whole Traditional vs. Simplified issue, and some obscure varieties use entirely different systems (e.g. Dungan is written using Cyrillic, despite being closer to Mandarin than many Hanzi-using topolects). And secondary writing systems abound - for teaching and for computer usage, both the Latin alphabet and Bopomofo syllabary are used, in the mainland and Taiwan, respectively.

    From context, they seem to be aiming for Mandarin Chinese, the most common variety, and they only accept input in Simplified Hanzi, but they don't make that at all clear from the paper. Was the training corpus exclusively Mandarin, or did it include Cantonese or Hakka or Minnan? Was it entirely Mainstream Mandarin, or were regional dialects like Sichuanese included? The nature of the logographic writing system elides a lot of differences, but I can't see how you could completely ignore the issue. At the very least, I would expect it would be a problem for false negatives in the validation - these are issues for human translators as well. Did they dig deeper into the reported translation issues, and find any were a case of "oh, the news article was written in MSM but quoted someone using Dalian dialect" and then have to figure out whether the human or the machine was more accurate? I didn't read the paper thoroughly but I didn't see any mention at all of any of this crap.

    Anyways, they may or may not have made progress on the AI front. I am even less qualified to judge that than I am the linguistics side of it. But there's so many things *not* discussed in the paper that I can't help but feel like they're overstating their results. Guess I'll have to wait for the language blogs to pick up on it.

    • The answers to most of your questions are available, you just need to follow the references. They didn't discuss those details because it wasn't relevant to the paper's main topic whether they were using Mandarin or Cantonese. Besides, they didn't create the training data. They used an existing data set someone else had created, which is actually a collection of several data sets from different sources. So if you want all the details, here's the data [statmt.org].

  • I was exchanging a number of lengthy emails (about 20 each) with someone from China. After about 15 messages each I found out that they didn't speak English and they were using translation software. The Chinese software is so good that I didn't even think it was being used.

  • I'll be able to read the signs around Melbourne and Sydney, especially on the property listings in the city!
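
For anyone who wants to try the round-trip test suggested in the thread (translate, translate back, repeat, and see how far the text drifts), here is a minimal sketch. Everything in it is an assumption made for illustration: `round_trip`, `similarity` and `word_mapper` are hypothetical helper names, and the two toy word tables stand in for a real translation service, which is not called here. The character-level similarity score is only a crude proxy for drift, and as some commenters point out, a round trip may not say much about actual translation quality.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List

# A "translator" here is just any function from text to text (hypothetical type alias).
Translator = Callable[[str], str]


def round_trip(text: str, forward: Translator, backward: Translator,
               trips: int = 1) -> List[str]:
    """Return the original text followed by the result of each full round trip."""
    history = [text]
    for _ in range(trips):
        text = backward(forward(text))
        history.append(text)
    return history


def similarity(a: str, b: str) -> float:
    """Crude character-level similarity in [0, 1]; 1.0 means identical strings."""
    return SequenceMatcher(None, a, b).ratio()


def word_mapper(table: Dict[str, str]) -> Translator:
    """Build a toy word-by-word translator; unknown words pass through unchanged."""
    def translate(text: str) -> str:
        return " ".join(table.get(word, word) for word in text.split())
    return translate


# Toy stand-ins (NOT a real translation system): the forward table collapses
# two source words into one pivot token, so the backward pass cannot recover
# which one was meant. This mimics the "water goat" story from the thread.
TOY_FORWARD = word_mapper({"hydraulic": "T1", "water": "T1",
                           "ram": "T2", "goat": "T2"})
TOY_BACKWARD = word_mapper({"T1": "water", "T2": "goat"})


if __name__ == "__main__":
    history = round_trip("hydraulic ram", TOY_FORWARD, TOY_BACKWARD, trips=2)
    for step, text in enumerate(history):
        print(step, repr(text), round(similarity(history[0], text), 2))
```

To run it against real software, replace `TOY_FORWARD` and `TOY_BACKWARD` with functions that call whichever translators you have access to; the loop itself does not care what is behind them.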
