AI Goes Bilingual -- Without a Dictionary (sciencemag.org)
sciencehabit shares a report from Science Magazine: Automatic language translation has come a long way, thanks to neural networks -- computer algorithms that take inspiration from the human brain. But training such networks requires an enormous amount of data: millions of sentence-by-sentence translations to demonstrate how a human would do it. Now, two new papers show that neural networks can learn to translate with no parallel texts -- a surprising advance that could make documents in many languages more accessible.
The two new papers, both of which have been submitted to next year's International Conference on Learning Representations but have not been peer reviewed, focus on another method: unsupervised machine learning. To start, each constructs bilingual dictionaries without the aid of a human teacher telling them when their guesses are right. That's possible because languages have strong similarities in the ways words cluster around one another. The words for table and chair, for example, are frequently used together in all languages. So if a computer maps out these co-occurrences like a giant road atlas with words for cities, the maps for different languages will resemble each other, just with different names. A computer can then figure out the best way to overlay one atlas on another. Voila! You have a bilingual dictionary. The studies -- "Unsupervised Machine Translation Using Monolingual Corpora Only" and "Unsupervised Neural Machine Translation" -- were both submitted to the e-print archive arXiv.org.
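The "overlay one atlas on another" step described above has a well-known supervised analogue: aligning two word-embedding spaces with an orthogonal Procrustes rotation. The sketch below uses toy data and a known set of paired rows, which is *not* what the papers do (they learn the mapping with no paired words at all, e.g. adversarially); it only illustrates why overlaying the two "maps" is a well-posed problem once the spaces have similar shape. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "monolingual" embeddings: 50 words in a 5-dimensional space.
X = rng.normal(size=(50, 5))

# Pretend the second language's embedding space is a rotated copy of
# the first -- the "same atlas, different city names" idea.
true_rotation, _ = np.linalg.qr(rng.normal(size=(5, 5)))
Y = X @ true_rotation

# Orthogonal Procrustes: find the rotation W minimizing ||X W - Y||.
# (Here we cheat by using paired rows; the papers recover W without
# this supervision.)
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

# The recovered map overlays one "atlas" on the other almost exactly.
error = np.abs(X @ W - Y).max()
```

With a linear map in hand, each word's nearest neighbour in the mapped space gives a bilingual dictionary entry.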
Re: (Score:2)
Re: (Score:1)
Al Gore Bilingual. -- Without a Dictionary?
Re: (Score:2)
Re: (Score:1)
Totally legal now. Unlike when you were 30 and they were 8.
Not Peer Reviewed (Score:1)
Yet published on Slashdot because it centers around a buzzword.
No, it does not (Score:4, Insightful)
In order to go "bilingual", it would have to be able to understand one language first. However, understanding natural language is so far beyond the demented automation ("weak AI") available today that it is not even funny anymore. You may as well claim a squirrel is a "gourmet chef" because it can bury nuts, i.e. "process food". Whether actual intelligence will ever be available on machines is at this time completely unknown, because nobody knows what it is. It is pretty clear, though, that the only natural computing hardware known (the human brain) is not powerful enough to create the intelligence observable at the interface of the smartest instances, at least if any known computing paradigm is assumed to be how it works. So either a completely new computing paradigm is needed (and no, "neural" nets will not cut it, and they are really old), or the problem is even more complicated.
The real problem here is that most people are not smart enough to recognize a moron if the moron is dressed up prettily and spews pseudo-profound bullshit. Just look at who people vote for.
Re:No, it does not (Score:5, Insightful)
In order to go "bilingual" ...
The headline says "bilingual". Neither paper uses that term.
it would have to be able to understand one language first.
It is not clear if this is true. Translation accuracy has greatly improved, and is continuing to improve, despite the NNs having no understanding of how the languages map to reality. They only learn how the languages map to each other.
"neural" nets will not cut it and they are really old
What does age have to do with anything? Biological neural nets have been around for 600 million years.
Re: (Score:3)
age is relevant because the concept and theory involved was already studied to exhaustion.
Not true at all. Backprop dates back to 1986. Autoencoding was introduced in 2006. GANs were first used in 2014. Perhaps even more importantly, fast parallel computing with cheap GPUs and mountains of training data were only recently available.
Re: (Score:2)
This is even more hilarious than that: Hinton has basically said that his methods from 1986 would have proved out on a practical basis if only the machines and data had been beefier at the time. Some of the recent improvements are nice, but he doesn't view them as essential.
Oracle: Flight has been beaten to death since da Vinci.
Wilbur: You'd be amazed how much wind tunnels have improved since the invention of the stea
Re: (Score:1)
The real problem here is that most people are not smart enough to recognize a moron if the moron is dressed up prettily and spews pseudo-profound bullshit
This definitely applies to comments on Slashdot, where "dressed up prettily"= scare quotes and overconfidence with a sprinkling of jargon.
Re: (Score:1)
That explains a lot about the 2016 election in the U.S.
Google Translate? (Score:4, Interesting)
In order to go "bilingual", it would have to be able to understand one language first.
Google Translate can map between multiple languages without understanding any of them... which, admittedly, is why it does not do a great job, but it is usually good enough to be reasonably understandable.
Comment removed (Score:5, Interesting)
Re: (Score:2)
I completely agree with this. Languages in Asia (especially in South East Asian countries) have different language roots than Western languages. Culturally, the way people use a language differs from the West, even in written style, which is more formal and uses complete sentences. It is even worse in spoken style, because people often don't follow the grammar exactly, yet are still understandable to one another.
Another point is that these languages usually have their own politene
Re: (Score:2)
Have you tried it recently? Their old phrase based translations were terrible for Asian languages. Ask it to translate Japanese into English and you'd get garbage. Then they rolled out their new system based on neural networks, and it suddenly got a lot better. Not perfect, but now you can tell what it's saying. It's always easier translating between closely related languages, but the NNs are surprisingly good even for distant ones.
Re: (Score:3)
Like hell.
'eff you. /s
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
It can map between words and sentences. It cannot map between languages. It has no grasp of semantics.
Re: (Score:2)
May as well claim a squirrel is a "gourmet chef", because it can bury nuts, i.e. "process food".
Or similarly a rat, because it can control a human in a kitchen by pulling on its hair -- possibly with some assistance from the food processor in your example.
Re: (Score:2)
Whether actual intelligence is going to be available on machines, ever, is at this time completely unknown, because nobody knows what it is.
We got human level intelligence from old monkey brains by just fucking around for 100,000 generations.
Re: (Score:2)
That is actually unknown. Physicalism is a belief, not science. Actual science finds that the questions of intelligence and consciousness are currently getting more mysterious, not less so, as more data and facts become known.
Re: (Score:2)
That is actually unknown.
For you, maybe.
Re: (Score:2)
Indeed. I, unlike you, am an actual scientist.
Re: (Score:1)
What actual science exists that has anything to say on anything outside of the physical world?
Understanding (Score:4, Informative)
"Understanding" has multiple level.
Even you, dear snowflake, don't have the level of understanding a language that a reknown writer and poet could have of its intricacies.
Or, you only have a vague grasp of some concepts in a field of work outside of yours, whereas some body expert in the field has a much better understanding.
Even the pets (cats, dogs) in your house can have some basic understanding of things around, even if they don't think in such abstract concepts as you.
This software, due to the way it's build (basically word2vec and deep neural net), has some very basic form of understanding the language.
It's a very simple artificial brain, that is entirely optimised for one specific subdomain (language) and thus completely lacks other forms of thinking (cannot dissert about a scientific article written in said language).
But the way this system works, is that is able to implicitly and autonomously build relationships between things.
The kind of knowledge built into some ontology databases, except that here, the knowledge isn't manually constructed by the scientist filling the database, the knowledge is discovered on the go, not unlike how very young babies would discover the world around them.
Okay, it's a very stupid and limited baby in this case, but still.
It's good enough to catch and understand links between concepts.
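The word2vec-style intuition above, that words sharing contexts end up with similar vectors, can be sketched with raw co-occurrence counts. Real systems train dense embeddings on huge corpora; the tiny corpus, helper functions, and example words here are illustrative only.

```python
from collections import Counter
from itertools import combinations
import math

# Tiny toy corpus; word2vec trains on millions of sentences.
corpus = [
    "the table and the chair are in the kitchen",
    "she moved the chair next to the table",
    "the dog slept under the table near a chair",
    "the dog chased the cat in the garden",
]

# Count how often two words appear in the same sentence.
cooc = Counter()
for sentence in corpus:
    for a, b in combinations(set(sentence.split()), 2):
        cooc[(a, b)] += 1
        cooc[(b, a)] += 1

vocab = sorted({w for s in corpus for w in s.split()})

def vector(word):
    # A word's "map location": its co-occurrence counts with every word.
    return [cooc[(word, other)] for other in vocab]

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) *
                  math.sqrt(sum(x * x for x in v)))

# "table" and "chair" share contexts, so their vectors are more
# similar than "table" and "garden".
sim_table_chair = cosine(vector("table"), vector("chair"))
sim_table_garden = cosine(vector("table"), vector("garden"))
```

Because this context structure looks similar across languages, the co-occurrence "maps" of two languages can be overlaid, which is exactly what the dictionary-building step exploits.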
Re: (Score:3)
While that is not reliably known at this time, it very much looks like it. In particular, for basically everything you can do with technology, things start to make more sense the more you know, once you get remotely into the area where you can think about actually doing it. Details get more complex, but the general workings of a thing are understood by that time. With consciousness and intelligence, it is currently the other way round. We have absolutely no clue how they are generated, whether they are generat
Re: (Score:2)
That means either they cannot be created with technology, or we are very very far from being able to do so.
Or maybe it just means they're bogus concepts.
People have argued for centuries about what "consciousness" and "intelligence" mean, and they still can't agree. So engineers roll their eyes, turn their backs on the argument, and get on with the job of creating useful things. And then people say, "It's not really intelligent! It's not really conscious!" Well, who's to say? If you can't define what the words mean, it's impossible to decide whether AIs meet the definitions. So give us some rigorous definit
Re: (Score:2)
Indeed. All we have is claims by people to have consciousness. We have absolutely no clue what it is, yet it seems every reasonably functional human being finds it has it, or at least claims so. At the same time, there is no mechanism for consciousness in known Physics. For example, pseudo-profound bullshit like "consciousness is an illusion" is circular because an illusion needs consciousness. At the very least we need a fundamental extension of physics to accommodate that, but, as in Physics matter, energ
Re: (Score:2)
And fail. (Well, what do you expect from a cretin that calls people "snowflake" without any good reason...)
Even a smarter pet (a dog, for example) has some understanding and model of the real world and can map language to that model and can make (limited) predictions because it feels like it. An artificial neural net has nothing like that. It just has statistical classification and that is not enough for a world-model of even the most simple type, regardless of how "deep" you make it.
Re: (Score:1)
Yes, but it's still being done by a computer... so it will never be "Real A.I. (tm)"
Ten years from now, when we are composing A.I.s out of multiple A.I.s like this one, it still won't be "Real A.I. (tm)", even if it can completely replace 38% of human workers, leaving them unemployable because they are not smart enough, or lack the willpower, to outperform "Not Real A.I.s (tm)" even with additional, completely free, training.
Right now, today with "not real A.I.'s" we are looking at 38% of jobs going away i
Re: (Score:2)
And fail. Have a look into the research literature some time. If what you claim were true, we would have had high-quality automated translation decades ago. Not cheap, but it would have been done, and it would have had tons of applications in military and intelligence use, where the money would have been available.
Re: (Score:2)
The real problem here is that most people are not smart enough to recognize a moron if the moron is dressed up prettily and spews pseudo-profound bullshit.
Oh, I think I've just spotted one..
Re: (Score:1)
Written down? That doesn't sound like what Muhammed intended...
My hovercraft is full of eels. (Score:3)
n/t
I don't think so, tim (Score:2)
I've been learning Japanese for about two years, using SRS and reading. I can tell you these systems will be great for instructions on assembling a desk, or for how to check your oil. Totally useless for storytelling. Anything containing references, jokes, wordplay, hell, even pronouns, where English just doesn't have as many, will always be a compromise.
Re:I don't think so, tim (Score:5, Insightful)
That would be fine. The number of times I've wanted a machine-translated story in the past... I dunno, ever? Zero. The number of times I've wanted a technical paper, instructions, or tech specs is significant. Or even news. Storytelling, jokes and wordplay are the least interesting things to translate, because there are people who already do that.
Re: (Score:1)
Re: (Score:1)
That would be fine. The number of times I wanted a machine translated story in the past... I dunno, ever. 0.
I guess it just means you are an uninteresting person with no taste or desire to know other cultures.
Re: (Score:2)
Well, if you mean stories like novels, not news stories, I agree. For any language, the nuances and particularities will be lost in translation; even human translators sometimes have to explain some untranslatable words or concepts in a footnote. But I think they could do a lot better translating articles and blogs about subjects that address a broad audience and speak rather plainly in the native language. Often it still ends up being very awkward Yoda-isms and strange or incorrect choices of words,
Re: (Score:2)
I would not expect it to work for idioms or anything where languages developed different concepts to describe things. It won't understand an Eskimo that talks about snow (they have
A cool idea, but that's how you get things. (Score:2)
Re: (Score:1)
I don't see the difference.
Re: (Score:3)
Jedi have light sabres.
Still Requires Data (Score:5, Insightful)
These are very cool advances, but they don't solve the major problem of machine learning (ML): Having lots of data.
While these approaches don't need bilingual corpora, they still need big monolingual corpora. Very few languages have those, and those that do usually also have bilingual corpora to one or more of the major world languages.
This does lower the barrier to entry significantly for those doing ML machine translation. But, if one took the resources spent on gathering and curating corpora and instead invested in rule-based systems, you could get much further in less time.
Re:Still Requires Data (Score:4, Informative)
Depends what you mean by "lots of data".
This weakly supervised stuff is especially nice for NLP, since there are almost no large, general bilingual corpora. A few exist, but they're often the result of some legalistic process, so they cover only a subset of language.
There are a lot more languages with a lot of written text than there are languages paired with large amounts of correlated text.
Also, do you have any reason to think that rule-based systems would be better? A huge amount of work went into those in the past, and their capabilities seem tapped out. The other thing is what you mean by "much further". The point of this paper seems to me to push the bar on weakly supervised learning, rather than to get the best translation software ever.
Very weakly supervised learning can do all sorts of cool things. See for example cyclegan the zebrifier (it turns pictures of horses into pictures of zebras).
Re: (Score:2)
While these approaches don't need bilingual corpora, they still need big monolingual corpora.
Except that we have terabytes of unstructured and unlabeled monolingual text. You could train it on Wikipedia pages. In fact, there is an entire library of congress of data in ... the library of congress.
Re: (Score:2)
But, if one took the resources spent on gathering and curating corpora and instead invested in rule-based systems, you could get much further in less time.
Really? Why do you think that? Rule based is how all machine translation systems worked until just a few years ago. They worked, but not that great. And that's after decades of optimizing. Then the NMT systems came out and blew them out of the water.
And building a monolingual corpus is pretty easy. Have a shelf of books written in that language? Great, scan them in. Maybe there's a newspaper with an archive of back issues. There you go, you're set. Way easier than a bilingual corpus, where someone
Can it decipher the Indus Valley script (Score:2, Interesting)
Can it translate Linear A? Cretan hieroglyphic?
Re: (Score:3)
That is what I was wondering. I'm betting the answer is "no". When you have very limited source material, and the correct translation of the source material is probably long lists of items like "3rd year, Nowhereville, 5 bushels wheat" I doubt this approach would get you anywhere.
In every case which I am aware of (hieroglyphs, Linear B, Mayan), decipherment of ancient scripts required that a close relative of the script's language was known to the decipherers. (If anyone has counter-examples, I'd love to know
Re: (Score:1)
It wasn't until the discovery of the Rosetta Stone that scholars were able to decipher Ancient Egyptian with confidence. They could guess what the symbols and glyphs meant, but until there was some anchor point between the languages, they couldn't say for certain.
Re: (Score:2)
In every case which I am aware of (hieroglyphs, Linear B, Mayan), decipherment of ancient scripts required that a close relative of the script's language was known to the decipherers. (If anyone has counter-examples, I'd love to know about them.) If the language of the script is completely extinct, we may never be able to decipher it.
Sumerian: a language isolate. Deciphered through Akkadian (a Semitic language, related to modern Arabic and Hebrew), because both languages used the same cuneiform script, which is (mostly) phonetic in nature.
Etruscan: believed to be part of the extinct Tyrsenian language family. Deciphered through Latin and Greek (both Indo-European languages), because the Etruscan alphabet is the intermediate step between the Greek and Latin alphabets.
You don't need a related language, you only need some reference point for the phonol
Re: (Score:2)
Thank you
I call 'fake news' (Score:5, Insightful)
The assumption that the world is the same everywhere, and that languages are merely attached to it, underlies this learning strategy. The example given, of "table and chair", demonstrates this. Most of these ideas belong to a 19th-century Eurocentric understanding of the world we live in. Modern neuroscience and other work point to the fact that the world we perceive is very much shaped by the language we use, and not the other way around.
Concrete example: for much of the 19th and 20th centuries, many Greeks measured distance in cigarettes: how many cigarettes I will smoke while travelling from one place to another. There is no equivalent in English. Not only that, but the usage indicates a specific timespan as well as cultural differences.
"Idiom!" I hear you say. Consider cultures where there are many more tables than there are chairs - such as in Asia where most people sit on the floor or on cushions.
"But there are some universals - we can still use those!" - generally, there are no universals, or so few that they are not worth talking about. Talk to an anthropologist about it. Not even the concept of 'mother' is a universal.
Re: (Score:1)
Said someone who's probably never tried to create new knowledge. What you say is that it's not perfect. Indeed, the accuracy is much lower than the best attempt that has good data to learn from. But sure it's a new result, and something that can be useful.
Re: (Score:2)
Everything you describe sounds like a feature to me, not a bug. Such a system would not only translate language, but culture.
For common speech, this is an incredible advancement. Sure, you'll run into trouble when you specifically want a chair and the local custom is to sit on cushions... but when you're asking which 'chair' to sit on it'll work just fine and you'll figure it out when you're about to sit.
Al who? (Score:1)
Who is Al, and why does it matter if he's bilingual?
#serifisimportant
simple thermodynamics (Score:2)
Anyone who understands that there was a lot more to Bletchley Park than rotor combinatorics can't honestly say they find this result surprising.
Especially when the languages chosen have a shocking degree of family resemblance.
No word for "I" or "me" or "mine" [upenn.edu]
Universal Translator (Score:2)