
Google's DeepMind Develops New Speech Synthesis AI Algorithm Called WaveNet (qz.com) 46

Artem Tashkinov writes: Until now, researchers at Google's DeepMind have mostly produced AI algorithms with few real-world applications beyond entertainment -- the game of Go being the most recent example. However, their latest development, a speech synthesis AI algorithm called WaveNet, beats the two existing methods of generating human speech by a long shot -- at least 50% by Google's own estimates. The only problem with the new approach is that it is very computationally expensive. The results are all the more impressive considering that WaveNet can easily learn different voices and generate artificial breaths, mouth movements, intonation, and other features of human speech. It can also be trained to generate any voice from a very small sample database. Quartz's report includes voice demos of Google's current method, which uses recurrent neural networks, and of WaveNet's method, which "uses convolutional neural networks, where previously generated data is considered when producing the next bit of information." The report adds, "Researchers also found that if they fed the algorithm classical music instead of speech, the algorithm would compose its own songs."
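
For readers wondering what "previously generated data is considered when producing the next bit of information" looks like mechanically, here is a minimal toy sketch in Python of the autoregressive, causal-convolution idea. This is an illustration written for this story, not DeepMind's code; the real WaveNet predicts a probability distribution over quantized 8-bit sample values rather than regressing the next sample directly.

    import numpy as np

    def causal_dilated_conv(x, weights, dilation):
        # Causal 1-D convolution: output[t] depends only on x[t],
        # x[t - dilation], x[t - 2*dilation], ... -- never on future samples.
        out = np.zeros_like(x)
        for t in range(len(x)):
            acc = 0.0
            for i, w in enumerate(weights):
                j = t - i * dilation
                if j >= 0:
                    acc += w * x[j]
            out[t] = acc
        return out

    def generate(layers, n_samples, receptive_field, rng):
        # Autoregressive sampling: each new sample is computed from the samples
        # generated so far, appended to the waveform, and fed back as input.
        audio = rng.normal(scale=0.1, size=receptive_field)  # noise as seed context
        for _ in range(n_samples):
            h = audio[-receptive_field:]
            for weights, dilation in layers:
                h = np.tanh(causal_dilated_conv(h, weights, dilation))
            audio = np.append(audio, h[-1])  # the model's guess for the next sample
        return audio[receptive_field:]

    rng = np.random.default_rng(0)
    # Dilations that double per layer (1, 2, 4) let the stack see far back in
    # time with few layers; it is also why sampling is so expensive -- every
    # output sample requires a full pass through the whole stack.
    layers = [(rng.normal(scale=0.5, size=2), d) for d in (1, 2, 4)]
    print(generate(layers, n_samples=8, receptive_field=16, rng=rng))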

  • While more natural speech synthesis may be useful to create an illusion of intelligence, speech synthesis and so-called "artificial intelligence" are two different things.
    Even more relevant: learning to mimic the speech and word use of others is just another way of using "artificial intelligence".

    • I two like to put scare quotes around the so called "artificial intelligence"
    • by ShanghaiBill ( 739463 ) on Friday September 09, 2016 @06:29PM (#52858997)

      speech synthesis and so-called "artificial intelligence" are two different things.

      Accurate speech synthesis, with appropriate pronunciation and intonation, is absolutely a subset of AI. There is no way to do it with a dumb algorithm, such as a lookup table. No one has done it without machine learning.

      • Accurate speech synthesis, ..., is absolutely a subset of AI. There is no way to do it with a dumb algorithm, such as a lookup table. No one has done it without machine learning.

        1st, no one has done it, period. Even this story does not claim "accuracy".

        2nd, the method involved here is in fact "a dumb algorithm", and there is no inherent logical reason why speech synthesis cannot be done by an algorithm. A mere assertion that it cannot be done is not an acceptable argument.

        3rd, so far, so-called "artificial intelligence" and "machine learning" are dumb algorithms. Sorry to burst your bubble.

        -
        btw, all that does not take away from my original point: that speech synthesis and creating so-called "artificial intelligence" are two different things.

        • "3rd, so far so called "artificial intelligence","machine learning", are dumb algorithms." - Sure, but so are the algorithms that we use in our brains for object recognition and speech synthesis etc. If we ignore consciousness for a minute, all of this stuff is just very complex function approximation, whether google does it or your neurons do it.
          • "3rd, so far so called "artificial intelligence","machine learning", are dumb algorithms." - Sure, but so are the algorithms that we use in our brains for object recognition and speech synthesis etc. If we ignore consciousness for a minute, all of this stuff is just very complex function approximation, whether google does it or your neurons do it.

            That is a mere assumption about our brains.
            We actually do not know, for the most part, how our brains/neurons work, so it is a big jump, and an unscientific one, to think they work the same way Google's algorithms do.
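
An editorial aside on the "function approximation" exchange above: here is a minimal toy sketch (my own example, not from either commenter) of what the term means on the machine side. Whether brains do anything analogous is, as the reply notes, an open question.

    import numpy as np

    # Toy "function approximation": a one-hidden-layer network with random
    # hidden weights, fit to samples of sin(x) by ordinary least squares.
    # The fitted model mimics sin without any notion of what "sin" means.
    rng = np.random.default_rng(0)
    x = np.linspace(-np.pi, np.pi, 200)[:, None]   # 200 sample points
    y = np.sin(x).ravel()                          # the function to mimic

    W = rng.normal(size=(1, 50))                   # random hidden weights, 50 units
    b = rng.normal(size=50)
    H = np.tanh(x @ W + b)                         # hidden-layer features

    w_out, *_ = np.linalg.lstsq(H, y, rcond=None)  # fit output weights to the data
    pred = H @ w_out
    print("max abs error:", np.abs(pred - y).max())  # small: the curve is mimicked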

  • Did they confuse recurrent neural networks and convolutional neural networks when discussing the old versus new method of speech synthesis?
  • Flouting /. tradition, I actually listened to DeepMind's examples of their voice. They're rather unimpressive compared to the other two voice samples they compare themselves to, and very noisy. I heard much better from IBM Watson four years ago.

    Methinks DeepMind published too soon.

    • I thought they sounded generally better, except for one sample with two numbers at the end of a sentence; in that case it pronounced them very unnaturally. It's more of an incremental improvement in synthesis, and it also required the sentences to be parsed out pretty carefully going in.

    • For anyone who is interested, this link https://text-to-speech-demo.my... [mybluemix.net] lets you experiment with the Watson version directly.
      • Nah, I still prefer Alex from Mac OS. The IBM voice is smooth but unnatural in intonation, even in the example where they marked intonation on the text. I really loved the DeepMind samples, but they come at 90 minutes of computation per second of generated speech, so there's no chance of having that voice on my laptop.
  • I'll never understand why Slashdot likes to link to poorly written and misleading summaries, when the original blog post is so much more readable and informative. I suggest everybody skip the "Quartz" article and instead read the original blog post. Thankfully, for once it was in fact included in the Slashdot summary, even if it was downplayed: https://deepmind.com/blog/wave... [deepmind.com]

    • Two possibilities. One: the same reason that Wikipedia wants secondary sources instead of primary ones -- less biased is supposedly more accurate. Two: the submitter submits the story from wherever he usually sources his news, and that's what they go with. My personal experience submitting stories and looking at story submissions suggests it is usually the second.
  • While the individual words are better... the sentence pacing is not.

    This is similar to the "singing computer" pronunciation of many years ago, when the ACM distributed CDs with the tracks on them.

    You don't get the stilted words, but unless it's intentionally paced (for example, a real human would have put a pause before "directed"), it's still going to be recognizably artificial -- and worse than that, difficult to understand for a human expecting natural pacing.

    Given that age-related hearing loss tends to make such unnatural pacing even harder to follow, this matters.

  • by K. S. Kyosuke ( 729550 ) on Friday September 09, 2016 @08:08PM (#52859567)
    It will be great when games can use non-pre-recorded speech for dialog. There will be no need to have characters express just two or three different game states with one recording each.
  • . . . Mycroft is on the line.

  • by Guspaz ( 556486 ) on Friday September 09, 2016 @10:15PM (#52860257)

    The word is that Star Trek: Discovery may attempt to use Majel Barrett's voice for the computer, due to her having recorded a complete phonetic sample before she passed. If this really does outperform the best available TTS engines, then perhaps DeepMind would be a good fit to generate that for the show: since it's supposed to be a computer, it's not the end of the world if it doesn't sound completely human...
