Scientists Train AI To Learn People's Voices, Then Generate Their Faces (livescience.com) 63

Posted by BeauHD on Tuesday June 11, 2019 @09:40PM from the watch-and-be-amazed dept.

JustAnotherOldGuy shares a report from Live Science: A neural network named "Speech2Face" was trained by scientists on millions of educational videos from the internet that showed over 100,000 different people talking. From this dataset, Speech2Face learned associations between vocal cues and certain physical features in a human face, researchers wrote in a new study. The AI then used an audio clip to model a photorealistic face matching the voice, and the results are surprisingly close to the actual faces of the people whose voices it listened to. The faces generated by Speech2Face didn't precisely match the people behind the voices. But the images did usually capture the correct age ranges, ethnicities and genders of the individuals, according to the study. The findings have been published in the preprint journal arXiv but have not been peer-reviewed.

- What does Teller sound like? (Score:2)
  
  by goombah99 ( 560566 ) writes:
  
  Run it backwards and put audio on all the Penn and Teller videos
  - Feed in Mel Blanc (Score:2)
    
    by goombah99 ( 560566 ) writes:
    
    And it will give the AI an acid trip.
- - - Re: In a surprise twist (Score:1)
      
      by Anonymous Coward writes:
      
      If you fund vaccines in poor countries it will save healthcare costs on your own country. If you help save the fish in poor countries it will help also the fish on your coasts. There are many things you can do to others that help also you.
  - Re: (Score:1)
    
    by LifesABeach ( 234436 ) writes:
    
    I was thinking the algorithm keeps showing a horse's ass. Trump 2020.
    - - Re: (Score:2)
        
        by LifesABeach ( 234436 ) writes:
        
        so ah, you never had this happen to you? I'm sorry you're so frustrated.
Cruel and unusual (Score:5, Funny)

by PopeRatzo ( 965947 ) writes: on Tuesday June 11, 2019 @10:28PM (#58748030) Journal

So, the NSA could get an AI to learn my first wife's voice and then use it instead of waterboarding as a form of torture?
Neat!

- Re: (Score:2)
  
  by Oswald McWeany ( 2428506 ) writes:
  
  We've been told, point blank right here on /. that you can't identify any ethnicity by just listening to their voice.
  Have we? I don't think you can do it accurately 100% of the time, but different communities and social groups have distinct accents, and many social groups still divide themselves by ethnicity. If someone of one ethnicity hangs out primarily with a group of people of another ethnicity I think they would fool the AI.
  Voice isn't so much a genetic thing as it is a cultural thing. Your accent and how you speak is very much related to who you associate with and where you grow up. Eventually over generations
  - Re: (Score:2)
    
    by rockmuelle ( 575982 ) writes:
    
    More to the point: there is no scientifically-based definition of ethnicity (or race, which is what I think OP was really referring to). The closest we have are regional and cultural similarities and ancestry. The former are what's in play here (vocalizations) - you can tell what region someone is from based on their language, but a white toned and a dark toned person from the same region will sound the same. Ancestry is closer to what most people consider ethnicity, but it has no bearing on how you sound when
- Re: (Score:1)
  
  by Anonymous Coward writes:
  
  We've been told, point blank right here on /. that you can't identify any ethnicity by just listening to their voice. Besides that, you simply *cannot* identify someone's gender by voice - why, a basso profundo can identify as a woman, right?
  You might want to read the article before trolling:
  However, the algorithm's interpretations were far from perfect. Speech2Face demonstrated "mixed performance" when confronted with language variations. For example, when the AI listened to an audio clip of an Asian man speaking Chinese, the program produced an image of an Asian face. However, when the same man spoke in English in a different audio clip, the AI generated the face of a white man, the scientists reported.
  
  The algorithm also showed gender bias, associating low-pitched voices with male faces and high-pitched voices with female faces. And because the training dataset represents only educational videos from YouTube, it "does not represent equally the entire world population," the researchers wrote.
Code. (Score:4, Informative)

by RyanFenton ( 230700 ) writes: on Tuesday June 11, 2019 @11:21PM (#58748190)

https://github.com/imatge-upc/... [github.com]
It's in Python - not my code, just found it through Google.
Ryan Fenton

did usually capture... ethnicities... (Score:2)

by Ambvai ( 1106941 ) writes:

This is like that bit in BlacKkKlansman all over again, when David Duke claims he can identify black people by the way they pronounce 'are'...
Correction (Score:1)

by Anonymous Coward writes:

The findings have been published in the preprint journal arXiv but have not been peer-reviewed.
arXiv is not a journal, and the findings have not yet been published. When people place their manuscript on arXiv, it's usually the same time as they submit it to a peer-reviewed journal. In other words, arXiv is a de facto repository for manuscripts currently undergoing peer review. It is incorrect to report it as if it's already published, but in a journal that is not peer-reviewed.
Voices, faces, what's next? (Score:1)

by TechyLiss ( 6031688 ) writes:

Scientists train AI not only to recognize voices and faces. According to Bloomberg's latest research [bloomberg.com], Amazon is now working on a device that will be able to recognize people's emotions and even help build social relations. Sounds impressive, doesn't it? However, the question is whether we can rely on a machine in building our relations with other people...
- Re: (Score:3)
  
  by sabbede ( 2678435 ) writes:
  
  This isn't that. This is basically using a person's voice to reconstruct their head. Age, sex and facial structure shape a person's voice, and those effects can be measured and used to determine what the person probably looks like.
Race gender and age determination (Score:2)

by JeffSh ( 71237 ) writes:

This looks to me like an association based on race and age more than a determination of a person's face.
It's AI/ML used to identify a correlation between age, race and gender in a person's speech, and then it produces a generic output.
What I'd be more interested in seeing is if they run it against 100 speech samples from women in the same region of the same race. Let's see how it does then.
And thus the phone sex industry dies (Score:2)

by Too Late for Cool ID ( 1794870 ) writes:

Will the phone sex industry survive clients finding out what the person they've been talking to actually looks like?
- Re: (Score:2)
  
  by Oswald McWeany ( 2428506 ) writes:
  
  Will the phone sex industry survive clients finding out what the person they've been talking to actually looks like?
  I don't think 13 year olds really care what the person they're talking to actually looks like. They could look like a mule with a face plastered with blue waffles and a 13 yo "would hit it".
