Scientists Train AI To Learn People's Voices, Then Generate Their Faces (livescience.com)
JustAnotherOldGuy shares a report from Live Science: A neural network named "Speech2Face" was trained by scientists on millions of educational videos from the internet that showed over 100,000 different people talking. From this dataset, Speech2Face learned associations between vocal cues and certain physical features in a human face, researchers wrote in a new study. The AI then used an audio clip to model a photorealistic face matching the voice, and the results are surprisingly close to the actual faces of the people whose voices it listened to. The faces generated by Speech2Face didn't precisely match the people behind the voices. But the images did usually capture the correct age ranges, ethnicities and genders of the individuals, according to the study. The findings have been published in the preprint journal arXiv but have not been peer-reviewed.
What does Teller sound like? (Score:2)
Run it backwards and put audio on all the Penn and Teller videos
Feed in Mel Blanc (Score:2)
And it will give the AI an acid trip.
Re: In a surprise twist (Score:1)
If you fund vaccines in poor countries, it will save healthcare costs in your own country. If you help save the fish in poor countries, it will also help the fish on your own coasts. There are many things you can do for others that also help you.
Cruel and unusual (Score:5, Funny)
So, the NSA could get an AI to learn my first wife's voice and then use it instead of waterboarding as a form of torture?
Neat!
Re: (Score:2)
We've been told, point blank right here on /. that you can't identify any ethnicity by just listening to their voice.
Have we? I don't think you can do it accurately 100% of the time, but different communities and social groups have distinct accents, and many social groups still divide themselves by ethnicity. If someone of one ethnicity hangs out primarily with a group of people of another ethnicity, I think they would fool the AI.
Voice isn't so much a genetic thing as it is a cultural thing. Your accent and the way you speak are closely tied to whom you associate with and where you grow up. Eventually over generations
Re: (Score:2)
More to the point: there is no scientifically based definition of ethnicity (or race, which is what I think OP was really referring to). The closest we have are regional and cultural similarities, and ancestry. The former are what's in play here (vocalizations): you can tell what region someone is from based on their language, but a light-toned and a dark-toned person from the same region will sound the same. Ancestry is closer to what most people consider ethnicity, but it has no bearing on how you sound when
Re: (Score:1)
We've been told, point blank right here on /. that you can't identify any ethnicity by just listening to their voice. Besides that, you simply *cannot* identify someone's gender by voice - why, a basso profundo can identify as a woman, right?
You might want to read the article before trolling:
However, the algorithm's interpretations were far from perfect. Speech2Face demonstrated "mixed performance" when confronted with language variations. For example, when the AI listened to an audio clip of an Asian man speaking Chinese, the program produced an image of an Asian face. However, when the same man spoke in English in a different audio clip, the AI generated the face of a white man, the scientists reported.
The algorithm also showed gender bias, associating low-pitched voices with male faces and high-pitched voices with female faces. And because the training dataset represents only educational videos from YouTube, it "does not represent equally the entire world population," the researchers wrote.
Code. (Score:4, Informative)
https://github.com/imatge-upc/... [github.com]
It's in Python. Not my code; I just found it through Google.
Ryan Fenton
did usually capture... ethnicities... (Score:2)
This is like that bit in BlacKkKlansman all over again, when David Duke claims he can identify black people by the way they pronounce 'are'...
Correction (Score:1)
The findings have been published in the preprint journal arXiv but have not been peer-reviewed.
arXiv is not a journal, and the findings have not yet been published. When people post their manuscripts on arXiv, it's usually at the same time as they submit them to a peer-reviewed journal. In other words, arXiv is a de facto repository for manuscripts currently undergoing peer review. It is incorrect to report the work as if it had already been published, just in a journal that isn't peer-reviewed.
Voices, faces, what's next? (Score:1)
Race gender and age determination (Score:2)
This looks to me like an association based on race and age more than a determination of a person's face.
It's AI/ML used to find a correlation between a person's speech and their age, race and gender, and then it produces a generic output.
What I'd be more interested in seeing is how it does when run against 100 speech samples from women of the same race from the same region. Let's see how it does then.
And thus the phone sex industry dies (Score:2)
Re: (Score:2)
Will the phone sex industry survive clients finding out what the person they've been talking to actually looks like?
I don't think 13-year-olds really care what the person they're talking to actually looks like. They could look like a mule with a face plastered with blue waffles and a 13-year-old "would hit it".