New AI Could Prevent Eavesdropping By Disguising Words With Custom Noise (science.org)
sciencehabit shares a report from Science Magazine: Big Brother is listening. Companies use "bossware" to listen to their employees when they're near their computers. Multiple "spyware" apps can record phone calls. And home devices such as Amazon's Echo can record everyday conversations. A new technology, called Neural Voice Camouflage, now offers a defense. It generates custom audio noise in the background as you talk, confusing the artificial intelligence (AI) that transcribes our recorded voices.
The new system uses an "adversarial attack." The strategy employs machine learning -- in which algorithms find patterns in data -- to tweak sounds in a way that causes an AI, but not people, to mistake it for something else. Essentially, you use one AI to fool another. The process isn't as easy as it sounds, however. The machine-learning AI needs to process the whole sound clip before knowing how to tweak it, which doesn't work when you want to camouflage in real time.
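To make the "one AI fools another" idea concrete, here is a minimal toy sketch of an adversarial perturbation in Python. It is not the paper's method: the linear "model," the 16-sample "frame," and the epsilon value are all made up for illustration; real attacks do this against a deep speech recognizer.

```python
# Toy illustration of an adversarial perturbation (not the paper's method).
# A linear "classifier" scores an audio frame; we nudge the input against
# the gradient of the true-class score, so the model's confidence in the
# correct label drops while the waveform barely changes.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 16))          # toy 2-class linear model over 16 samples
x = rng.normal(size=16)               # one "audio frame"
true_label = int(np.argmax(W @ x))    # whatever the model currently hears

# For a linear model, the gradient of the true-class score w.r.t. the
# input is simply that row of W.
grad = W[true_label]

# FGSM-style step: a tiny, sign-of-gradient perturbation.
epsilon = 0.05
x_adv = x - epsilon * np.sign(grad)

print("true-class score before:", (W @ x)[true_label])
print("true-class score after: ", (W @ x_adv)[true_label])
print("max per-sample change:  ", np.max(np.abs(x_adv - x)))
```

With a large enough epsilon the top prediction flips entirely; the point of the technique is that the perturbation stays small relative to the signal.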
So in the new study, researchers taught a neural network, a machine-learning system inspired by the brain, to effectively predict the future. They trained it on many hours of recorded speech so it can constantly process 2-second clips of audio and disguise what's likely to be said next. For instance, if someone has just said "enjoy the great feast," it can't predict exactly what will be said next. But by taking into account what was just said, as well as characteristics of the speaker's voice, it produces sounds that will disrupt a range of possible phrases that could follow. That includes what actually happened next; here, the same speaker saying, "that's being cooked." To human listeners, the audio camouflage sounds like background noise, and they have no trouble understanding the spoken words. But machines stumble. The work was presented in a paper last month at the International Conference on Learning Representations, which peer reviews manuscript submissions.
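The real-time constraint the summary describes can be sketched in a few lines: the defender only hears audio up to now, so the noise for each upcoming chunk must be computed from the past. The 2-second chunking matches the summary, but the stand-in predictor, sample rate, and scaling below are assumptions, not the paper's network.

```python
# Sketch of the real-time constraint: the mask for the NEXT chunk must be
# chosen using only audio heard so far ("predictive" camouflage).
import numpy as np

SAMPLE_RATE = 16_000
CHUNK = 2 * SAMPLE_RATE                     # 2-second chunks, as in the summary

def predict_mask(history: np.ndarray) -> np.ndarray:
    """Stand-in predictor: derive a noise mask for the upcoming chunk
    from past audio. A real system would run a trained network here."""
    scale = 0.1 * (np.std(history) if history.size else 1.0)
    return scale * np.random.default_rng(history.size).normal(size=CHUNK)

rng = np.random.default_rng(1)
speech = rng.normal(size=10 * SAMPLE_RATE)  # fake 10 s of "speech"

heard = np.empty(0)
masked_chunks = []
for start in range(0, speech.size, CHUNK):
    mask = predict_mask(heard)              # decided BEFORE the chunk arrives
    chunk = speech[start:start + CHUNK]
    masked_chunks.append(chunk + mask[:chunk.size])
    heard = np.concatenate([heard, chunk])  # only now do we "hear" it

masked = np.concatenate(masked_chunks)
print(masked.shape)
```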
Makes sense... (Score:3)
In adversarial AI, you'd need to account for this too. Those chunks could be accurately identified with minimal input. We do that ourselves when we're talking in noisy, distracting environments.
Re: (Score:2)
Indeed. Also one of the reasons why regular English only has something like 2 bits of entropy per character.
Re: (Score:2)
John von Neumann famously suggested that Shannon use the word "entropy" because "nobody knows what entropy really is".
"Two bits of entropy per character" isn't so straight-forward. The entropy per letter changes depending on the letters that precede it. On average, Shannon found an entropy of ~4.7 bits for the first letter (out of 26) in a sequence, ~4.14 for the second, and ~3.3 bits for the third letter. Against 8000 English words, he worked out the average entropy to be ~11.82 bits per word or ~2.62 b
Re: (Score:2)
I am aware. In particular, entropy only makes sense if you have independent symbols. That would require whole, independent text-fragments. The idea is still useful for estimates and approximations.
Other things confuse artificial stupidity as well (Score:3, Funny)
For example, try speaking German and have MS Teams record it. It will also produce a "transcript," which is utter nonsense: it apparently cannot tell that you are not speaking English and simply fakes it. I expect this happens with other languages as well.
Re: (Score:2)
That reminds me of this classic animutation [youtube.com] (warning: NSFW) that phonetically translates Dutch to English, to very comic effect. I wonder if a computer's mistranscription can be as funny as a human's!
Re: (Score:2)
. . . We put the food. You put fire extinguisher cabinets on the architectural plans and he put 42 per floor. Does that sound? Aren't you dumb? . . .
Re: (Score:2)
I am not surprised. I read a few simple English statements to demonstrate this to my IT-security students (teaching is in German) as a bit of comic relief, and it got those right. But a strong accent, complex language, or just slightly atypical speech patterns probably throw this thing off completely. The really pathetic part is that it does not notice it produced low-confidence results, which makes it useless as anything but a toy.
Two points (Score:4, Insightful)
1. This is a device that listens in to everything you're saying.
2. The practical effect, if used widely, is to provide a large-scale training ground for AI to filter speech from noise, so that people are actually more vulnerable than they were previously (since AI has been trained to listen through the sound of the shower running, radio static, or whatever).
Re: (Score:2)
Did you miss the part about "adversarial attack"? The listeners we're trying to confound will always be one step behind the masking application. AFAIK the pool of "adversarial" is infinite, and the defense needn't be perfect. Masking merely needs to f-up just enough to confound the listeners' own guesses of what's coming next. Like modding just a tiny % of pixels to make images of guns get classified as bananas, we only need to confound a fraction of the speech to destroy the algo's ability to classify it.
Encryption (Score:2)
If this is not real-time, proper encryption would be better than distorting the signal.
Re: Encryption (Score:2)
You would have to encrypt the input to the spyware programs. Kinda hard… unless you can insert middleware or you natively speak AES or RSA.
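For what "proper encryption" might look like in code, here is a minimal AES-GCM sketch using the `cryptography` package. As the parent says, it only helps if you sit in the audio path before the spyware does, and the key handling here is deliberately simplistic.

```python
# Encrypting a captured audio buffer with AES-GCM via the `cryptography`
# package. This only protects audio you already control as bytes -- the
# parent's point about needing middleware in the capture path.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)
aesgcm = AESGCM(key)
nonce = os.urandom(12)                      # must be unique per message

audio_bytes = b"\x00\x01\x02..."            # stand-in for raw PCM samples
ciphertext = aesgcm.encrypt(nonce, audio_bytes, None)
assert aesgcm.decrypt(nonce, ciphertext, None) == audio_bytes
```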
Pointless over-engineering. (Score:2)
Surely this is simply a case of over-engineering: using technology to do something that is easily done without it, thereby adding unnecessary complication and failure points. The sensible solution to the problem described is to turn off the devices that are listening when you don't want to be overheard, or, if that isn't possible, to keep sensitive conversations to places where you aren't being listened to.
Re: Pointless over-engineering. (Score:4, Informative)
Let's say you are in a country where employers pay for health cover and are (illegally and secretly) using this to drive down their insurance costs by firing anyone who develops a medical condition. Your partner comes by in a bit of an emotional state and says, "I just got told I have an invasive carcinoma, what do I do?" You will likely end up regretting not running this device by default and only switching it off once you know a conversation is okay to have with your boss listening in, assuming you ever realise what happened. Always better to block first, then allow what you know is safe.
Re: (Score:2)
If I were an unscrupulous employee in such a company, all my potential rivals would find such a device interpolated into their audio path. QED.
Re: (Score:2)
The sensible solution to the problem described is to turn off the devices that are listening when you don't want to be overheard, or, if that isn't possible, to keep sensitive conversations to places where you aren't being listened to.
NO! you must put their device in the room with you so that no one can hear your sensitive conversations! They super-mega promise they won't listen to your secrets!
Isn't live TV sports already doing this? (Score:2)
I noticed that you really can't understand what the crowd is saying anymore at NASCAR races and football games; it just sounds like background noise. I figured the networks were using some sort of AI noise filter to block out "Let's Go Brandon" chants and other things they don't want broadcast.
Language Game (Score:2)
So, basically a very simple version of a Language Game [wikipedia.org]
The cone of silence! (Score:2)
If you're a Get Smart fan...
Now all we need is the room where, when you talk, words spill out of your mouth in print form, with no noise.
If they really are out to get you (Score:3)
This device won't stop them. The trouble is, in the end, you want to be understood by those around you. If your intended listener can understand you, then so can a device that is situated in the right place.
Re: (Score:2)
Exactly. In general, adversarial attacks on neural networks are only ever going to be a temporary measure, because they just generate more training data for the next system.
Even this one already doesn't fucking work. I tried their example on https://cloud.google.com/speec... [google.com]
This was what it recognized:
- "Now then let's enter the city and enjoy the great feast. It's being cooked. I nearly starved myself, but it is hard work. "
- "He doesn't say, but it's on the frontier and on the map, everything Beyond is ma
Yeah, no. (Score:2)
This is _precisely_ what you want to train an AI on.
You take your existing AI that understands voices, feed it both versions, and suddenly, after a cycle of training (perhaps using modestly greater resources), it can ignore the noise.
This is pretty much the same problem as CAPTCHAs: first-generation CAPTCHAs can be trivially solved by AI, and the ones that can't tend to be hard for humans too.
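The arms-race point can be made concrete: every masked clip paired with its known transcript is exactly a labeled training example for the next recognizer. The sketch below is schematic only; `camouflage` and `retrain` are toy stand-ins, not anyone's real model.

```python
# The arms race in miniature: masked clips paired with their true
# transcripts are labeled training data for the next recognizer.
def camouflage(clip: str) -> str:
    return clip + "+noise"                        # pretend audio mask

def retrain(model: dict, pairs: list) -> dict:
    # "Training" here just memorizes masked -> clean mappings.
    model.update({masked: clean for masked, clean in pairs})
    return model

corpus = [f"utterance {i}" for i in range(5)]
recognizer: dict = {}

# Defender publishes masked audio; attacker collects it with transcripts.
masked_data = [(camouflage(c), c) for c in corpus]
recognizer = retrain(recognizer, masked_data)

print(recognizer[camouflage("utterance 3")])      # -> "utterance 3"
```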
Re: (Score:2)
Much like newer-generation CAPTCHAs, this uses an adversarial network to train against its own generated noise until it fools itself optimally. Like you say, it will probably still be hard for people to understand, unless the human auditory system has advantages in how it picks up vibration that make it easier to separate speech from white noise than it is for a microphone.
Klatu Barata Ni*Cough-Cough* (Score:1)
I expected this article to be tagged #klatubarataniktu.
Artificial stupidity (Score:2)
They invented the Waterfall sound.
Re: Artificial stupidity (Score:2)
I don't know what you're referencing, but I was thinking an easy way to achieve this would be to generate a peaceful sound played by an array of many small speakers with known locations, each signal altered so they interfere to zero at the places where people's heads are. That creates silent zones in the room: people talking to each other will hear each other, but a mic placed even right between them will not.
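A back-of-the-envelope check of the interference idea: two ideal point sources driven in antiphase do cancel at the point equidistant from both, but only per frequency and per point; broadband speech and room reflections make a practical silent zone much harder. All numbers below are made up.

```python
# Two speakers driven in antiphase cancel at the point equidistant from
# both -- but only for one frequency, and only at that point.
import numpy as np

c = 343.0                                   # speed of sound, m/s
f = 1000.0                                  # a single tone, Hz
k = 2 * np.pi * f / c                       # wavenumber

def pressure(point, src, phase):
    r = np.linalg.norm(np.asarray(point) - np.asarray(src))
    return np.exp(1j * (k * r + phase)) / r # ideal point source

spk_a, spk_b = (0.0, 0.0), (2.0, 0.0)
mid = (1.0, 0.0)                            # equidistant "head" position
off = (1.3, 0.0)                            # a nearby, non-symmetric point

for p in (mid, off):
    total = pressure(p, spk_a, 0.0) + pressure(p, spk_b, np.pi)  # antiphase
    print(p, "|p| =", round(abs(total), 4))  # ~0 at mid, nonzero off-axis
```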