AI Science Technology

New AI Could Prevent Eavesdropping By Disguising Words With Custom Noise (science.org) 39

sciencehabit shares a report from Science Magazine: Big Brother is listening. Companies use "bossware" to listen to their employees when they're near their computers. Multiple "spyware" apps can record phone calls. And home devices such as Amazon's Echo can record everyday conversations. A new technology, called Neural Voice Camouflage, now offers a defense. It generates custom audio noise in the background as you talk, confusing the artificial intelligence (AI) that transcribes our recorded voices.

The new system uses an "adversarial attack." The strategy employs machine learning -- in which algorithms find patterns in data -- to tweak sounds in a way that causes an AI, but not people, to mistake it for something else. Essentially, you use one AI to fool another. The process isn't as easy as it sounds, however. The machine-learning AI needs to process the whole sound clip before knowing how to tweak it, which doesn't work when you want to camouflage in real time.

So in the new study, researchers taught a neural network, a machine-learning system inspired by the brain, to effectively predict the future. They trained it on many hours of recorded speech so it can constantly process 2-second clips of audio and disguise what's likely to be said next. For instance, if someone has just said "enjoy the great feast," it can't predict exactly what will be said next. But by taking into account what was just said, as well as characteristics of the speaker's voice, it produces sounds that will disrupt a range of possible phrases that could follow. That includes what actually happened next; here, the same speaker saying, "that's being cooked." To human listeners, the audio camouflage sounds like background noise, and they have no trouble understanding the spoken words. But machines stumble.
The work was presented in a paper last month at the International Conference on Learning Representations, which peer reviews manuscript submissions.
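
As a rough illustration of the "adversarial attack" idea described in the summary, here is a toy sketch in Python. It is not the Neural Voice Camouflage method (which, per the summary, uses a neural network that predicts upcoming speech); a hypothetical four-feature linear classifier stands in for the transcription AI, and all weights, inputs, and the epsilon value are made up for the example. It only shows how a small, bounded nudge to the input can flip a model's decision.

```python
# Toy illustration only: a linear "classifier" stands in for the speech recognizer.
# Every name and number below is hypothetical.

def score(weights, features):
    """Linear score; positive means the model outputs label A, otherwise label B."""
    return sum(w * x for w, x in zip(weights, features))

def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)

def adversarial_perturbation(weights, features, epsilon):
    """Shift each feature by at most epsilon in the direction that lowers the score.

    For a linear model the gradient of the score with respect to the input is
    the weight vector itself, so stepping against sign(weights) is a
    fast-gradient-sign style step.
    """
    return [x - epsilon * sign(w) for w, x in zip(weights, features)]

weights = [0.9, -0.4, 0.3, -0.8]
clean = [0.2, 0.1, 0.3, 0.1]
noisy = adversarial_perturbation(weights, clean, epsilon=0.1)

clean_score = score(weights, clean)   # positive: label A
noisy_score = score(weights, noisy)   # negative: label B after the small nudge
```

No feature moves by more than epsilon, yet the sign of the score changes. The real-time difficulty mentioned in the summary is that a step like this needs the whole input up front, which is why the researchers had the network predict what comes next instead.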
  • by rantrantrant ( 4753443 ) on Wednesday June 01, 2022 @05:29AM (#62582888)
    ...and should get better over time. Despite the almost infinite number of possible combinations of words that are grammatically correct (Chomsky), we humans are habitual creatures who actually rely quite heavily on being able to predict what each other are going to say next in order to process & parse it, accommodate errors, etc., more efficiently, because our working memory capacity is quite limited & can't parse everything in real time. Rather, we process highly frequent language in prefabricated chunks (groups of words) whole & adapt them when necessary for less frequent variations. Corpus analyses reveal that somewhere between 50% and 80% of language in use is made up of these chunks. In this sense, human communication is highly idiomatic.

    In adversarial AI, you'd need to account for this too. Those chunks could be accurately identified with minimal input. We do that ourselves when we're talking in noisy, distracting environments.
    • by gweihir ( 88907 )

      Indeed. Also one of the reasons why regular English only has something like 2 bits of entropy per character.

      • by narcc ( 412956 )

        John von Neumann famously suggested that Shannon use the word "entropy" because "nobody knows what entropy really is".

        "Two bits of entropy per character" isn't so straight-forward. The entropy per letter changes depending on the letters that precede it. On average, Shannon found an entropy of ~4.7 bits for the first letter (out of 26) in a sequence, ~4.14 for the second, and ~3.3 bits for the third letter. Against 8000 English words, he worked out the average entropy to be ~11.82 bits per word or ~2.62 b

        • by gweihir ( 88907 )

          I am aware. In particular, entropy only makes sense if you have independent symbols. That would require whole, independent text-fragments. The idea is still useful for estimates and approximations.
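
For readers who want to check the entropy figures quoted in this sub-thread, here is a minimal Python sketch of the underlying formula, H = -sum(p * log2(p)). The sample sentence is an arbitrary choice, and the divisor of 4.5 letters per word is an assumption that is not stated in the comments above. The sketch covers only the uniform case and single-letter frequencies, not the context-dependent estimates discussed.

```python
import math
from collections import Counter

def entropy_bits(probabilities):
    """Shannon entropy H = -sum(p * log2(p)) in bits, skipping zero terms."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def letter_entropy(text):
    """Entropy of the single-letter distribution observed in `text`."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    return entropy_bits(count / total for count in counts.values())

# Upper bound: all 26 letters equally likely, i.e. log2(26) bits per letter.
uniform_26 = entropy_bits([1 / 26] * 26)

# Letter frequencies in a short sample (arbitrary text, far too small for a real estimate).
sample = "now then let's enter the city and enjoy the great feast"
sample_entropy = letter_entropy(sample)

# Per-letter figure from the quoted per-word entropy, assuming ~4.5 letters per word.
per_letter_from_words = 11.82 / 4.5
```

The uniform case gives the ~4.7-bit figure, and any uneven letter distribution comes out lower. Conditioning on preceding letters or words pushes the estimate down further, which is the effect the comments are describing.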

  • by gweihir ( 88907 ) on Wednesday June 01, 2022 @05:30AM (#62582890)

    For example, try talking German and have MS Teams record it. It will do a "transcript" as well, which is the most excessive nonsense, because it apparently cannot tell you are not speaking English and is really just faking it. I expect this works with other languages as well.

    • That reminds me of this classic animutation [youtube.com] (attention: NSFW) phonetically translating Dutch to English, for a very comic effect. I wonder if a computer mistranscription can be as fun as a human's!

      • YouTube's automatic closed captioning for this video would probably be pretty funny if it weren't just given "[Music]". Auto CC of live newscasts (and similar, on YouTube) used to get pretty weird, and I've seen some very strange things written during live TV with CCs being written by humans on a few seconds' delay.
    • by jbengt ( 874751 )
      Teams is not too good at English, either. Teams once e-mailed me a voice mail transcript from a client that was mostly incomprehensible but insulting:

      . . . We put the food. You put fire extinguisher cabinets on the architectural plans and he put 42 per floor. Does that sound? Aren't you dumb? . . .

      • by gweihir ( 88907 )

        I am not surprised. I did a few simple English statements to demonstrate this to my IT Sec students (Teaching is in German) as a bit of a comic relief and it got those right, but having a strong accent, using complex language or just using a bit less typical speech patterns probably throws this thing off completely. But the really pathetic thing is that it does not notice that it did produce low-confidence results and that makes it useless as anything but a toy.

  • Two points (Score:4, Insightful)

    by Anonymous Coward on Wednesday June 01, 2022 @05:37AM (#62582898)

    1. This is a device that listens in to everything you're saying.

    2. The practical effect if used widely is to provide a large scale training ground for AI to filter speech from noise so that people are actually more vulnerable than they were previously (since AI has been trained to listen through the sound of the shower running or the radio static or whatever).

    • by Potor ( 658520 )
      This is precisely why a technological problem is not always best served by a technological solution.
    • Did you miss the part about "adversarial attack"? The listeners-we're-trying-to-confound will always be one step behind the masking application. AFAIK the pool of "adversarial" is infinite, and defense needn't be perfect. Masking merely needs to f-up just enough to confound the listeners' own guesses of what's coming next. Like modding just a tiny % of pixels to make images of guns get classified as bananas, we only need to confound a fraction of the speech to destroy the algo's ability to classify it.

      I re

  • If this is not real-time, proper encryption would be better than distorting the signal.

  • Surely this is simply a case of over-engineering. Using technology to do something that is easily done without, and thereby adding unnecessary complication and fail-points. The sensible solution to the problem described is to turn off the devices that are listening when you don't want to be overheard, or if that isn't possible keep sensitive conversations to places you aren't being listened to.

    • Employer provided laptops often don't have that option.
      • Which is why I included the second option. Keep sensitive conversations to places where you aren't being listened to. If being listened to while you are working remotely is part of your job, bypassing it with a device isn't likely to go down well with your employer anyway so that's not a good use case for it.
        • by AleRunner ( 4556245 ) on Wednesday June 01, 2022 @07:41AM (#62583078)

          Let's say you are in a country where employers pay for health cover and they are (illegally and secretly) using this to drive down their insurance costs by firing anyone who gets a medical condition. Your partner comes by in a bit of an emotional state and says "I just got told I have an invasive carcinoma, what do I do". You will likely end up regretting not using this device and switching it off once you know the conversation is okay to have with your boss listening in, assuming you ever realise what happened. Always better to block first then allow what you know is safe to allow.

          • If I were that type of unscrupulous employer anybody using such a device would be sacked immediately for that anyway. Wouldn't even need to have that conversation, they wouldn't last that long. After all, if they are deliberately hiding their conversations they must have something to hide.
            • If I were an unscrupulous employee in such a company, all my potential rivals would find such a device interpolated into their audio path. QED.

    • by GoTeam ( 5042081 )

      The sensible solution to the problem described is to turn off the devices that are listening when you don't want to be overheard, or if that isn't possible keep sensitive conversations to places you aren't being listened to.

      NO! you must put their device in the room with you so that no one can hear your sensitive conversations! They super-mega promise they won't listen to your secrets!

    • This method could be used to publish videos and images that would not trigger automatic content filtering in social networks. They would be easy to understand for humans but not to the neural nets that detect porn or other forbidden categories.
  • I noticed that you really can't understand what the crowd is saying any more at NASCAR races and football games, as it just sounds like background noise. I figured that the networks were using some sort of AI noise filter to block out "Let's Go Brandon" chants and other such things that they don't want to be broadcast.

  • So, basically a very simple version of a Language Game [wikipedia.org]

  • If you're a Get Smart fan...

    Now all we need is the room where, when you talk, words spill out of your mouth in print form, with no noise.

  • by Tony Isaac ( 1301187 ) on Wednesday June 01, 2022 @07:37AM (#62583070) Homepage

    This device won't stop them. The trouble is, in the end, you want to be understood by those around you. If your intended listener can understand you, then so can a device that is situated in the right place.

    • Exactly. In general, adversarial attacks to neural networks are only ever going to be a temporary measure, because they just generate more training data for the next system.

      Even this one already doesn't fucking work. I tried their example on https://cloud.google.com/speec... [google.com]
      This was what it recognized:
      - "Now then let's enter the city and enjoy the great feast. It's being cooked. I nearly starved myself, but it is hard work. "
      - "He doesn't say, but it's on the frontier and on the map, everything Beyond is ma

  • Solving the solution of convenience with paranoia. Sounds like technology is on track for a better future. Programmed by guys with compassion and vision.
  • ' To human listeners, the audio camouflage sounds like background noise, and they have no trouble understanding the spoken words. But machines stumble.'

    This is _precisely_ what you want to train an AI.

    You take your existing AI that understands voices, feed it both versions, and suddenly (perhaps using modestly greater resources) after a cycle of training, it can ignore the noise.

    This is pretty much the same problem as captchas. First generation captchas can be trivially solved with AI. Ones that can't can't
    • Much like newer-generation CAPTCHAs, this is using an adversarial network to train itself against its own generated noise until it fools itself optimally. Like you say, it will probably mean it will still be hard for people to understand unless the audio processing hardware of the physical human body has advantages in how it picks up vibration that makes it easier to differentiate white noise than a microphone.

  • Played back over a speaker, could be better than a white noise machine when talking about private medical info at a Dr etc. As scrambling / protection, seems like a cat-and-mouse game where based on what the device decided to cover with, you could train a program to guess what the original was.
  • I expected this article to be tagged hashtag #klatubarataniktu.

  • They invented the Waterfall sound.

    • I don't know what you're referencing, but I was thinking an easy way to achieve this is to generate peaceful sound played by an array of many small speakers with known locations, but altered by each speaker so they interfere to zero at places where people's heads are. So there are these silent zones in the room. People talking to each other will hear each other, but a mic even placed right between them will not.
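
The "interfere to zero" idea in the comment above can be sketched for the simplest possible case: two speakers, one pure tone, one target point. The Python below is a sketch under strong assumptions that are not in the comment (ideal point sources, no room reflections, a single 440 Hz tone, made-up distances); the function names are hypothetical. It picks the second speaker's amplitude and phase so the two waves cancel at the target, then shows they do not cancel somewhere else.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value for room-temperature air

def wavenumber(frequency):
    """Wavenumber k = 2*pi*f / c for a tone of the given frequency."""
    return 2 * math.pi * frequency / SPEED_OF_SOUND

def pressure_at(drive, distance, frequency):
    """Complex pressure from an ideal point source: amplitude ~ 1/r, phase lag k*r."""
    return drive * cmath.exp(-1j * wavenumber(frequency) * distance) / distance

def cancelling_drive(drive_a, dist_a, dist_b, frequency):
    """Drive for speaker B that cancels speaker A at a point dist_a and dist_b away."""
    k = wavenumber(frequency)
    return -drive_a * (dist_b / dist_a) * cmath.exp(1j * k * (dist_b - dist_a))

frequency = 440.0
drive_a = 1.0 + 0j
drive_b = cancelling_drive(drive_a, dist_a=1.0, dist_b=1.5, frequency=frequency)

# Sum of both speakers at the target point (1.0 m from A, 1.5 m from B).
at_target = pressure_at(drive_a, 1.0, frequency) + pressure_at(drive_b, 1.5, frequency)

# Sum at a different point (1.2 m from A, 1.3 m from B): no cancellation there.
elsewhere = pressure_at(drive_a, 1.2, frequency) + pressure_at(drive_b, 1.3, frequency)
```

The cancellation holds only at that point and that frequency. Speech covers a wide band and rooms reflect sound, so a real array would need many speakers and per-frequency filters, and the sketch says nothing about whether that is practical.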

  • When I imagine some company providing consumers with these results, the first thing that comes to mind is that they'll make cheap hardware that transmits everything you say to the home office where it will be processed, and the appropriate garble will be sent back. Of course, once they have everything you say at the home office...