AI Science

An Algorithm That Learns Through Rewards May Show How Our Brain Does Too (technologyreview.com) 35

An anonymous reader quotes a report from MIT Technology Review: In a paper published in Nature today, DeepMind, Alphabet's AI subsidiary, has once again used lessons from reinforcement learning to propose a new theory about the reward mechanisms within our brains. The hypothesis, supported by initial experimental findings, could not only improve our understanding of mental health and motivation. It could also validate the current direction of AI research toward building more human-like general intelligence. At a high level, reinforcement learning follows the insight derived from Pavlov's dogs: it's possible to teach an agent to master complex, novel tasks through only positive and negative feedback. An algorithm begins learning an assigned task by randomly predicting which action might earn it a reward. It then takes the action, observes the real reward, and adjusts its prediction based on the margin of error. Over millions or even billions of trials, the algorithm's prediction errors converge to zero, at which point it knows precisely which actions to take to maximize its reward and so complete its task.
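That predict-act-adjust loop is compact enough to sketch in a few lines of Python. The two-lever setup, payout odds, and learning rate below are made up for illustration; the point is only the cycle described above: predict a reward, take the action, observe the real reward, and nudge the prediction by the error.

    import random

    # Hypothetical two-armed bandit; the agent never sees these payout odds directly.
    true_reward_prob = {"lever_A": 0.8, "lever_B": 0.3}

    predicted = {"lever_A": 0.0, "lever_B": 0.0}  # the agent's reward predictions
    alpha = 0.1                                   # learning rate

    for trial in range(10_000):
        action = random.choice(list(predicted))                             # act
        reward = 1.0 if random.random() < true_reward_prob[action] else 0.0
        error = reward - predicted[action]                                  # reward prediction error
        predicted[action] += alpha * error                                  # adjust toward reality

    print(predicted)  # settles near the true odds: lever_A ~0.8, lever_B ~0.3

Once the prediction errors hover around zero, picking the action with the highest predicted reward is what "maximizing reward" amounts to for such an agent.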

It turns out the brain's reward system works in much the same way -- a discovery made in the 1990s, inspired by reinforcement-learning algorithms. When a human or animal is about to perform an action, its dopamine neurons make a prediction about the expected reward. Once the actual reward is received, they then fire off an amount of dopamine that corresponds to the prediction error. A better reward than expected triggers a strong dopamine release, while a worse reward than expected suppresses the chemical's production. The dopamine, in other words, serves as a correction signal, telling the neurons to adjust their predictions until they converge to reality. The phenomenon, known as reward prediction error, works much like a reinforcement-learning algorithm.
The improved algorithm changes the way it predicts rewards. "Whereas the old approach estimated rewards as a single number -- meant to equal the average expected outcome -- the new approach represents them more accurately as a distribution," the report says. This lends itself to a new hypothesis: Do dopamine neurons also predict rewards in the same distributional way?

After testing this theory, DeepMind found "compelling evidence that the brain indeed uses distributional reward predictions to strengthen its learning algorithm," reports MIT Technology Review.
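A rough sketch of that distributional idea, assuming the asymmetric-learning-rate formulation the paper describes (the reward stream and "optimism" levels below are illustrative, not the study's parameters): each unit keeps its own estimate but weights positive and negative prediction errors differently, so optimistic units settle above the mean and pessimistic ones below it, and the population as a whole traces out the reward distribution rather than a single average.

    import random

    taus = [0.1, 0.25, 0.5, 0.75, 0.9]   # degree of "optimism" per unit
    values = [0.0] * len(taus)           # one reward estimate per unit
    base_lr = 0.02

    def noisy_reward():
        # Illustrative skewed reward: usually nothing, occasionally large (mean ~2.0).
        return 10.0 if random.random() < 0.2 else 0.0

    for _ in range(100_000):
        r = noisy_reward()
        for i, tau in enumerate(taus):
            err = r - values[i]
            lr = tau if err > 0 else (1.0 - tau)   # asymmetric update
            values[i] += base_lr * lr * err

    print([round(v, 1) for v in values])
    # Fans out from pessimistic to optimistic estimates of the same reward stream,
    # instead of every unit agreeing on the single mean of roughly 2.0.

If dopamine neurons behave like these units, the population would carry information about the whole shape of the expected reward, which is the hypothesis the paper set out to test.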

Comments Filter:
  • Is the word "algorithm" overused? Can it be called a program?
  • by Ryzilynt ( 3492885 ) on Saturday January 18, 2020 @12:03AM (#59631880)

    Dog's, Humans, Machine learning (Modeled after human thought), BeauHD.

    Same difference.

    • Sorry for my egregious grammatical error.

    • by Anonymous Coward
      There may not be a lot of difference between an Alphabet engineer, their "AI", you, and BeauHD in terms of intelligence, but don't put the dogs in your party.

      Dogs are smart, sensitive and generally good and deserve better.

  • Um, no. (Score:1, Troll)

    by asackett ( 161377 )

    No, idiots, software is not going to tell you how the human brain works any more than studying a hipster's beanie is going to tell you how a uterus works. Fuckin' navel gazers.

    • Re:Um, no. (Score:4, Insightful)

      by timeOday ( 582209 ) on Saturday January 18, 2020 @12:10AM (#59631894)
      Sometimes it does. But usually, having programs that work well is more important than understanding how the brain works anyways. Nature doesn't necessarily do things the best way.
      • Re:Um, no. (Score:5, Insightful)

        by asackett ( 161377 ) on Saturday January 18, 2020 @12:33AM (#59631912) Homepage

        Sometimes it does.

        The human brain doesn't even pretend to play by the rules of computer science, so the very best that code can do is to approximate someone's existing hypothesis of what seems to be happening inside the brain. The brain that you learn about is that guy whose hypothesis you modeled, as understood by a programmer.

        • by gweihir ( 88907 )

          Exactly. As long as neuroscience cannot even model very simple and deterministic computing machines (https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005268), it cannot credibly speculate about the mechanisms that make up human thinking. As most people are stupid, such baseless speculations always get a nice audience of morons though. The "discussions" for this story have a few prime examples.

      • studying a hipster's beanie is going to tell you how a uterus works

        Sometimes it does.

        I require you to explain this now.

    • by Tablizer ( 95088 )

      software is not going to tell you how the human brain works

      It helps validate models of the brain. Do you have a better way to test models?

      • Do you have a better way to test models?

        They're not testing a model, they're expressing a model. Unless disproofs and alternative theories are actively considered -- and the article gives no sign that they were -- all we've learned from the observations in this work is how easily confirmation bias can look real purty in that dress. Here's my theory, there's one data point, I'm done -- because I'm Googod. Um, no.

        Hypothesis: There is a wide but normal range of dopamine neuron responses to given stimuli, and all it tells us about is a biological manufacturing tolerance that doesn't mean anything.

        True? False? Don't know, don't care. It's a hypothesis, and it fits the observation.

      • by gweihir ( 88907 )

        How on earth will software validate a model of something completely different? Software is just one additional model, not a "validation" in any sense of the word.

    • by gweihir ( 88907 )

      Could not agree more. This stupidity has been with the human race for a long, long time though. Morons trying to explain why they are morons and failing.

  • by aeropage ( 6536406 ) on Saturday January 18, 2020 @12:24AM (#59631906)
    We've been down the road of explaining behavior and thought by Operant Conditioning before. It leads only to a caricature of consciousness.

    There are endless examples of decisions that require projecting a future for which there are no historical examples of rewards (and, incidentally, incrementing a variable is only absurdly analogous to human neurological reward systems)... such as, off the top of my head, a parent running into a burning building to save their child. They have either no operant reinforcement for this situation historically, or the conditioning is overtly negative--historical cases of being burned by fire or hot objects, which they will ignore due to a higher -conceptual- priority.

    The parent -thinks-, projects a future scenario, and makes a choice based on no previous corresponding situational experience. Like B.F. Skinner's pseudoscience, this is just another simplistic psychological construct that, at base, insists that human beings are not actually conscious.
    • Not to mention that most of what we would call higher-level learning activities in humans are hardly dopamine-producing. Consider, for example, all the college students trying to stay awake in Calculus 101. If the researchers want an AI that can perform tasks based on the expectation of receiving a treat, I suppose they're on the right track. I recommend they study dogs.

      • by gtall ( 79522 )

        Nah, dogs are too reliable to be models for humans. Cats are much more appropriate. They'll respond to caresses and food, but not consistently. And they'll scheme behind your back to get what they want while you mysteriously leave the entire house open to them so you can go shopping.

    • by gweihir ( 88907 )

      ...this is just another simplistic psychological construct that, at base, insists that human beings are not actually conscious.

      Indeed, and there it falls down completely and obviously. While few of the proponents of these models seem capable of even reaching that inevitable conclusion, the evidence for the existence of consciousness is actually far stronger than _anything_ else we have in science. Sure, there are no explanations for it, but that is a shortcoming of the mechanisms used to explain reality at this time. It is not a valid reason to deny its existence.

  • by AxisOfPleasure ( 5902864 ) on Saturday January 18, 2020 @01:01AM (#59631948)

    This doesn't sound very revolutionary; predictive systems have been doing something similar for years. Self-tuning mechanisms in databases, for example, gather and react to positive outcomes; while they're very simple, and I certainly wouldn't call them a "reward system", they work in effect the same way we humans operate. We check, gather, compare, and pick the most suitable outcome that benefits the goal in mind. Our reward is a better chance at improving the efficiency of the task at hand, shortening it or making it easier to perform and thus saving us time and effort. In the case of a computer, this shortens the task and lowers resource consumption, which allows more concurrent tasks and other benefits such as saving energy.

    The only difference I see here is that this latter algorithm is likely far more complex, has more data, and can make far better choices, as it's able to process the input at a more granular level rather than the coarse choices a simple metric/adjustment algorithm would make in today's systems.

    Sounds like yet another non-story or someone trying to justify their grant/tenure by proving they've done something for the last X months by "standing on the shoulders of giants".

  • Yet another highly publicized explanation of how the human brain works. Yawn. There is so much hype around AI nowadays (the marketing people have grasped it) that the whole topic has become boring.
  • by RJFerret ( 1279530 ) on Saturday January 18, 2020 @01:33AM (#59631974)

    Interesting: in the psychological concept of Flow (a study of "joy"), a reward that matches the challenge provides the most joy. If they are out of whack, we don't find as much pleasure in the activity. If the brain is anticipating a result, and the dopamine is based on that, it might explain why our joy fades on repeating things--our brain anticipates the reward, so there's less hormone.

    It could also explain why things like sequels have to way outdo originals to get the same level of appreciation.

  • What does the author think? That humans are plants? Maybe he's a slime mold. But we humans are animals. Sorry, you're not special. Boo hoo.

  • I know what's easily possible.
    Yet I have to face batshit insanities like e.g. the iPhone or "representative democracy" (an oxymoron: either it's a democracy or it has leaders) or "fat makes you fat" (so sugar makes you sweet then?) or people being triggered by anyone saying those things, or so freakin' many other things, every day.

    Any sane, non-ignorant mind is *bound* to be constantly disappointed in humanity. And since they limit one's own abilities too, it also forces one to be disappointed in oneself!

    I mean

    • by gweihir ( 88907 )

      Any sane, non-ignorant mind is *bound* to be constantly disappointed in humanity.

      That insight is as old as humanity. And there is no indication things are getting better. Most people are still ye old caveman 1.0, just in a suit or a dress.

  • Imagine you work for a living: every hour you get paid, so you have a fixed reward. Compare this to getting paid in scratch tickets: often you get little or nothing, but occasionally you win big, for the same average. How does the brain express this variability? A simple model would say that you don't: the positive and negative signals cancel out until you feel equally rewarded. This paper indicates that you have biased receptors that are much easier/harder to trigger, so if your average is $10 some neurons wi

    • Imagine you work for a living: every hour you get paid, so you have a fixed reward. Compare this to getting paid in scratch tickets: often you get little or nothing, but occasionally you win big, for the same average. How does the brain express this variability?

      The popularity of the lottery's scratch tickets is an interesting behavioral study because of its inherent squandering of accumulated resources. It is akin to the thrill associated with gambling, and though it has been called a tax on people who can't do math, there are many somewhat intelligent participants simply desperate for a positive feedback win

      Working for a living is, for the time being, a dependable method to pay bills, eat regularly, and possibly enjoy a few comforts. It's simply not as exciting a
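      To the parent's question about how that variability gets expressed: under the biased-receptor reading, a steady paycheck and mean-matched scratch tickets look identical to an "average" unit but very different to the population as a whole. A toy simulation (the dollar figures and optimism levels are made up for illustration, not taken from the paper):

        import random

        def learn(pay, taus, lr=0.02, trials=100_000):
            # One value estimate per unit; each unit weights good and bad
            # surprises by its own optimism level tau.
            values = [0.0] * len(taus)
            for _ in range(trials):
                r = pay()
                for i, tau in enumerate(taus):
                    err = r - values[i]
                    values[i] += lr * (tau if err > 0 else 1.0 - tau) * err
            return [round(v, 1) for v in values]

        taus = [0.1, 0.5, 0.9]                                     # pessimist .. optimist
        hourly = lambda: 10.0                                      # fixed $10 every hour
        scratch = lambda: 100.0 if random.random() < 0.1 else 0.0  # same $10 average

        print(learn(hourly, taus))   # all units agree: roughly [10.0, 10.0, 10.0]
        print(learn(scratch, taus))  # units spread out: roughly [1.2, 10.0, 50.0]

      The spread between the pessimistic and optimistic units, not any single number, is what carries the information that the scratch-ticket income is risky.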

  • This isn't remotely new; this is ANN design 101 from decades ago.
  • Oh boy!
    They just reinvented reinforcement learning.
    Just like every machine learning program since the early sixties.

    And no, the fact that brains and computers both respond to reinforcement does not mean that the underlying mechanisms are in any way similar.
