An Algorithm That Learns Through Rewards May Show How Our Brain Does Too (technologyreview.com) 35
An anonymous reader quotes a report from MIT Technology Review: In a paper published in Nature today, DeepMind, Alphabet's AI subsidiary, has once again used lessons from reinforcement learning to propose a new theory about the reward mechanisms within our brains. The hypothesis, supported by initial experimental findings, could not only improve our understanding of mental health and motivation. It could also validate the current direction of AI research toward building more human-like general intelligence. At a high level, reinforcement learning follows the insight derived from Pavlov's dogs: it's possible to teach an agent to master complex, novel tasks through only positive and negative feedback. An algorithm begins learning an assigned task by randomly predicting which action might earn it a reward. It then takes the action, observes the real reward, and adjusts its prediction based on the margin of error. Over millions or even billions of trials, the algorithm's prediction errors converge to zero, at which point it knows precisely which actions to take to maximize its reward and so complete its task.
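To make that loop concrete, here is a minimal sketch in Python. It illustrates the general idea only, not DeepMind's code; the two-action setup, reward values, and learning rate are invented for the example:

```python
import random

# Minimal sketch of the prediction-error loop described above: the agent keeps
# one reward estimate per action and nudges it toward what it actually observed.
# (Illustrative toy example; not DeepMind's implementation.)

ACTIONS = ["left", "right"]
TRUE_REWARD = {"left": 0.2, "right": 1.0}  # hidden from the agent
LEARNING_RATE = 0.1

estimates = {a: 0.0 for a in ACTIONS}      # the agent's reward predictions

for trial in range(10_000):
    # Explore occasionally; otherwise take the action predicted to pay best.
    if random.random() < 0.1:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=estimates.get)

    reward = TRUE_REWARD[action]                # observe the real reward
    error = reward - estimates[action]          # the prediction error
    estimates[action] += LEARNING_RATE * error  # adjust toward reality

print(estimates)  # prediction errors shrink toward zero as estimates converge
```

After enough trials the estimates match the true rewards, the error term goes to zero, and the agent reliably picks the better action -- the convergence the summary describes.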
It turns out the brain's reward system works in much the same way -- a discovery made in the 1990s, inspired by reinforcement-learning algorithms. When a human or animal is about to perform an action, its dopamine neurons make a prediction about the expected reward. Once the actual reward is received, they then fire off an amount of dopamine that corresponds to the prediction error. A better reward than expected triggers a strong dopamine release, while a worse reward than expected suppresses the chemical's production. The dopamine, in other words, serves as a correction signal, telling the neurons to adjust their predictions until they converge to reality. The phenomenon, known as reward prediction error, works much like a reinforcement-learning algorithm. DeepMind's improved algorithm changes the way it predicts rewards. "Whereas the old approach estimated rewards as a single number -- meant to equal the average expected outcome -- the new approach represents them more accurately as a distribution," the report says. This lends itself to a new hypothesis: Do dopamine neurons also predict rewards in the same distributional way?
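The distributional version can be sketched just as compactly. The rule below is an assumed quantile-style update (one common way distributional RL is implemented), not code from the Nature paper, and the asymmetry values and reward distribution are invented for illustration:

```python
import random

# Illustrative sketch of distributional reward prediction (assumed
# quantile-style updates; not DeepMind's actual code). Each predictor reacts
# asymmetrically to errors, so together they trace out the reward distribution.

TAUS = [0.1, 0.25, 0.5, 0.75, 0.9]  # asymmetry of each predictor
LR = 0.01
values = [0.0] * len(TAUS)

for _ in range(200_000):
    r = random.gauss(10.0, 3.0)             # noisy reward, mean 10
    for i, tau in enumerate(TAUS):
        if r > values[i]:
            values[i] += LR * tau           # good surprise: optimists move most
        else:
            values[i] -= LR * (1.0 - tau)   # bad surprise: pessimists move most

# A classic learner would report only the mean (~10); these estimates fan out
# across the distribution's quantiles (roughly 6.2, 8.0, 10.0, 12.0, 13.8).
print([round(v, 1) for v in values])
```

The point of the comparison: a single averaged estimate throws away the shape of the reward distribution, while a population of asymmetric predictors preserves it -- which is exactly what the dopamine-neuron hypothesis asks about.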
After testing this theory, DeepMind found "compelling evidence that the brain indeed uses distributional reward predictions to strengthen its learning algorithm," reports MIT Technology Review.
Algorithm (Score:2)
Re: (Score:2)
Nope. A "program" is what your average coder produces. It is a collection of bugs and half-assed things and has a fuzzy goal at best. An "algorithm" is a method to reach a very specific computational goal.
Re: (Score:2)
Re: (Score:2)
You are welcome.
Pavlov's Dog (Score:3)
Dog's, Humans, Machine learning (Modeled after human thought), BeauHD.
Same difference.
Re: (Score:2)
Sorry for my egregious grammatical error.
Re: (Score:1)
Dogs are smart, sensitive and generally good and deserve better.
Um, no. (Score:1, Troll)
No, idiots, software is not going to tell you how the human brain works any more than studying a hipster's beanie is going to tell you how a uterus works. Fuckin' navel gazers.
Re:Um, no. (Score:4, Insightful)
Re:Um, no. (Score:5, Insightful)
Sometimes it does.
The human brain doesn't even pretend to play by the rules of computer science, so the very best that code can do is approximate someone's existing hypothesis of what seems to be happening inside the brain. The "brain" you learn about is just that guy's hypothesis, as understood by a programmer.
Re: (Score:3)
Exactly. As long as neuroscience cannot even model very simple and deterministic computation machines (https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005268), it cannot credibly speculate about the mechanisms that make up human thinking. As most people are stupid, such baseless speculations always get a nice audience of morons though. The "discussions" for this story have a few prime examples.
Re: (Score:2)
studying a hipster's beanie is going to tell you how a uterus works
Sometimes it does.
I require you to explain this now.
Re: (Score:2)
It helps validate models of the brain. Do you have a better way to test models?
Re: (Score:2)
Re: (Score:2)
How on earth will software validate a model of something completely different? Software is just one additional model, not a "validation" in any sense of the word.
Re: (Score:1)
My question wasn't answered.
Re: (Score:2)
Could not agree more. This stupidity has been with the human race for a long, long time though. Morons trying to explain why they are morons and failing.
No, B.F. Skinner is still wrong (Score:4, Interesting)
There are endless examples of decisions that require projecting a future for which there are no historical examples of rewards (and, incidentally, incrementing a variable is only absurdly analogous to human neurological reward systems)... such as, off the top of my head, a parent running into a burning building to save their child. They have either no operant reinforcement for this situation historically, or the conditioning is overtly negative--historical cases of being burned by fire or hot objects, which they will ignore due to a higher -conceptual- priority.
The parent -thinks-, projects a future scenario, and makes a choice based on no previous corresponding situational experience. Like B.F. Skinner's pseudoscience, this is just another simplistic psychological construct that, at base, insists that human beings are not actually conscious.
Re: (Score:2)
Not to mention that most of what we would call higher-level learning activities in humans is hardly dopamine-producing. Consider, for example, all the college students trying to stay awake in Calculus 101. If the researchers want an AI that can perform tasks based on the expectation of receiving a treat I suppose they're on the right track. I recommend they study dogs.
Re: (Score:3)
Nah, dogs are too reliable to be models for humans. Cats are much more appropriate. They'll respond to caresses and food, but not consistently. And they'll scheme behind your back to get what they want while you mysteriously leave the entire house open to them so you can go shopping.
Re: (Score:2)
...this is just another simplistic psychological construct that, at base, insists that human beings are not actually conscious.
Indeed, and there it falls down completely and obviously. While few of the proponents of these models seem capable of even reaching that inevitable conclusion, the evidence for the existence of consciousness is actually far stronger than _anything_ else we have in science. Sure, there are no explanations for it, but that is a shortcoming of the mechanisms used to explain reality at this time. It is not a valid reason to deny its existence.
Doesn't sound very new (Score:3)
This doesn't sound very revolutionary; predictive systems have been doing something similar for years. Self-tuning mechanisms in databases, for example, gather and react to positive outcomes. While they're very simple and I certainly wouldn't call them a "reward system," it is in effect the same way we humans operate: we check, gather, compare and pick the most suitable outcome that benefits the goal in mind. Our reward is a better chance at improving the efficiency of the task at hand, shortening the task or making it easier to perform and thus saving us time and effort. In the case of a computer, this shortens the task and lowers resource consumption, which allows more concurrent tasks and other benefits such as saving energy.

The only difference I see here is that this latter algorithm is likely far more complex, has more data and can make far better choices, as it's able to process the input at a more granular level rather than the coarse choices a simple metric/adjustment algorithm would make in today's systems.
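For what it's worth, the kind of measure-and-adjust loop described above fits in a few lines. This is a toy sketch only: the benchmark function and numbers are made up, and no real database tuner works this crudely:

```python
import random

# Toy sketch of a self-tuning loop: propose a setting, measure the outcome,
# and keep the change only if things improved. A crude hill climb, not a real
# database tuner; the "benchmark" below is a stand-in with a peak at 512 MB.

def throughput(cache_mb):
    return -abs(cache_mb - 512.0) + random.gauss(0.0, 5.0)  # noisy benchmark

cache_mb = 64.0
for _ in range(2_000):
    candidate = max(32.0, cache_mb + random.choice([-32.0, 32.0]))
    if throughput(candidate) > throughput(cache_mb):  # positive outcome: keep
        cache_mb = candidate

print(round(cache_mb))  # settles near the sweet spot (~512)
```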
Sounds like yet another non-story or someone trying to justify their grant/tenure by proving they've done something for the last X months by "standing on the shoulders of giants".
Good grief... (Score:2)
Flow (Score:3)
Interestingly, in the psychological concept of Flow (the study of "joy"), a reward that matches the challenge provides the most joy. If they are out of whack, we don't find as much pleasure in the activity. If the brain is anticipating a result, and the dopamine is based on that, it might explain why our joy fades when we repeat things--our brain anticipates the reward, so there's less hormone.
It could also explain why things like sequels have to way outdo originals to get the same level of appreciation.
"human or animal" (Score:2)
What does the author think? That humans are plants? Maybe he's a slime mold. But we humans are animals. Sorry, you're not special. Boo hoo.
But what if humanity is constantly disappointing? (Score:2)
I know what's easily possible.
Yet I have to face batshit insanities every day, like the iPhone, or "representative democracy" (an oxymoron: either it's a democracy or it has leaders), or "fat makes you fat" (so sugar makes you sweet, then?), or people being triggered by anyone saying those things, or so freakin' many other things.
Any sane non-ignorant mind is *bound* to be constantly disappointed in humanity. And since these things limit one's own abilities too, it also forces one to be disappointed in oneself!
I mean
Re: (Score:2)
Any sane non-ignorant mind is *bound* to be constantly disappointed in humanity.
That insight is as old as humanity. And there is no indication things are getting better. Most people are still ye old caveman 1.0, just in a suit or a dress.
So I read the paper (Score:2)
Imagine you work for a living: every hour you get paid, so you have a fixed reward. Compare this to getting paid in scratch tickets: often you get little or nothing, but occasionally you win big, for the same average. How does the brain express this variability? A simple model would say that you don't: the positive and negative signals cancel out until you feel equally rewarded. This paper indicates that you have biased receptors that are much easier/harder to trigger, so if your average is $10 some neurons will react as if you were paid more and others as if you were paid less, together encoding the spread of rewards rather than just the mean.
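The commenter's wage-versus-scratch-ticket framing is easy to simulate. The sketch below uses an assumed expectile-style rule (one interpretation of "biased receptors"; the parameters are illustrative, not taken from the paper):

```python
import random

# Rough sketch of the "biased receptors" idea: each predictor scales good and
# bad surprises differently, so a volatile income spreads the predictors out
# while a steady one does not. (Assumed expectile-style rule; illustrative.)

LR = 0.01
TAUS = (0.2, 0.5, 0.8)  # 0.5 = balanced; low = pessimistic; high = optimistic

def learn(sample_reward, steps=100_000):
    values = [0.0] * len(TAUS)
    for _ in range(steps):
        r = sample_reward()
        for i, tau in enumerate(TAUS):
            delta = r - values[i]                    # prediction error
            scale = tau if delta > 0 else 1.0 - tau  # biased response
            values[i] += LR * scale * delta
    return [round(v, 1) for v in values]

wage    = lambda: 10.0                                     # steady $10/hour
scratch = lambda: 100.0 if random.random() < 0.1 else 0.0  # same $10 average

print(learn(wage))     # all predictors agree: ~[10.0, 10.0, 10.0]
print(learn(scratch))  # predictors fan out: roughly [2.7, 10.0, 30.8]
```

Under a steady wage every predictor lands on the same $10; under the scratch tickets the biased predictors disagree, and that disagreement is what carries the information about variability.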
Re: (Score:3)
Imagine you work for a living: every hour you get paid, so you have a fixed reward. Compare this to getting paid in scratch tickets: often you get little or nothing, but occasionally you win big, for the same average. How does the brain express this variability?
The popularity of the lottery's scratch tickets is an interesting behavioral study because of the inherent squandering of accumulated resources. It is akin to the thrill associated with gambling, and though it has been called a tax on people who can't do math, there are many somewhat intelligent participants simply desperate for a positive feedback win.
Working for a living is, for the time being, a dependable method to pay bills, eat regularly, and possibly enjoy a few comforts. It's simply not as exciting a prospect.
New? (Score:2)
What a crock. (Score:2)
They just reinvented reinforcement learning.
Just like every machine learning program since the early sixties.
And no, the fact that brains and computers both respond to reinforcement does not mean that the underlying mechanisms are in any way similar.