UK Researchers Make Neural Networks Smarter
Small Hairy Troll writes: "EDTN is running this story concerning a researcher in the UK who has come up with a method for getting those pesky Neural Nets to teach themselves to see. Called the 'Product of Experts,' the Neural Net is built using 'Experts.' If "you had one expert that preferred furry animals, whereas another expert preferred domesticated animals and another preferred small animals, their votes ... would light up dogs and cats very nicely." And an Edinburgh professor is quoted in the story as calling it "the first neural-network architecture that is both sensibly implementable and worth implementing."
Re:Real cool AI (Score:1)
Re:Real cool AI (Score:1)
Re:Way Over the Head (Score:3)
Both.
Cognitive scientists are using NN technology as a 'biologically valid' model for cognition. (Though only a fool would remain unaware of the enormous gap between our NN toys and the real thing, and of the enormous simplification that goes into our toys.)
Others just look at NN as a technology to be exploited without reference to biology.
> I suspect that this field has narrowed in the last decade (but I may be wrong), and so I fear that it may be getting wayyy esoteric.
Wayyyyy. Like any other branch of science, especially CS, this field is rapidly "narrowing" in the sense of getting deeper, but also "broadening" in the sense of developing more branches and more connections to other fields. (E.g., lots of parallels have been shown between NN and physics, and between NN and statistics.)
> As a practical engineer who needs solutions today, should I devote more energies to this or less? What is happening elsewhere in the field?
It's no longer possible even for NN researchers to stay on top of everything that's going on in the field, so don't even think about investing that much time in it.
Beyond that, what's your field of application in engineering? Do your journals ever cover relevant NN technology? If not, you might be able to start a SIG, so that the effort of keeping an ear to the ground and filtering out the uninteresting material could be spread among the members rather than borne by you alone.
--
Way Over the Head (Score:2)
I guess it's hard to explain the field of AI to outsiders, because even though I (used to) understand neural networks, I feel that I need some sort of a diagram to 'get it' here. But what I think I heard is that "learning is the big problem in NN, or maybe a better word would be teaching"
Okay, I do have a question for the experts out there. Help would be greatly appreciated 'cuz we may be able some day to apply this to National Missile Defence to help discriminate balloons from nukes, currently a big problem.
Question: Seeing the words 'biologically valid' conjures up an image of scientists pursuing pure science rather than concentrating on the applications of it. Is the goal of NN today more theoretical (we want to get something to behave more like a smart being) than practical (we want something that will specifically put names to faces/discriminate balloons from weapons/identify handwriting like an expert).
I suspect that this field has narrowed in the last decade (but I may be wrong), and so I fear that it may be getting wayyy esoteric. As a practical engineer who needs solutions today, should I devote more energies to this or less? What is happening elsewhere in the field?
Re:Oh come on! (Score:2)
This doesn't seem like much of a breakthrough. The article mentions back-propagation networks as an effective (but, as I understand it, slow) kind of NN, and says that this technique is an improvement on back-prop. I do know that in the last year, Quick-props and Fast-props have both come out, which are 'major' improvements on the back-prop algorithm.
My question for the specialists is: how is the PoE model any different/better/worse than these other improvements? It is my understanding that Quick-props is very good for practical image recognition problems.
Re:I sure hope so -- they could hardly be dumber! (Score:1)
When I fit a curve to some data, it's because I have a certain fundamental understanding about that data and how it should behave. Only an idiot would try to fit a parabola to an Arrhenius curve.
AFAIK, no NN gives this insight. It is therefore "dumb" in both senses of the word.
I sure hope so -- they could hardly be dumber! (Score:2)
As described to me, neural nets are _huge_ underconstrained matrices (every datapoint goes in) with an infinite number of equally valid solutions. "Training" [programming] them is an exercise in finding a strategy for the "best" solution.
Practically, when NNs are done well they will give you back the data you fed into them! When exceptionally good, they will give reasonable interpolations on the data. But forget about correct extrapolations.
Re:&& == smarter?? (Score:1)
Problem is, you have to hardcode in or read in what is furry, what is small, and what is domesticated, whereas the NN figures out what these features are and adapts to changes in the input automatically. Otherwise you have some human extracting new features and hardcoding them somewhere.
Similar to the Sand Mouse (Score:3)
Of course they're not smart (Score:3)
Well, this is partially true. At least the MLP neural network is just a non-linear function approximator that can, in theory, learn any mapping from the inputs to the outputs. The network is trained using points from the input space together with the desired output. Statisticians would probably call MLP non-linear regression. I also know some statistics profs who have a rather high regard for neural networks. The name has a lot of hype, but the methodology works. About correct extrapolations: I would like to see anything correctly extrapolate in a general case given only a few random observed points and desired outputs. The performance of these methods depends on how well the assumptions of the method correspond to the way the observed data really behaves, and you cannot tell that with certainty from a few random points.
The training is usually done using two separate sets of data: a training set and a test set. The training set is used to train the network and the test set is used to measure its performance. When performance on the test set starts to degrade instead of improving, the network is starting to overfit and loses its ability to generalize.
Basically you can just think of the MLP neural network as a classifier (when your goal is to classify, which is usually the case with neural networks) that draws arbitrary boundaries in n-dimensional space. Somehow I always think of those blobby objects in computer graphics when I think of the classification boundaries, but this is of course not strictly correct. I guess it still helps.
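Since "MLP as a nonlinear regressor/classifier trained on input-output pairs" is the whole idea, here is a minimal sketch: a tiny 2-3-1 sigmoid network trained by plain backpropagation on XOR, the classic mapping no single-layer net can represent. The architecture, learning rate, and epoch count are arbitrary choices for illustration, not anything from the article.

```python
import math, random

random.seed(1)

# XOR truth table: the target mapping the net must learn.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hidden layer: 3 units, each with 2 input weights + a bias.
W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
# Output unit: 3 hidden weights + a bias.
W2 = [random.uniform(-1, 1) for _ in range(4)]

def forward(x1, x2):
    h = [sigmoid(w[0] * x1 + w[1] * x2 + w[2]) for w in W1]
    o = sigmoid(sum(W2[i] * h[i] for i in range(3)) + W2[3])
    return h, o

def mse():
    return sum((forward(x1, x2)[1] - y) ** 2 for (x1, x2), y in data) / 4

before = mse()
lr = 0.5
for _ in range(5000):
    for (x1, x2), y in data:
        h, o = forward(x1, x2)
        delta_o = (o - y) * o * (1 - o)          # output-layer error signal
        for i in range(3):                       # backpropagate into hidden layer
            delta_h = delta_o * W2[i] * h[i] * (1 - h[i])
            W1[i][0] -= lr * delta_h * x1
            W1[i][1] -= lr * delta_h * x2
            W1[i][2] -= lr * delta_h
        for i in range(3):
            W2[i] -= lr * delta_o * h[i]
        W2[3] -= lr * delta_o

print(before, mse())  # training error should drop substantially
```

The "arbitrary boundaries in n-dimensional space" the parent describes are exactly what the hidden sigmoid units carve out here: each one is a soft half-plane, and the output unit blends them.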
Re:&& == smarter?? (Score:1)
> You could've come off as someone who was interested and wanted to know more. But no. You had to make a snide ass remark. Good one, bucko, we now all know what a dumbass you really are.
'nough said...
Re:&& == smarter?? (Score:3)
I'm an AI researcher, and I'll tell you that you're patently wrong.
One thing first: Hinton (the inventor of PoE) was one of the people re-popularizing NN's way back in the early 80's while at U of T.
Now, others have tried combining experts before. But Hinton's approach beats them empirically. The reason is that the experts are trained together in PoE, rather than being trained separately and considered jointly for evaluation, as in previous approaches...
Another thing: the NNs here are _very_ different from backprop NNs. The entire topology is different. Backprop neural networks are a special case of Bayesian nets, but PoE is based upon random fields. While there is current research being done on training random fields from Bayes nets, there are in fact things you can represent with a random field that can't be represented with a Bayes net (some of Minsky and Papert's proofs for perceptrons can in fact be extended to prove limitations on backprop nets and Bayes nets). The converse is also true: there are distributions that can be represented in Bayes nets and not in random fields. (But of course, both classes of distributions can be represented as factor graphs...)
Re:Neural Networks (Score:1)
Re:Similar to the Sand Mouse (Score:1)
This technique is unsupervised, which means it figures out different categories (the number depends on the learning parameters and the data presented) and assigns the inputs to one or more of these clusters. Example clusters would be "animal-like" or "furry".
The interesting thing about this technique seems to be the combination of these clusters into larger clusters. In other words, it doesn't just learn the categories, it learns the components that make up the categories too.
These experts need to get out more... (Score:2)
Re:Real cool AI (Score:1)
BTW I know it's also "cool" to say C++ but your code is simply "dirty 'C'".
bau
Re:Your sig (Score:1)
object
interface
property
methods
wrapper
function
Hmmm. It's a tough one - they all sound bad!
Re:Way Over the Head (Score:1)
Hmmm, another one, eh? Did he get anything right in the end then?
don't I have a right to my opinion ? (Score:1)
You should understand my application: modelling multicomponent sequential chemical reactions and predicting the yields. NNs do very poorly at this. The diffeq's work great because we do have some fundamental understanding of the underlying elementary processes.
With this, we can extrapolate with surprising success, and interpolate too, all with surprisingly few parameters (a few dozen for 150 components). I was glad to see your admission that NNs cannot extrapolate. Extrapolation is very important to us, and we do it well! Dumber number-crunching methods may well be incapable, so we should avoid them.
Re:Neural Networks (Score:1)
Re:Real cool AI (Score:1)
Product of Experts is not new as far as I can see (Score:1)
Re:Real cool AI (Score:1)
If you feel the need to answer with an explanation then I suppose that somehow you understood my broken (?) English.
Anyhow, thanks for your opinion, Mr.
bau
Re:Real cool AI (Score:1)
AI and C++ (Score:1)
Cool !
Re:Horray for Jesus! (Score:1)
Re:It seems in need of learning (Score:1)
I don't think that was the point that "junkmaster" was trying to make. But there is a certain advantage to using randomly generated sets.
The mathematics of sd shows that by increasing the strength of the weak models that are "combined", one requires fewer of these weak models to get the same quality of recognition. In fact, there is even an equation that puts limits on this (which I believe is based on Chebyshev's inequality, but I don't remember exactly), and the most commonly used implementation of sd does allow you to set a threshold for the quality of weak models you want chosen from those that are randomly generated.
However, remember that the concept of uniformity is the linchpin of sd. All the models that you choose to "combine" must be uniform, as defined by the theory, with respect to each other and to the problem space. Randomly chosen models tend to make it easier to accomplish this because the fact that they are random already gives them closer-to-uniform coverage of the feature space. They just need to be tuned to get true uniformity.
There is nothing theoretically wrong with using stronger weak models to get to your solution faster. But by putting "intelligence" into the process of making the weak models, it would be much, much harder to ensure that they were mathematically uniform. I said it in my original post, but I should stress the point that the uniformity concept is the reason that sd beats out all the other methods for "combining" weaker models in standardized benchmarks. The importance of obtaining it cannot be overstated.
Re:It seems in need of learning (Score:1)
T1000 anyone? (Score:1)
Real cool AI (Score:2)
------
http://vinnland.2y.net/
Oh come on! (Score:2)
For one thing, his PoE model is designed to make each individual net (I think) more capable of responding to a set of features. I don't know if it would intentionally segregate into set animal, set small, and set furry, but it's supposed to be much simpler to train than the standard supervised network.
All it needs to do is get good at sorting images and simplifying the input; a second stage of recognition is then applied to the, theoretically, simpler set of information.
The example you're using is incomplete; his PoE would detect the features small, animal, and furry, where the traditional model would detect the feature cat-like and the feature dog-like, without the sharing of information or neurons that the PoE enables. The second stage of his PoE, the recognition center, would use the sum-product of each of the simpler feature detectors and then decide if it were cat- or dog-like.
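This isn't Hinton's actual PoE training algorithm, but the "product" part of the story can be sketched in a few lines: each expert assigns probabilities over the candidates, and the votes are multiplied together and renormalized. All the numbers and expert names below are invented for illustration.

```python
candidates = ["cat", "dog", "car", "fish"]

# Three made-up "experts", each a probability distribution over the candidates.
furry_expert        = {"cat": 0.40, "dog": 0.40, "car": 0.05, "fish": 0.15}
small_expert        = {"cat": 0.35, "dog": 0.25, "car": 0.10, "fish": 0.30}
domesticated_expert = {"cat": 0.35, "dog": 0.35, "car": 0.05, "fish": 0.25}

experts = [furry_expert, small_expert, domesticated_expert]

# Multiply the experts' opinions together, then renormalize to a distribution.
product = {c: 1.0 for c in candidates}
for e in experts:
    for c in candidates:
        product[c] *= e[c]
total = sum(product.values())
posterior = {c: p / total for c, p in product.items()}

print(posterior)  # cats and dogs "light up very nicely"; the car is crushed
```

The multiplicative combination is why a candidate any one expert finds implausible (the car) gets driven to near zero, while candidates every expert mildly likes (cat, dog) dominate.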
Geek dating! [bunnyhop.com]
Re:&& == smarter?? (Score:3)
All the more so, since the notion of combining NN experts is already quite old. Haykin mentions it in the 1994 edition of his textbook.
Notice that that's 10% of the way back to the invention of electronic computers, and about 43% of the way back to when the backpropagation algorithm rescued neural networks from obscurity.
--
Re:I sure hope so -- they could hardly be dumber! (Score:3)
Your criticisms are cogent, but in practice NNs can be excellent problem solvers.
For example, I suppose it is possible to solve the pole balancing problem with statistical methods, but I have never seen it done. With neural networks the problem and its more demanding variants have become so trivial that people are losing interest in it as a benchmark. (This is mostly as a result of advances in the art & science of NNs over the last 5 years or so.)
--
Re:I sure hope so -- they could hardly be dumber! (Score:4)
I don't think you can call a NN any dumber than a curve you fit to a graph, since you are just optimizing a function to minimize the error between the observed output and the output of the fitted function. A traditional function may give you slightly more insight into a problem, but it can also easily mislead you (e.g., every function looks like a parabola near a min or max).
My point is that NNs are tools, sometimes effective, sometimes not. They aren't anything close to the AI you read about in scifi books, but nothing is. More interesting, actually, are support vector machines (SVMs). There are several papers on the web about them; they were developed at Bell Labs by some Russian dude whose name escapes me at the moment. They can be more effective than NNs, but the math is a bit harder to understand.
Why, then? (Score:1)
And what about the is operator? The article implied that it was somehow better fit to this task than the identification used in other systems.
Re:I sure hope so -- they could hardly be dumber! (Score:2)
Neural networks are very useful to me, but of course, they don't solve all problems and won't bring world peace.
When exceptionally good, they will give reasonable interpolations on the data. But forget about correct extrapolations
Neural networks are essentially a synonym for "nonlinear regression". As for any type of regression, the interpolation and extrapolation performance depends on using the right number of parameters. If you try modeling 1000 points with a 500th-degree polynomial, don't expect good extrapolation...
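The too-many-parameters point is easy to demonstrate without any neural network at all. Below, a degree-9 polynomial fitted exactly through 10 nearly-linear points (via Lagrange interpolation) "gives back the data you fed in" perfectly, yet explodes outside the data range, while a humble least-squares line extrapolates fine. The data here is made up for the demo.

```python
# Nearly-linear data: y = x plus tiny alternating "noise".
xs = list(range(10))
ys = [x + 0.2 * (-1) ** x for x in xs]

def lagrange(x, xs, ys):
    """Evaluate the degree-9 polynomial passing exactly through all 10 points."""
    total = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        term = yj
        for k, xk in enumerate(xs):
            if k != j:
                term *= (x - xk) / (xj - xk)
        total += term
    return total

# Ordinary least-squares line y = a*x + b (closed form).
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b = my - a * mx

print(abs(lagrange(3, xs, ys) - ys[3]))  # zero training error: data memorized
print(abs(lagrange(15, xs, ys) - 15))    # extrapolation: error in the hundreds+
print(abs((a * 15 + b) - 15))            # the 2-parameter line stays close
```

Same trade-off as a 500-parameter network on 1000 points: zero training error is cheap; generalization outside the data is what the parameter count destroys.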
Your sig (Score:1)
--
I had Hinton as a professor at U of T (Score:2)
-Shieldwolf
Re:Way Over the Head (Score:1)
cuz we may be able some day to apply this to National Missile Defence to help discriminate balloons from nukes
Sorry to go off at a tangent, but I'd settle for an AI that can discriminate between "ethical" scientists and "creators of weapons of mass murder" scientists.
yeah... (Score:2)
Re:I sure hope so -- they could hardly be dumber! (Score:2)
I've always felt that the brain's sheer number of cells was more than enough to account for our perceived intelligence, but not until I took an artificial neural networks class in college did I start to have reasons to back up my feeling.
In that one short quarter, we saw and learned how neurons could be arranged to do the simple non-linear regression you've pointed out. But we also saw self-organizing maps (having the network choose an arbitrary number of categories to categorize your data), and even a semi-infinite memory (storing data and then recalling it from a partial stimulus). We also saw examples of NNs being used to cancel out unknown noise in a signal, convert text to speech with only 7 neurons, and drive a car in limited conditions with 25 neurons.
After seeing this "tip of the iceberg", I feel more assured that the millions of neurons and the connections between them are more than enough to account for the complex behavior we call "intelligence".
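The noise-cancellation example above is a nice one because it needs only a single adaptive "neuron". Here is a sketch using the classic LMS rule: we observe signal+noise, plus a separate reference correlated with the noise, and a 2-tap filter learns to subtract the noise. All signals, tap counts, and step sizes are invented for the demo.

```python
import math

N = 2000
signal = [0.1 * math.sin(0.01 * k) for k in range(N)]        # slow "wanted" signal
noise = [0.5 * math.sin(0.3 * k + 0.7) for k in range(N)]    # interference
ref = [math.sin(0.3 * k) for k in range(N)]                  # reference correlated with the noise

primary = [s + n for s, n in zip(signal, noise)]             # what the sensor hears

w = [0.0, 0.0]   # 2-tap adaptive filter
mu = 0.05        # LMS step size (small enough for stability)
cancelled = []
for k in range(N):
    r0 = ref[k]
    r1 = ref[k - 1] if k > 0 else 0.0
    y = w[0] * r0 + w[1] * r1        # filter's current estimate of the noise
    e = primary[k] - y               # residual: hopefully the clean signal
    cancelled.append(e)
    w[0] += 2 * mu * e * r0          # LMS weight update
    w[1] += 2 * mu * e * r1

# Compare noise power before/after cancellation over the last 300 samples.
tail = range(N - 300, N)
mse_before = sum((primary[k] - signal[k]) ** 2 for k in tail) / 300
mse_after = sum((cancelled[k] - signal[k]) ** 2 for k in tail) / 300
print(mse_before, mse_after)  # residual noise power should drop sharply
```

The filter never "knows" the noise; it only learns whatever part of the primary input it can predict from the reference, which is exactly the interference.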
combined results (Score:2)
One notable exception is in the research of Stochastic Discrimination (sd). This technique was originally developed through mathematics rather than experimentation as is the case with NNs. In other words, rather than the "let's see what happens if" development of NNs, sd's approach is "the equations say this should happen". Because of this, it is very rigorously defined and the hows and whys are clearly understood.
Sd also "combines" weak models but in a way that, to the best of my knowledge, no one else has done before. The basics are:
But before we "combine" them, we have to see that this weak model is uniform with respect to the other weak models. This is a term defined in the sd theory. Basically, what it means is that the weak models need to be evenly selected throughout the set of all possible weak models. In other words, there is no oversampling or bias. (Actually, this isn't quite right but goes in the right direction. Read one of the papers if you're interested.) The concept of uniformity is probably the most interesting part of sd, and it is the primary concept that all the other "combination" techniques miss. In this article, for example, how do we know that there isn't a connection between being furry and being small? If there is a statistical dependency, then the vote won't be fair and the results will be weaker.
Anyway, that's a real crash course in sd basics. So how does this algorithm perform? On the standard benchmarks (Irvine, for example) it handily outperforms anything out there. Right out of the box and without tuning. For more information see the web site [buffalo.edu] or send me email.
Re:Real cool AI (Score:1)
Even in "POT" mode, Slashdot converts arrow-brackets to HTML tags. Either use "extrans" mode, or use the HTML escape codes "&lt;" (less-than) and "&gt;" (greater-than) for maximum convenience.
See you in hell,
Bill Fuckin' Gates®.
Ridiculous claims, please reduce the score (Score:2)
There are other ways to obtain universal approximation (which means that basically any regular mapping can be approximately represented). NO method can be used to give correct EXTRApolations. This is not possible. Period.
Now, regarding INTERpolation of the data, Barron demonstrated in '93 (see "Universal approximation bounds for superpositions of a sigmoidal function", IEEE Transactions on Information Theory, volume 39, number 3, pages 930-945) that MLPs are more efficient than any other method. This means that they use LESS parameters than other interpolation methods (such as splines, kernel regression, etc.).
So, the post I'm answering is bullshit. I strongly advise posters to read the NN FAQ [sas.com] before posting ridiculous claims.
Re:&& == smarter?? (Score:1)
hang on... crap, you said probably
-
Who teaches the experts? (Score:2)
Hmm. Well, maybe I'm missing something, but it seems that the technique merely abstracts the training away from the functional network by one remove. The "Experts" still need training; the system does not appear to "teach" itself anything, but instead relies on the pooled opinion of already-trained Experts.
It still seems to be missing the training bootstrap - how do we train ourselves in a system in which we are untrained?
Don't get me wrong; kudos to the researchers. But no brownie points at all to the journalists, slashdot or others, who appear to mistake an adept implementation of a pattern recognition system for something that it is not.
Re:don't I have a right to my opinion ? (Score:1)
BUT, it is clearly true that if you have a parametric model of your problem, you can identify the parameters and use the model to both extrapolate and interpolate your data. Moreover, the result will in general be better than what you will obtain with a non-parametric model. So NNs are not a good solution for all problems, especially when you've got good knowledge of the problem. Your present post should have a good score because it gives valuable information about the limitations of NNs, whereas your first post was pure rant.
The problem is that you are comparing two different methods. One is parametric and the other is non-parametric. NO non-parametric model can be used to correctly extrapolate data (regardless of whether or not it is based on NNs). Among non-parametric models, NNs have been mathematically demonstrated to be the best ones.
You clearly do not fully understand what you are talking about, and the so-called statistician you are quoting should give you a better explanation of the differences between parametric and non-parametric estimation. We are talking about science, and there are no opinions here, only facts.
By the way, my vitriol is a sign that I'm pissed off to see your incorrect post scored informative when it should be scored as a flame. I've got a PhD in maths and my subject was Neural Networks. I work as an assistant professor in a statistics department. I do know what I'm talking about, but I hate using such credential arguments.
Re:I sure hope so -- they could hardly be dumber! (Score:2)
The Russian dude is named Vladimir Vapnik. He developed SVMs based on something called Statistical Learning Theory (SLT). One of the main insights of SLT regards the convergence of empirical averages.
The idea is that if you want to make a complicated estimate you need more data. For example, you need more data to fit a degree-10 polynomial to your data than you do to fit a degree-2 polynomial. This makes sense intuitively, but SLT provides the mathematical justification for the intuition. Basically, SLT says that the estimate for the degree-2 polynomial will converge faster than the estimate for the degree-10 polynomial, and it quantifies exactly how fast. Furthermore, SLT provides a way to measure how much data a particular kind of estimator or regression requires.
In the polynomial fitting example, it turns out that the model complexity is equal to the number of free parameters. This is not true in general. For example, NNs can have lots of free parameters but still have a low model complexity. This makes it possible to accurately train an NN with 50 neurons using much less data than would be required for accurately fitting a degree-50 polynomial. I think this is part of the reason NNs do well in practice: NNs can represent large classes of models, but training NNs converges better with less data because they have low model complexity in the sense of SLT.
The idea behind SVMs is to build a structure which is rich enough to represent a large class of relationships but constrained enough so that you can accurately train it with limited data. Thus SVMs trade off richness with accuracy based on the data. If you have only a little bit of data, you use a structure which is constrained enough to be trained accurately but might not model subtle relationships. Once you get more data, then you can use richer structures and still maintain training accuracy. It's really a pretty nice idea.
Anyway, the point is that NNs do seem to work well in practice and Vapnik's work might help explain this. If you are interested in NNs or similar things, I would suggest checking out SLT and SVMs. Unfortunately, both SLT and SVMs are pretty new, so the math is still a little ugly.
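A real SVM is solved as a quadratic program (or with SMO) and usually with kernels, but the margin idea can be sketched with plain subgradient descent on the regularized hinge loss (Pegasos-style). The toy data and hyperparameters below are invented for illustration.

```python
# Six linearly separable 2-D points with labels +1 / -1 (made up for the demo).
train = [((2.0, 2.0), 1), ((2.0, 3.0), 1), ((3.0, 2.0), 1),
         ((-2.0, -2.0), -1), ((-2.0, -3.0), -1), ((-3.0, -2.0), -1)]

w = [0.0, 0.0]
b = 0.0
lam = 0.01   # regularization strength: the richness/accuracy trade-off knob
lr = 0.01

for epoch in range(2000):
    for (x1, x2), y in train:
        margin = y * (w[0] * x1 + w[1] * x2 + b)
        # Subgradient of  lam*||w||^2 + max(0, 1 - margin)
        if margin < 1:                       # inside the margin: pull the point out
            w[0] += lr * (y * x1 - 2 * lam * w[0])
            w[1] += lr * (y * x2 - 2 * lam * w[1])
            b += lr * y
        else:                                # outside: only shrink w (widen margin)
            w[0] -= lr * 2 * lam * w[0]
            w[1] -= lr * 2 * lam * w[1]

preds = [1 if w[0] * x1 + w[1] * x2 + b > 0 else -1 for (x1, x2), _ in train]
print(preds)  # all six training points end up on the correct side
```

The `lam` term is the SLT trade-off in miniature: larger `lam` constrains the model (bigger margin, lower complexity), smaller `lam` lets it fit the data more aggressively.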
This makes sense (Score:2)
Re:Real cool AI (Score:1)
Re:Product of Experts is not new as far as I can s (Score:1)
(2) I reviewed quickly his initial paper, and then realised, oh that Hinton
(3) That being said, his work is definitely of interest - although I doubt it is the huge breakthrough in AI that is being suggested. If it is similar to the work done in the ML community, then basically what you get is a nice way to integrate different points of view on the same situation - which helps ML algorithms overcome the tendency to get stuck in local minima and to be sensitive to the actual distributions of the data.
(4) I couldn't find any comparisons to work outside of the NN field.
Winton
society of mind (Score:2)
the concept of semi-intelligent agents, acting together to perform what appear to be intelligent tasks is the main theme running through minsky's book society of mind, a great read, highly recommended.
to minsky, our intelligence is the product of millions of agents that autonomously perform various tasks and send messages to one another. what i found interesting about this article was that the ideas of the classical AI (minsky, et al.) are morphing with the "modern" AI. this is cool because when i studied artificial intelligence in college, i got the impression that there was a holy war between the two AI camps. it's nice to see the convergence..
-mike
Furry Things (Score:1)
Neural Networks (Score:2)
&& == smarter?? (Score:1)
if(animal is furry && animal is small && animal is domesticated) it's probably a cat or a dog.
Congratulations, you've reinvented the AND gate, which those of us outside the neural network community have been using for a long time.
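For what it's worth, the joke cuts both ways: even the humble AND gate can be learned rather than hardcoded. A single perceptron - the simplest neural net there is - trained with the classic perceptron rule converges on AND, since AND is linearly separable. (This toy is mine, not anything from the article.)

```python
# AND truth table as (inputs, target) pairs.
truth_table = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

w = [0.0, 0.0]
b = 0.0

for _ in range(20):  # far more epochs than needed; convergence is guaranteed
    errors = 0
    for (x1, x2), target in truth_table:
        pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
        if pred != target:
            # Perceptron rule: nudge the weights toward the misclassified target.
            w[0] += (target - pred) * x1
            w[1] += (target - pred) * x2
            b += (target - pred)
            errors += 1
    if errors == 0:
        break

print([1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
       for (x1, x2), _ in truth_table])  # → [0, 0, 0, 1]
```

The point of the article's experts is the part this can't do: nobody told the perceptron what "furry" means, but nobody here hand-wired the AND either.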
The Neural Networks Got Smarter! (Score:1)