Science

UK Researchers Make Neural Networks Smarter 56

Small Hairy Troll writes: "EDTN is running this story about a UK researcher who has come up with a method for getting those pesky neural nets to teach themselves to see. Called the 'Product of Experts,' the approach builds the neural net out of 'Experts': if 'you had one expert that preferred furry animals, whereas another expert preferred domesticated animals and another preferred small animals, their votes ... would light up dogs and cats very nicely.' An Edinburgh professor is quoted in the story as calling it 'the first neural-network architecture that is both sensibly implementable and worth implementing.'"
This discussion has been archived. No new comments can be posted.

  • Marvin Minsky is a moron. In the late 60s he said that neural networks had no future, and because everyone trusted him he singlehandedly set the AI field back 20 years, until the backpropagation algorithm got people researching neural nets again in spite of him. By the way, here's [cornell.edu] my implementation of a generic neural net using backpropagation (150 lines of C++).

  • by Anonymous Coward
    Actually, if anyone is REALLY interested in AI and not just another 15 year old Matrix fan, they should stay away from those books and get hold of something from Kyoto labs or Clocksin & Mellish or something
  • by Black Parrot ( 19622 ) on Thursday December 21, 2000 @09:06PM (#544107)
    > Question: Seeing the words 'biologically valid' conjures up an image of scientists pursuing pure science rather than concentrating on the applications of it. Is the goal of NN today more theoretical (we want to get something to behave more like a smart being) than practical (we want something that will specifically put names to faces/discriminate balloons from weapons/identify handwriting like an expert).

    Both.

    Cognitive scientists are using NN technology as a 'biologically valid' model for cognition. (Though only a fool would remain unaware of the enormous gap between our NN toys and the real thing, and of the enormous simplification that goes into our toys.)

    Others just look at NN as a technology to be exploited without reference to biology.

    > I suspect that this field has narrowed in the last decade (but I may be wrong), and so I fear that it may be getting wayyy esoteric.

    Wayyyyy. Like any other branch of science, especially CS, this field is rapidly "narrowing" in the sense of getting deeper, but also "broadening" in the sense of developing more branches and more connections to other fields. (E.g., lots of parallels have been shown between NN and physics, and between NN and statistics.)

    > As a practical engineer who needs solutions today, should I devote more energies to this or less? What is happening elsewhere in the field?

    It's no longer possible even for NN researchers to stay on top of everything that's going on in the field, so don't even think about investing that much time in it.

    Beyond that, what's your field of application in engineering? Do your journals ever cover relevant NN technology? If not, you might be able to start a SIG, so that the effort of keeping an ear to the ground and filtering out the uninteresting material could be spread among the members, rather than going it solo.

    --
  • Wow... when I worked in AI in college my thesis was "Heuristic Reasoning for Stress-Strain Finite Element Generation" (hey, if I don't toot my own horn...), and when I briefed my findings to one of my proctors he started grumbling and the other teacher said "Marvin, what's wrong? He's describing fuzzy logic." To which he replied, "Fuzzy Logic I understand, it's fuzzy explanations I don't get!"

    I guess it's hard to explain the field of AI to outsiders, because even though I (used to) understand neural networks, I feel that I need some sort of a diagram to 'get it' here. But what I think I heard is that "learning is the big problem in NN, or maybe a better word would be teaching" ... "we have reduced the learning time by coming up with a new conceptual way of classifying data" ... "the new way is more 'biologically valid'"...

    Okay, I do have a question for the experts out there. Help would be greatly appreciated 'cuz we may be able some day to apply this to National Missile Defence to help discriminate balloons from nukes, currently a big problem.

    Question: Seeing the words 'biologically valid' conjures up an image of scientists pursuing pure science rather than concentrating on the applications of it. Is the goal of NN today more theoretical (we want to get something to behave more like a smart being) than practical (we want something that will specifically put names to faces/discriminate balloons from weapons/identify handwriting like an expert).

    I suspect that this field has narrowed in the last decade (but I may be wrong), and so I fear that it may be getting wayyy esoteric. As a practical engineer who needs solutions today, should I devote more energies to this or less? What is happening elsewhere in the field?

  • Can someone who knows more about Neural Nets help me out?

    This doesn't seem like much of a break-through. The article mentions Back-propagation networks as an effective (but, as I understand it, slow) kind of NN. The article says that this technique is an improvement on back-props. I do know that in the last year, Quick-props and Fast-props have both come out, which are 'major' improvements on the back-props algorithm.

    My question for the specialist is, how is the PoE model any different/better/worse than these other improvements? It is my understanding that Quick-props is very good for practical image recognition problems.

  • You may be right when you have no idea about the relationships in the data, and just want the NN to "work". But a big correlation is not my idea of "smart".

    When I fit a curve to some data, it's because I have a certain fundamental understanding about that data and how it should behave. Only an idiot would try to fit a parabola to an Arrhenius curve.

    AFAIK, no NN gives this insight. It is therefore "dumb" in both senses of the word.

  • I work with statisticians and modellers, and they have an extremely low opinion of neural nets.

    As described to me, neural nets are _huge_ (every datapoint is in) underconstrained matrices with an infinite number of equally valid solutions. "Training" [programming] them is an exercise in finding a strategy for the "best" solution.

    Practically, when NNs are well done they will give you back the data you fed into them! When exceptionally good, they will give reasonable interpolations on the data. But forget about correct extrapolations.

  • if(animal is furry && animal is small && animal is domesticated) it's probably a cat or a dog.

    Problem is you have to hardcode in or read in what is furry and what is small and what is domesticated, whereas the nn figures out what these features are and adapts to changes in the input automatically. Otherwise you have some human extracting new features and hardcoding them somewhere.
  • by MrGrendel ( 119863 ) on Thursday December 21, 2000 @08:37PM (#544113)
    This appears to be similar to the technique used by Hopfield's Mus Silicium [slashdot.org] neural net speech recognition contest. The solution [nyu.edu] ended up being that recognition occurs when a large number of neurons connected to the same output neuron 'synchronize' and fire at about the same time. The big difference between these approaches seems to be that Hopfield is using spiking neurons and these guys are using some form of back propagation to train smaller networks that have to agree on what some data set represents in order to return a positive result.
  • by TimoT ( 67567 ) on Thursday December 21, 2000 @11:12PM (#544114) Homepage

    Well, this is partially true. At least the MLP neural network is just a non-linear function approximator that can, in theory, learn any mapping from the inputs to the outputs. The network is trained using points from the input space together with the desired output. Statisticians would probably call an MLP non-linear regression. I also know some statistics profs who have a rather high regard for neural networks. The name carries a lot of hype, but the methodology works. About correct extrapolations: I would like to see anything correctly extrapolate in the general case given only a few random observed points and desired outputs. The performance of these methods depends on how well the assumptions of the method correspond to the way the observed data really behaves, and you cannot tell that with certainty from a few random points.

    The training is usually done using two separate sets of data: a training set and a test set. The training set is used to train the network and the test set is used to test its performance. When the performance of the network on the test set starts to degrade instead of improving, the network is starting to overfit and loses its ability to generalize.

    Basically you can just think of the MLP neural network as a classifier (when your goal is to classify, which is usually the case with neural networks) that draws arbitrary boundaries in n-dimensional space. Somehow I always think of those blobby objects in computer graphics when I think of the classification boundaries, but this is of course not strictly correct. I guess it still helps.

  • it's a reasonable interpolation and speaking of your own comment...

    > You could've come off as someone who was interested and wanted to know more. But no. You had to make a snide ass remark. Good one, bucko, we now all know what a dumbass you really are.

    'nough said...
  • by adubey ( 82183 ) on Thursday December 21, 2000 @11:21PM (#544116)
    Hi,

    I'm an AI researcher, and I'll tell you that you're patently wrong ;)

    One thing first: Hinton (the inventor of PoE) was one of the people re-popularizing NN's way back in the early 80's while at U of T.

    Now, others have tried combining experts before. But Hinton's approach beats them empirically. The reason is that the experts are trained together in PoE rather than being trained separately and considered jointly for evaluation, as in previous approaches...

    Another thing: the NN's here are _very_ different from backprop NN's. The entire topology is different. Backprop neural networks are a special case of Bayesian nets, but PoE is based upon random fields. While there is current research being done on training random fields from Bayes nets, there are in fact things you can represent with a random field that can't be represented with a Bayes net (some of Minsky and Papert's proofs for perceptrons can in fact be extended to prove limitations on backprop nets and Bayes nets). The converse is also true: there are distributions that can be represented in Bayes nets and not in random fields. (But of course, both classes of distributions can be represented as factor graphs...)
  • I remember seeing a video of Apple working on this about 5 years ago. Part of the original 'vision' behind the Newton was based on this. Needless to say it hasn't worked yet...
  • Backpropagation is a supervised learning method. That means you have to tell it "right" or "wrong, it was this..." in order to get a backprop net to learn.

    This technique is unsupervised, which means it figures out different categories on its own (the number depends on learning parameters and the data presented) and groups the inputs into one or more of these clusters. Example clusters would be "animal-like" or "furry".

    The interesting thing about this technique seems to be the combination of these clusters into larger clusters. In other words, it doesn't just learn the categories, it learns the components that make up the categories too.
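
    For comparison, here is a minimal sketch of what unsupervised clustering means in the plainest case (ordinary k-means on made-up 1-D data, in C++; this is not the article's method): the program is told only how many clusters to find, never which answer is "right" or "wrong".

        // Plain k-means on made-up 1-D data: two clusters, no labels given.
        #include <cmath>
        #include <cstdio>
        #include <vector>

        int main() {
            std::vector<double> x = {0.1, 0.3, 0.2, 0.4, 4.0, 4.2, 3.9, 4.1};
            double centers[2] = {0.0, 1.0};           // arbitrary starting guesses
            std::vector<int> label(x.size(), 0);

            for (int iter = 0; iter < 20; ++iter) {
                // Assignment step: each point joins its nearest center.
                for (size_t i = 0; i < x.size(); ++i)
                    label[i] = std::fabs(x[i] - centers[0]) <= std::fabs(x[i] - centers[1]) ? 0 : 1;
                // Update step: each center moves to the mean of its members.
                for (int c = 0; c < 2; ++c) {
                    double sum = 0.0; int count = 0;
                    for (size_t i = 0; i < x.size(); ++i)
                        if (label[i] == c) { sum += x[i]; ++count; }
                    if (count > 0) centers[c] = sum / count;
                }
            }
            std::printf("cluster centers: %.2f and %.2f\n", centers[0], centers[1]);
            return 0;
        }

    With this data the two centers settle near 0.25 and 4.05, i.e. the algorithm discovers the two groups on its own.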


  • If "you had one expert that preferred furry animals, whereas another expert preferred domesticated animals and another preferred small animals, their votes ... would light up dogs and cats very nicely."

    God, wait 'til the animal rights' activists get wind of this - animals being used to satisfy artificial intelligence experts' sexual urges!


    D.

  • I know it makes "cool" talk smack of famous people. But it's a bit exaggerate to call someone moron so promptly.
    BTW I know it's also "cool" to say C++ but your code is simply "dirty 'C'".

    bau
  • Choose your weapon:
    object
    interface
    property
    methods
    wrapper
    function

    Hmmm. it's a tough one - they all sound bad! :)
  • "Come the millennium, month 12... The village idiot will come forth To be acclaimed the leader. - Nostradamus"

    Hmmm, another one, eh? Did he get anything right in the end then?
  • We clearly disagree. I take your vitriol as a sign you feel insecure in your position.

    You should understand my application: modelling multicomponent sequential chemical reactions and predicting the yields. NNs do very poorly at this. The diffeq's work great because we do have some fundamental understanding of the underlying elementary processes.

    With this, we can extrapolate with surprising success, and interpolate, both with surprisingly few parameters (a few dozen for 150 components). I was glad to see your admission that NNs cannot extrapolate. Extrapolation is very important to us, and we do it well! Dumber number-crunching methods may well be incapable of it, so we should avoid them.

  • Oooh! =) Could it be the driving force behind the next generation scour.net?? Old sk00l pr0n search engines would be obsolete!
  • Yeah I know, if you just use arrays it can be much shorter, most of it is test code anyway. I wasn't bragging or anything. There are 3 includes in the program, which one was missing?

  • This stuff has been done before under the headings "Ensembles" and "Combining Multiple Models", and various other forms of communities of experts. It is also sometimes referred to as "Bagging" and "Boosting". I don't have my references to hand, but if someone wants them I'll dig 'em out. I can reference papers going back to the early 90's if you'd like :) Check out EWSL-91, I think. I might be missing some technical details that they've developed, and it would be extremely unfair to denigrate this research without reading the papers, but it does nae sound revolutionary to me. Given the person being quoted (EE), it doesn't sound like they come from the Machine Learning community at all, otherwise they might know this literature!

    Winton

    p.s. It's 3 am, I've been playing Myth 2 for a couple of hours, and had a couple of beers, so I can't deal with the hassle of looking this stuff up with URLs etc...
  • How's my post not in English?
    If you feel the need to answer with an explanation then I suppose that somehow you understood my broken (?) English.
    Anyhow, thanks for your opinion, Mr. ... ?

    bau
  • I said it's C++ because it won't compile as C. You are right, it's not object-oriented though. Why does it make "cool" to say C++? I hate C++ and I like C better, it's just that I was using Visual Studio for this and it will compile anything that resembles C or C++, probably even my grandma's cake recipe. Oh, and I'm not saying this code is the pinnacle of AI programming, it's just a quick&dirty implementation I used to try something.

  • It makes cool to say C++ because one doesn't sound like a bigot (to the eyes of those newbies 8). But from your answer it doesn't seem like you were trying to be cool.
    Cool !
  • My -GOD- man! That's one of the most disturbing things I've yet seen. You slimy bastard.
  • For a given problem there will be a set (or sets) of models (given some constraints) which will give an optimal solution... are you saying it's provable there is no faster method of determining those sets than the random generation method?

    I don't think that was the point that "junkmaster" was trying to make. But there is a certain advantage to using randomly generated sets.

    The mathematics of sd shows that by increasing the strength of the weak models that are "combined", one requires fewer of these weak models to get the same quality of recognition. In fact, there is even an equation that puts limits on this (which I believe is based on Chebyshev's inequality, but I don't remember exactly), and the most commonly used implementation of sd does allow you to set a threshold for the quality of weak models you want chosen from those that are randomly generated.

    However, remember that the concept of uniformity is the linchpin of sd. All the models that you choose to "combine" must be uniform, as defined by the theory, with respect to each other and to the problem space. Randomly chosen models tend to make it easier to accomplish this because their randomness already gives them a closer-to-uniform coverage of the feature space. They just need to be tuned to get true uniformity.

    There is nothing theoretically wrong with using stronger weak models to get to your solution faster. But by putting "intelligence" into the process of making the weak models, it would be much, much harder to ensure that they were mathematically uniform. I said it in my original post, but I should stress that the uniformity concept is the reason that sd beats out all the other methods for "combining" weaker models in standardized benchmarks. The importance of obtaining it cannot be overstated.

  • From your description I would think the method would be even more valuable if it had a better way than testing randomly generated models to find an optimal set. Hinton has proposed methods to dependently train his experts, which to me seems a very desirable property. Using randomly generated models is actually a strength of SD [buffalo.edu], the "proof of which this margin is too small to contain." <grin>
  • Does anyone smell a zz.. Or, more like, a Terminator? On a serious note, this will have tremendous impact in the target acquisition sector; I'm sure people at Raytheon and Lockheed are drooling right now (rather, they're probably finishing up timing closure on new multimillion gate ASICs that implement just this type of stuff).
  • If anyone is really interested in getting into AI I highly suggest reading anything from singinst.org, or The Society of Mind by Marvin Minsky, and also The Age of Spiritual Machines by Ray Kurzweil :)

    ------
    http://vinnland.2y.net/
  • It's not nearly so simple as a logical AND.

    For one thing, his PoE model is designed to make each individual net (I think) more capable at responding to a set of features. I don't know if it would intentionally segregate into set animal, set small, and set furry, but it's supposed to be much simpler to train than the standard supervised network.

    All it needs to do is get good at sorting images and simplifying the input; a second stage of recognition is then applied to the, theoretically, simpler set of information.

    The example you're using is incomplete; his PoE would detect the features small, animal, and furry, where the traditional model would detect the feature cat-like and the feature dog-like, without the sharing of information or neurons that the PoE enables. The second stage of his PoE, the recognition center, would use the sum-product of each of the simpler feature detectors and then decide if it were cat-like or dog-like.
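
    To put numbers on that: Hinton's PoE combines the experts multiplicatively, i.e. each expert assigns a probability to each candidate, the probabilities are multiplied together and renormalized, so a single sceptical expert can veto a candidate. A minimal C++ sketch of just that combination step (the scores are made up for illustration and this is not his training procedure):

        // Combining three "experts" multiplicatively on four hypothetical candidates.
        // The scores are invented for illustration; real PoE experts would be learned.
        #include <cstdio>
        #include <string>
        #include <vector>

        int main() {
            std::vector<std::string> candidates = {"dog", "cat", "goldfish", "horse"};

            // Rows: experts (furry, domesticated, small). Columns: candidates.
            double expert_scores[3][4] = {
                {0.9, 0.9, 0.1, 0.7},   // "furry" expert
                {0.9, 0.9, 0.6, 0.5},   // "domesticated" expert
                {0.5, 0.8, 0.9, 0.05},  // "small" expert
            };

            std::vector<double> product(candidates.size(), 1.0);
            double total = 0.0;
            for (size_t c = 0; c < candidates.size(); ++c) {
                for (int e = 0; e < 3; ++e)
                    product[c] *= expert_scores[e][c];  // one low score suppresses the candidate
                total += product[c];
            }
            for (size_t c = 0; c < candidates.size(); ++c)
                std::printf("%-9s %.3f\n", candidates[c].c_str(), product[c] / total);
            return 0;
        }

    With these made-up scores "cat" and "dog" take almost all of the renormalized probability mass, while "goldfish" and "horse" are each killed off by the one expert that dislikes them.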

    Geek dating! [bunnyhop.com]
  • by Black Parrot ( 19622 ) on Thursday December 21, 2000 @08:43PM (#544136)
    > The article was rather light on details, but this doesn't look like much of a breakthrough.

    All the more so, since the notion of combining NN experts is already quite old. Haykin mentions it in the 1994 edition of his textbook.

    Notice that that's 10% of the way back to the invention of electronic computers, and about 43% of the way back to when the backpropagation algorithm rescued neural networks from obscurity.

    --
  • by Black Parrot ( 19622 ) on Thursday December 21, 2000 @08:51PM (#544137)
    > As described to me, neural nets are _huge_ (every datapoint is in) underconstrained matricies with an infinite number of equally valid solutions.

    Your criticisms are cogent, but in practice NNs can be excellent problem solvers.

    For example, I suppose it is possible to solve the pole balancing problem with statistical methods, but I have never seen it done. With neural networks the problem and its more demanding variants have become so trivial that people are losing interest in it as a benchmark. (This is mostly as a result of advances in the art & science of NNs over the last 5 years or so.)

    --
  • by FigWig ( 10981 ) on Thursday December 21, 2000 @08:53PM (#544138) Homepage
    You can think of a NN as a non-linear function which is modelled after neural connections. The inputs are weight vectors determining the strength of the connections between nodes, and a test vector with length equal to the number of nodes in the first layer. The output is a vector of length equal to the number of nodes in the final layer. Backpropagation is a technique that will optimize the link weights in order to minimize the error function - usually one half the sum of the squares of the differences between the network output and the intended output (makes the derivative look nicer).
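
    To make that concrete, here is a minimal backpropagation sketch (a toy 2-3-1 sigmoid network learning XOR with exactly that half-squared-error loss; the layer sizes, initial weights, learning rate and epoch count are arbitrary choices for illustration, not anything from the article):

        #include <cmath>
        #include <cstdio>

        static double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

        int main() {
            // Training data: XOR truth table.
            const double X[4][2] = {{0,0},{0,1},{1,0},{1,1}};
            const double T[4]    = {0, 1, 1, 0};

            // 2 inputs -> 3 hidden sigmoid units -> 1 sigmoid output.
            // Small asymmetric starting weights; a different start may need more epochs.
            double w1[3][2] = {{0.5,-0.4},{0.3,0.8},{-0.7,0.2}}, b1[3] = {0.1,-0.2,0.3};
            double w2[3]    = {-0.6, 0.7, 0.4},                  b2    = 0.05;
            const double lr = 0.5;

            for (int epoch = 0; epoch < 20000; ++epoch) {
                for (int n = 0; n < 4; ++n) {
                    // Forward pass.
                    double h[3];
                    for (int j = 0; j < 3; ++j)
                        h[j] = sigmoid(w1[j][0]*X[n][0] + w1[j][1]*X[n][1] + b1[j]);
                    double net = b2;
                    for (int j = 0; j < 3; ++j) net += w2[j]*h[j];
                    double y = sigmoid(net);

                    // Backward pass: E = 0.5*(y - t)^2, so dE/dnet = (y - t)*y*(1 - y).
                    double dy = (y - T[n]) * y * (1.0 - y);

                    // Update the output layer and propagate the error to the hidden layer.
                    for (int j = 0; j < 3; ++j) {
                        double dh = dy * w2[j] * h[j] * (1.0 - h[j]);  // dE/dnet of hidden unit j
                        w2[j]    -= lr * dy * h[j];
                        w1[j][0] -= lr * dh * X[n][0];
                        w1[j][1] -= lr * dh * X[n][1];
                        b1[j]    -= lr * dh;
                    }
                    b2 -= lr * dy;
                }
            }

            // After training the four outputs should sit close to 0, 1, 1, 0.
            for (int n = 0; n < 4; ++n) {
                double net = b2;
                for (int j = 0; j < 3; ++j)
                    net += w2[j] * sigmoid(w1[j][0]*X[n][0] + w1[j][1]*X[n][1] + b1[j]);
                std::printf("%g XOR %g -> %.3f\n", X[n][0], X[n][1], sigmoid(net));
            }
            return 0;
        }

    Per-pattern (online) gradient descent is used here; batching the patterns and many other refinements are possible but would obscure the chain-rule bookkeeping.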

    I don't think you can call a NN any dumber than a curve you fit to a graph, since you are just optimizing a function to minimize the error between the observed output and the output of the fit function. A traditional function may give you slightly more insight into a problem, but it can also easily mislead you (e.g. every function looks like a parabola around a min or max).

    My point is that NNs are tools, sometimes effective, sometimes not. They aren't anything close to the AI you read about in scifi books, but nothing is. More interesting actually are support vector machines (SVMs). There are several papers on the web about them; they were developed at Bell Labs by some Russian dude whose name escapes me at the moment. They can be more effective than NNs, but the math is a bit harder to understand.

  • You're right. Ok, here's a question for you. I don't know neural nets, but I'm smart enough to recognize an and gate when I see one. What makes this more special than my trivial pseudocode example?

    And what about the is operator? The article implied that it was somehow better fit to this task than the identification used in other systems.

  • I work with statisticians and modellers, and they have an extremely low opinion of neural nets

    Neural networks are very useful to me, but of course, they don't solve all problems and won't bring peace in the world.

    When exceptionally good, they will give reasonable interpolations on the data. But forget about correct extrapolations

    "Neural network" is basically a synonym for "nonlinear regression". As with any type of regression, the interpolation and extrapolation performance depends on using the right number of parameters. If you try modeling 1000 points with a 500th-degree polynomial, don't expect good extrapolation...
  • Isn't it time men stopped thinking of women as devices?
    --
  • He has been working on various algorithms relating to Neural Nets, including the wake-sleep algorithm. Read the book "Unsupervised Learning: Foundations of Neural Computation", which he co-edited, for some insights and relevant research papers on the topic.

    -Shieldwolf
  • cuz we may be able some day to apply this to National Missile Defence to help discriminate balloons from nukes

    Sorry to go off at a tangent, but I'd settle for an AI that can discriminate between "ethical" scientists and "creators of weapons of mass murder" scientists.

  • by hugg ( 22953 )
    Neural network supervised learning experts blah blah... just give me a robot that can vacuum dammit! And not suck up my headphone cable in the process, that's the tricky part...
  • ...an extremely low opinion of neural nets.

    I've always felt that the brain's sheer number of cells was more than enough to account for our perceived intelligence, but not until I took an artificial neural networks class in college did I start to have reasons to back up my feeling.

    In that one short quarter, we saw and learned how neurons could be arranged to do the simple non-linear regression you've pointed out. But we also saw self-organizing maps (having the network choose an arbitrary number of categories to categorize your data), and even a semi-infinite memory (storing data and then recalling it with a partial stimulus). We also saw examples of NNs being used to cancel out unknown noise in a signal, convert text to speech with only 7 neurons, and drive a car in limited conditions with 25 neurons.

    After seeing this "tip of the iceberg", I feel more assured that the millions of neurons and the connections between them are more than enough to account for the complex behavior we call "intelligence".

  • As already mentioned, the concept of "combining" the results of pattern recognition models is not new and there have been various techniques for doing this. Most of these techniques have been ad hoc with very little rigorous mathematical foundation.

    One notable exception is in the research of Stochastic Discrimination (sd). This technique was originally developed through mathematics rather than experimentation as is the case with NNs. In other words, rather than the "let's see what happens if" development of NNs, sd's approach is "the equations say this should happen". Because of this, it is very rigorously defined and the hows and whys are clearly understood.

    Sd also "combines" weak models but in a way that, to the best of my knowledge, no one else has done before. The basics are:

    1. Incredibly weak models are generated to solve the given problem.
    2. Hundreds of thousands of these models are combined.
    3. These weak models must be uniform with respect to each other and to the problem space.
    For example, rather than combining a few very specific models as described in this article (one for furry animals, one for domestic, etc), sd would randomly generate hundreds of thousands of weak models to solve the problem. Each of these models would look at a different set of features, but there would be so many combinations that you wouldn't be able to name them. For example, maybe one model would learn to distinguish based on the length of the tail, the color of the snout, and the diameter of the third toenail. This model obviously can't be named. The set of features it looks at is too odd. But if we note that it has some trivially weak ability to tell the difference between a dog and a cat then we accept it. For the problem of dogs vs. cats, we may only require that any given model be 50.1% accurate on our training set. When we "combine" all these weak models, a strong solution emerges. Why this happens has its roots in the Central Limit Theorem.

    But before we "combine" them, we have to see that this weak model is uniform with respect to the other weak models. This is a term defined in the sd theory. Basically, what it means is that the weak models need to be evenly selected throughout the set of all possible weak models. In other words, there is no oversampling or bias. (Actually, this isn't quite right but goes in the right direction. Read one of the papers if you're interested.) The concept of uniformity is probably the most interesting part of sd and it is the primary concept that all the other "combination" techniques miss. In this article, for example, how do we know that there isn't a connection to being furry and being small? If there is a statistical dependency, then the vote won't be fair and results will be weaker.

    Anyway, that's a real crash course in sd basics. So how does this algorithm perform? On the standard benchmarks (Irvine, for example) it handily outperforms anything out there. Right out of the box and without tuning. For more information see the web site [buffalo.edu] or send me email.
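
    This is not the actual SD algorithm (in particular it skips the uniformity requirement described above), but a toy C++ sketch of the "keep anything barely better than chance and average enormous numbers of them" idea, on made-up two-feature data:

        #include <cstdio>
        #include <cstdlib>
        #include <vector>

        // A trivially weak model: threshold one feature, vote class 0 or 1.
        struct WeakModel { int feature; double threshold; int sign; };

        int main() {
            std::srand(42);

            // Made-up training data: class 1 points tend to sit a little higher.
            const int N = 200, D = 2;
            std::vector<std::vector<double>> x(N, std::vector<double>(D));
            std::vector<int> y(N);
            for (int i = 0; i < N; ++i) {
                y[i] = i % 2;
                for (int d = 0; d < D; ++d)
                    x[i][d] = std::rand() / (double)RAND_MAX + 0.3 * y[i];  // heavy overlap on purpose
            }

            // Randomly generate weak models; keep any that clear a 50.1% bar on the training set.
            std::vector<WeakModel> kept;
            while (kept.size() < 10000) {
                WeakModel m;
                m.feature   = std::rand() % D;
                m.threshold = 1.3 * std::rand() / (double)RAND_MAX;
                m.sign      = (std::rand() % 2) ? 1 : -1;
                int correct = 0;
                for (int i = 0; i < N; ++i) {
                    int vote = (m.sign * (x[i][m.feature] - m.threshold) > 0) ? 1 : 0;
                    if (vote == y[i]) ++correct;
                }
                if (correct > 0.501 * N) kept.push_back(m);
            }

            // Classify two hypothetical probe points by averaging all the weak votes.
            const double probes[2][2] = {{0.2, 0.3}, {1.0, 1.1}};
            for (int p = 0; p < 2; ++p) {
                double mean = 0.0;
                for (size_t k = 0; k < kept.size(); ++k)
                    mean += (kept[k].sign * (probes[p][kept[k].feature] - kept[k].threshold) > 0) ? 1.0 : 0.0;
                mean /= kept.size();
                std::printf("point (%.1f, %.1f): mean vote %.2f -> class %d\n",
                            probes[p][0], probes[p][1], mean, mean > 0.5 ? 1 : 0);
            }
            return 0;
        }

    The low probe point collects mostly 0-votes and the high one mostly 1-votes, even though every individual rule is only marginally better than a coin flip.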

  • I assume you were trying to enter something like this:
    #include <whatever>

    Even in "POT" mode, Slashdot converts arrow-brackets to HTML tags. Either use "extrans" mode, or use the HTML ASCII escape codes "&lt;" (less-than) and "&gt;" (greater-than) for maximum convenience.


    See you in hell,
    Bill Fuckin' Gates®.

  • This is the most ridiculous post I've ever read. As many replies have already explained, a Multi-Layer Perceptron (one kind of NN) is a universal approximator, that is, it can be used to model a mapping from one set to another, given the data (a set of input-output associations).

    There are other ways to obtain universal approximation (which means that basically any regular mapping can be approximately represented). NO method can be used to give correct EXTRApolations. This is not possible. Period.

    Now, regarding INTERpolation of the data, Barron demonstrated in '93 (see "Universal approximation bounds for superpositions of a sigmoidal function", IEEE Transactions on Information Theory, volume 39, number 3, pages 930-945) that MLPs are more efficient than any other method. This means that they use FEWER parameters than other interpolation methods (such as splines, kernel regression, etc.).

    So, the post I'm answering to is bullshit. I strongly advise posters to read the NN FAQ [sas.com] before posting ridiculous claims.
  • yeah but what about my hairy baby alligator....., that falls into your category.......

    hang on... crap, you said probably :(
    -
  • "a method for getting those pesky Neural Nets to teach themselves to see"

    Hmm. Well maybe I'm missing something, but it seems that the technique merely abstracts the training away from the functional network by one remove. The "Experts" still need training; the system does not appear to "teach" itself anything, but instead relies on the pooled opinion of already trained Experts.

    It still seems to be missing the training bootstrap - how do we train ourselves in a system in which we are untrained?

    Don't get me wrong; kudos to the researchers. But no brownie points at all to the journalists, slashdot or others, who appear to mistake an adept implementation of a pattern recognition system for something that it is not.

  • You're not expressing an opinion, but making false claims. I'm talking about a theorem, not about ONE experiment.

    BUT, it is clearly true that if you have a parametric model of your problem, you can identify the parameters and use the model to both extrapolate and interpolate your data. Moreover, the result will in general be better than what you would obtain with a non-parametric model. So NNs are not a good solution for all problems, especially when you've got good knowledge of the problem. Your present post should have a good score because it gives valuable information about the limitations of NNs, whereas your first post was pure rant.

    The problem is that you are comparing two different methods. One is parametric and the other is non-parametric. NO non-parametric model can be used to correctly extrapolate data (regardless of whether or not it is based on NNs). Among non-parametric models, NNs have been mathematically demonstrated to be the best ones.

    You clearly do not understand fully what you are talking about, and the so-called statisticians you are quoting should give you a better explanation of the differences between parametric and non-parametric estimation. We are talking about science and there are no opinions here, there are facts.

    By the way, my vitriol is a sign I'm pissed off to see that your incorrect post has been scored informative whereas it should be scored as a flame. I've got a PhD in maths and my subject was Neural Networks. I work as an assistant professor in a statistics department. I do know what I'm talking about, but I hate using such diploma arguments.
  • More interesting actually are support vector machines (SVMs). There are several papers on the web about them; they were developed at Bell Labs by some Russian dude whose name escapes me at the moment. They can be more effective than NNs, but the math is a bit harder to understand.

    The Russian dude is named Vladimir Vapnik. He developed SVMs based on something called Statistical Learning Theory (SLT). One of the main insights of SLT concerns the convergence of empirical averages.

    The idea is that if you want to make a complicated estimate you need more data. For example, you need more data to fit a 10th-degree polynomial to your data than you do to fit a 2nd-degree polynomial. This makes sense intuitively, but SLT provides the mathematical justification for the intuition. Basically, SLT says that the estimate for the 2nd-degree polynomial will converge faster than the estimate for the 10th-degree polynomial, and it quantifies exactly how fast. Furthermore, SLT provides a way to measure how much data a particular kind of estimator or regression requires.

    In the polynomial fitting example, it turns out that the model complexity is equal to the number of free parameters. This is not true in general. For example, NNs can have lots of free parameters but still have a low model complexity. This makes it possible to accurately train an NN with 50 neurons using much less data than would be required for accurately fitting a 50th-degree polynomial. I think this is part of the reason NNs do well in practice: NNs can represent large classes of models, but training NNs converges better with less data because they have low model complexity in the sense of SLT.

    The idea behind SVMs is to build a structure which is rich enough to represent a large class of relationships but constrained enough that you can accurately train it with limited data. Thus SVMs trade off richness with accuracy based on the data. If you have only a little bit of data, you use a structure which is constrained enough to be trained accurately but might not model subtle relationships. Once you get more data, you can use richer structures and still maintain training accuracy. It's really a pretty nice idea.

    Anyway, the point is that NNs do seem to work well in practice and Vapnik's work might help explain this. If you are interested in NNs or similar things, I would suggest checking out SLT and SVMs. Unfortunately, both SLT and SVMs are pretty new, so the math is still a little ugly.
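
    For reference, here is the flavour of result SLT provides - the commonly quoted VC bound, as it appears in Vapnik's work and in Burges' SVM tutorial (reproduced from memory, so check those sources for the exact form). With probability at least 1 - \eta over l training samples, every function f in a class of VC dimension h satisfies

        R(f) \le R_{emp}(f) + \sqrt{ \frac{h\,(\ln(2l/h) + 1) - \ln(\eta/4)}{l} }

    so the gap between the true risk R and the empirical (training) risk R_emp shrinks as the amount of data l grows and widens as the capacity h grows - which is exactly the richness-vs-accuracy trade-off described above.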

  • I remember learning in Psych 207 (Cognition and Memory) that cats have been shown to have such "experts" in their brains. In particular, they have one for the detection of horizontal lines, enabling them to get a good understanding of where a ledge is. You can stunt the growth of these experts by removing that type of stimulus at an early age. Placing kittens in a round room with vertical bars on the wall and allowing them to grow up there will effectively remove their ability to jump up onto a ledge.
  • Marvin Minsky! A name to conjure with! Is he wearing virtual cyber computer clothes yet? Perhaps he should get together with Kevin and do a double act?!
  • (1) Someone please moderate the previous reply up to interesting/useful.

    (2) I quickly reviewed his initial paper, and then realised, oh, that Hinton :)! However, in the PoE paper there are absolutely no references to the work done in the Multi-Agent & Machine Learning communities.

    (3) That being said, his work is definitely of interest - although I doubt that it is as huge a breakthrough in AI as is being suggested. If it is similar to the work done in the ML community, then basically what you get is a nice way to integrate different points of view on the same situation - which helps overcome the tendency of ML algorithms to suffer from local minima and to be sensitive to the actual distributions of the data.

    (4) I couldn't find any comparisons to work outside of the NN field.

    Winton

  • the concept of semi-intelligent agents, acting together to perform what appear to be intelligent tasks is the main theme running through minsky's book society of mind, a great read, highly recommended.

    to minsky, our intelligence is the product of millions of agents that autonomously perform various tasks and send messages to one another. what i found interesting about this article was that the ideas of the classical AI (minsky, et al.) are morphing with the "modern" AI. this is cool because when i studied artificial intelligence in college, i got the impression that there was a holy war between the two AI camps. it's nice to see the convergence..

    -mike
  • Speaking of Fuzzy Logic...
  • This seems to be an interesting concept. If a neural network could be taught one's preferences, one's personality even, wouldn't it make an excellent agent? A little bot that could go do a lot of menial shit we loathe doing. The idea has been proposed, but would these be quick enough for the job? And if they were, would it be overkill?
  • The article was rather light on details, but this doesn't look like much of a breakthrough. One expert likes furry animals, another small ones, another domesticated ones. So the neural network combines the results and pumps out cats and dogs.

    if(animal is furry && animal is small && animal is domesticated) it's probably a cat or a dog.

    Congratulations, you've reinvented the AND gate, which those of us outside the neural network community have been using for a long time.

  • "The neural networks got smah-tah!" Now they can swim backwards and eat Samuel L. Jackson and stuff
