AI Science

A.I. Advances Through Deep Learning 162

Posted by Soulskill
from the skip-the-lesson-on-killing-all-humans dept.
An anonymous reader sends this excerpt from the NY Times: "Advances in an artificial intelligence technology that can recognize patterns offer the possibility of machines that perform human activities like seeing, listening and thinking. ... But what is new in recent months is the growing speed and accuracy of deep-learning programs, often called artificial neural networks or just 'neural nets' for their resemblance to the neural connections in the brain. 'There has been a number of stunning new results with deep-learning methods,' said Yann LeCun, a computer scientist at New York University who did pioneering research in handwriting recognition at Bell Laboratories. 'The kind of jump we are seeing in the accuracy of these systems is very rare indeed.' Artificial intelligence researchers are acutely aware of the dangers of being overly optimistic. ... But recent achievements have impressed a wide spectrum of computer experts. In October, for example, a team of graduate students studying with the University of Toronto computer scientist Geoffrey E. Hinton won the top prize in a contest sponsored by Merck to design software to help find molecules that might lead to new drugs. From a data set describing the chemical structure of 15 different molecules, they used deep-learning software to determine which molecule was most likely to be an effective drug agent."


  • by drooling-dog (189103) on Sunday November 25, 2012 @12:44AM (#42085161)

    I wonder how much of these improvements in accuracy are due to fundamental advances, vs. the capacity of available hardware to implement larger models and (especially?) the availability of vastly larger and better training sets...

    • Re: (Score:3, Informative)

      by PlusFiveTroll (754249)

      from TFA

      " Modern artificial neural networks are composed of an array of software components, divided into inputs, hidden layers and outputs. The arrays can be “trained” by repeated exposures to recognize patterns like images or sounds.

      These techniques, aided by the growing speed and power of modern computers, have led to rapid improvements in speech recognition, drug discovery and computer vision. "

      Sounds like both.

      • by iggymanz (596061) on Sunday November 25, 2012 @01:27AM (#42085301)

        No, that first sentence pretty much sums up digital neural nets from over two decades ago. So the more likely cause is the improvement of over two orders of magnitude in processing power per chip since then, with addressable memory over three orders of magnitude bigger...

        • Re: (Score:2, Interesting)

          by Anonymous Coward

          The way they are trained is very different, and it's this change that improves the performance. It's more than just making them faster, a fast idiot is still an idiot.

      • by tirerim (1108567)

        from TFA

        " Modern artificial neural networks are composed of an array of software components, divided into inputs, hidden layers and outputs. The arrays can be “trained” by repeated exposures to recognize patterns like images or sounds.

        These techniques, aided by the growing speed and power of modern computers, have led to rapid improvements in speech recognition, drug discovery and computer vision. "

        Sounds like both.

        Well, that doesn't say anything; that just describes every neural network of the past couple of decades, except for the "rapid improvement" part. I haven't read TFA, so I don't know if there's more detail, but just describing the basics of how neural networks operate isn't an explanation for why they're suddenly improving.

        • by Prof.Phreak (584152) on Sunday November 25, 2012 @01:57AM (#42085385) Homepage

          The "new" (i.e. last decade or so) advances are in training the hidden layers of neural networks. Kinda like peeling an onion, with each layer getting a progressively coarser representation of the problem. E.g. if you have 1,000,000 inputs and, after a few layers, only 100 hidden nodes, those 100 nodes are in essence representing all the "important" (by some benchmark you choose) information in those 1,000,000 inputs.
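          A minimal numpy sketch of the narrowing-layer shapes, scaled down from 1,000,000 inputs to 1,000 so it runs instantly. The layer widths and the random, untrained weights are illustrative assumptions, so it shows only how the representation shrinks, not what a trained network would keep.

```python
import numpy as np

def narrowing_forward(x, layer_widths, rng):
    """Pass x through fully connected tanh layers of decreasing width.

    The weights are random, not trained: this shows only how each layer
    re-represents its input in fewer dimensions, not what it would learn.
    """
    h = x
    shapes = [h.shape]
    for width in layer_widths:
        # Scale by 1/sqrt(fan_in) so the tanh units do not all saturate.
        w = rng.standard_normal((h.shape[1], width)) / np.sqrt(h.shape[1])
        h = np.tanh(h @ w)
        shapes.append(h.shape)
    return h, shapes

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 1000))  # 5 examples, 1,000 inputs each
code, shapes = narrowing_forward(x, [500, 250, 100], rng)
print(shapes)  # [(5, 1000), (5, 500), (5, 250), (5, 100)]
```

          In a trained network (an autoencoder, say) the weights would be fitted so that the 100-wide layer keeps whatever the chosen objective rewards.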

        • by PlusFiveTroll (754249) on Sunday November 25, 2012 @02:15AM (#42085415) Homepage

          Article didn't say, but if I had to make a guess, this is where I would start.

          http://www.neurdon.com/2010/10/27/biologically-realistic-neural-models-on-gpu/ [neurdon.com]
          "The maximal speedup of GPU implementation over dual CPU implementation was 41-fold for the network size of 15000 neurons."

          This was done on cards that are 7 years old now. The massive increase in GPU power over the past few years, along with more features and better programming languages for them, means the performance could possibly be hundreds of times higher. An entire cluster of servers gets crunched down into one card; put multiple cards in one server, build a cluster of those, and you can quickly see that the amount of computing power available to neural networks is much, much larger now. I'm not even sure how to compare the GeForce 6800 to a modern GTX 680 because of their huge differences, but the 6800 did 54 GFLOPS and the 680 does 3090.4 GFLOPS: a 57x increase. How far back do we have to go to find CPUs 57 times slower? If everything scales the same as in the paper's calculations, it would mean over a 2000x performance increase on a single computer with one GPU. In 7 years.
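          The back-of-the-envelope arithmetic above, spelled out. It uses the peak GFLOPS figures and the 41x speedup quoted in the comment, and it assumes (as the comment does) that the reported speedup scales linearly with peak throughput, which real workloads rarely do.

```python
# Peak single-precision throughput figures as quoted in the comment (GFLOPS).
GEFORCE_6800_GFLOPS = 54.0
GTX_680_GFLOPS = 3090.4
# Best GPU-over-dual-CPU speedup reported in the linked post.
REPORTED_SPEEDUP = 41

gpu_ratio = GTX_680_GFLOPS / GEFORCE_6800_GFLOPS
# Naive assumption: the old speedup multiplies cleanly by the raw GPU ratio.
naive_total = REPORTED_SPEEDUP * gpu_ratio
print(f"GPU ratio: {gpu_ratio:.1f}x")         # GPU ratio: 57.2x
print(f"Naive combined: {naive_total:.0f}x")  # Naive combined: 2346x
```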

    • by xtal (49134)

      Computers have gotten very cheap. Pretty much any prof that wants to pursue something now can build enough hardware to do so with a relatively small amount of money. Neural networks ran into a big wall twenty years ago because the tools weren't there yet.

      Once people start having some successes, more funds will be made available, more advances will be made, justifying even more funding... and then we'll turn control of the military over to SkyNet. :)

      • Re: (Score:2, Insightful)

        by Anonymous Coward

        Don't forget that it's not impossible to build a specially designed processor for a particular task, such as the Digital Orrery. A device created to do nothing but neural net simulations would be more efficient than a general-purpose computer. It would be linked to a general-purpose computer to provide a convenient interface, but do most of the heavy lifting itself.

      • by mikael (484)

        Everything ran into a big wall 20 years ago. There were 680x0, DEC Alpha and SPARC systems, but they were either $10,000 workstations (with no disk drive, server or monitor for the price) or there were embedded systems requiring a rack chassis development kit (manuals cost extra).

        Image processing on a PC CPU (an 80386) had to be implemented as a script of command-line image processing functions, as it wasn't even possible to reliably allocate more than one 64K block. You would load the image in line by line,

    • by michaelmalak (91262) <michael@michaelmalak.com> on Sunday November 25, 2012 @01:06AM (#42085237) Homepage

      I wonder how much of these improvements in accuracy are due to fundamental advances

      I was wondering the same thing, and just now found this interview [kaggle.com] on Google. Perhaps someone can fill in the details.

      But basically, machine learning is at its heart hill-climbing on a multi-dimensional landscape, with various tricks thrown in to avoid local maxima. Usually, humans determine the dimensions to search on -- these are called the "features". Well, philosophically, everything is ultimately created by humans because humans built the computers, but the holy grail is to minimize human involvement -- "unsupervised learning". According to the interview, this particular team (the one mentioned at the end of the Slashdot summary) actually rode the bicycle with no hands: to demonstrate how strong their neural network was at determining its own features, they did not guide it, even though it meant their also-excellent conventional machine learning at the end of the process would be handicapped.
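      To make "hill-climbing with tricks to avoid local maxima" concrete, here is a sketch of plain hill climbing plus one such trick, random restarts. The bumpy one-dimensional objective is a made-up assumption. Real training usually descends a loss over thousands or millions of dimensions with gradients rather than probing neighbours.

```python
import math
import random

def bumpy(x):
    """Toy 1-D objective with many local maxima; the global max is at x = 0."""
    return math.cos(3 * x) - 0.1 * x * x

def hill_climb(f, x, step=0.05, max_iters=10_000):
    """Greedy hill climbing: move to the better neighbour until none improves."""
    for _ in range(max_iters):
        best = max((x - step, x + step), key=f)
        if f(best) <= f(x):
            return x  # stuck on a (possibly local) maximum
        x = best
    return x

def random_restarts(f, n_restarts=200, lo=-10.0, hi=10.0, seed=0):
    """Climb from many random starting points and keep the highest peak found."""
    rng = random.Random(seed)
    peaks = [hill_climb(f, rng.uniform(lo, hi)) for _ in range(n_restarts)]
    return max(peaks, key=f)

single = hill_climb(bumpy, 6.0)  # one start: ends on a nearby local peak
best = random_restarts(bumpy)    # many starts: very likely finds the global one
print(round(single, 2), round(bumpy(single), 2))
print(round(best, 2), round(bumpy(best), 2))
```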

      The last time I looked at neural networks was circa 1990, so perhaps someone writing to an audience more technically literate than the New York Times general audience could fill in the details for us on how a neural network can create features.

      • by Daniel Dvorkin (106857) on Sunday November 25, 2012 @02:01AM (#42085395) Homepage Journal

        the holy grail is to minimize human involvement -- "unsupervised learning"

        Unsupervised learning is valuable, but calling it a "holy grail" is going a little too far. Supervised, unsupervised, and semi-supervised learning are all active areas of research.

      • by mbkennel (97636)

        There is a new thing. It has long been known that "deep networks" could theoretically represent more sophisticated features and concepts, and there were obvious biological examples of this working successfully.

        The artificial neural network methods of 1990 (as you say, hill-climbing on a multi-dimensional landscape) turned out not to work particularly well on deep networks, or, more correctly, to provide little additional benefit vs. shallow networks. After this time, research in statistical learning move

    • Re: (Score:2, Informative)

      by Anonymous Coward

      Glad they were able to make it work so quickly, but drug discovery has been done like this for over a decade. I worked at an "Infomesa" startup that was doing this in Santa Fe in 2000.

    • by Black Parrot (19622) on Sunday November 25, 2012 @02:30AM (#42085439)

      I wonder how much of these improvements in accuracy are due to fundamental advances, vs. the capacity of available hardware to implement larger models and (especially?) the availability of vastly larger and better training sets...

      I'm sure all of that helped, but the key ingredient is training mechanisms. Traditionally, networks with multiple layers did not train very well, because the standard training mechanism "backpropagates" an error estimate, and it gets very diffuse as it goes backwards. So most of the training happened in the last layer or two.
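      A small sketch of why the error signal gets "diffuse". It assumes a chain of single sigmoid units, each with weight 1. By the chain rule every layer multiplies the backpropagated gradient by weight * sigmoid'(z), and sigmoid'(z) is never larger than 0.25, so with modest weights the gradient reaching the early layers shrinks roughly geometrically with depth.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def input_gradient(n_layers: int, weight: float = 1.0, x: float = 0.0) -> float:
    """d(output)/d(input) for a chain of 1-unit sigmoid layers a = sigmoid(w * a_prev)."""
    zs, a = [], x
    for _ in range(n_layers):  # forward pass, remembering each pre-activation
        z = weight * a
        zs.append(z)
        a = sigmoid(z)
    grad = 1.0
    for z in reversed(zs):     # backward pass: one chain-rule factor per layer
        s = sigmoid(z)
        grad *= weight * s * (1.0 - s)  # sigmoid'(z) = s * (1 - s) <= 0.25
    return grad

for n in (1, 2, 5, 10):
    print(n, input_gradient(n), 0.25 ** n)  # actual gradient vs. the 0.25**n bound
```

      Larger weights can offset the shrinkage, but they also push units into saturation, where sigmoid'(z) is close to zero, so plain backprop on deep sigmoid nets tends to stall either way.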

      This changed in 2006 with Hinton's work on training stacks of Restricted Boltzmann Machines (the RBM itself dates back to the 1980s), and the related insight that you can train a deep network one layer at a time using auto-associative methods.
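      A compact sketch of the layer-at-a-time idea: binary RBMs trained with one-step contrastive divergence (CD-1), stacked so that each new RBM learns from the hidden probabilities of the one below. The toy stripe data, layer sizes, learning rate and epoch count are arbitrary assumptions. A real deep belief net would also add a fine-tuning stage, which this omits.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Binary RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_vis = np.zeros(n_visible)
        self.b_hid = np.zeros(n_hidden)

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_hid)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_vis)

    def cd1_step(self, v0, lr=0.1):
        """One full-batch CD-1 update; returns the reconstruction error."""
        h0 = self.hidden_probs(v0)                       # positive phase (data)
        h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h0_sample)               # one Gibbs step down...
        h1 = self.hidden_probs(v1)                       # ...and back up
        n = v0.shape[0]
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_vis += lr * (v0 - v1).mean(axis=0)
        self.b_hid += lr * (h0 - h1).mean(axis=0)
        return float(np.mean((v0 - v1) ** 2))

def train_stack(data, hidden_sizes, epochs=200, seed=0):
    """Greedy layer-wise pretraining: each RBM models the layer below's features."""
    rng = np.random.default_rng(seed)
    rbms, errors, x = [], [], data
    for n_hidden in hidden_sizes:
        rbm = RBM(x.shape[1], n_hidden, rng)
        errors.append([rbm.cd1_step(x) for _ in range(epochs)])
        rbms.append(rbm)
        x = rbm.hidden_probs(x)  # the frozen layer's output becomes the next input
    return rbms, x, errors

# Toy binary data: noiseless copies of three 12-pixel "stripe" patterns.
prototypes = np.zeros((3, 12))
for i in range(3):
    prototypes[i, 4 * i:4 * i + 4] = 1.0
data = prototypes[np.random.default_rng(1).integers(0, 3, size=60)]

rbms, top, errors = train_stack(data, [8, 4])
print(top.shape)                    # (60, 4)
print(errors[0][0], errors[0][-1])  # first-layer error before vs. after training
```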

      "Deep Learning" / "Deep Architectures" has been around since then, so this article doesn't seem like much news. (However, it may be that someone is just now getting the kind of results that they've been expecting for years. Haven't read up on it very much.)

      These methods may be giving ANNs a third lease on life. Minsky & Papert almost killed them off with their book on perceptrons in 1969[*], then Support Vector Machines nearly killed them again in the 1990s.

      They keep coming back from the grave, presumably because of their phenomenal computational power and function-approximation capabilities.[**]

      [*] FWIW, M&P's book shouldn't have done anything, since it was already known that networks of perceptrons don't have the limitations of a single perceptron.

      [**] Siegelmann and Sontag put out a couple of papers, in the 1990s I think, showing that (a) you can construct a Turing Machine with an ANN that uses rational numbers for the weights, and (b) using real numbers (real, not floating-point) would give a trans-Turing capability.

      • using real numbers (real, not floating-point) would give a trans-Turing capability.

        What on earth is trans-Turing capability?

        • using real numbers (real, not floating-point) would give a trans-Turing capability.

          What on earth is trans-Turing capability?

          Can compute things that a TM can't.

          I think the paper was controversial when it first came out, but I'm not aware that anyone has ever refuted their proof.

          • by HalfFlat (121672)

            [...] using real numbers (real, not floating-point) would give a trans-Turing capability.

            Given that almost every real number encodes an uncountable number of bits of information, I guess this isn't especially surprising in retrospect. The result though should make us suspicious of the assumption that the physical constants and properties in our physical theories can indeed take any real number value.

            • by maxwell demon (590494) on Sunday November 25, 2012 @07:58AM (#42086223) Journal

              Given that almost every real number encodes an uncountable number of bits of information, I guess this isn't especially surprising in retrospect. The result though should make us suspicious of the assumption that the physical constants and properties in our physical theories can indeed take any real number value.

              The number of bits needed to represent an arbitrary real number exactly is infinite, but not uncountable.
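              In standard notation (added here for illustration), a real number's binary expansion has one digit per natural number. That is why the bit count is countably infinite:

```latex
x = \lfloor x \rfloor + \sum_{k=1}^{\infty} b_k\, 2^{-k}, \qquad b_k \in \{0,1\},
\qquad b : \mathbb{N} \to \{0,1\} \;\Rightarrow\; \aleph_0 \text{ bits per number},
\qquad \bigl|\{0,1\}^{\mathbb{N}}\bigr| = 2^{\aleph_0}.
```

              What is uncountable is the set of all such digit sequences (the reals themselves), not the number of bits in any single one.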

            • In reality, i.e. in the physical world, it gets quantum at some point. So even with zero noise, any real parameter has only finitely many bits in its "perfect" representation. Then there is the noise issue: real systems don't match perfect math.
              • by Rockoon (1252108)
                You are confusing notation with representation. The fact that our notation truncates all those zeros on the left and right of the number is irrelevant. The infinite number of 0's to the left and to the right are encoded, implicitly, in the notation that we use.
            • [...] using real numbers (real, not floating-point) would give a trans-Turing capability.

              Given that almost every real number encodes an uncountable number of bits of information, I guess this isn't especially surprising in retrospect. The result though should make us suspicious of the assumption that the physical constants and properties in our physical theories can indeed take any real number value.

              My intuition is that the difference between the TM's finite set of discrete symbols and the infinite/continuous nature of real numbers is exactly the reason.

              I'm not aware of any theory of continuous-state computing along the lines of the Chomsky hierarchy, but maybe there's one out there.

      • by snarkh (118018)

        > (b) using real numbers (real, not floating-point) would give a trans-Turing capability.

        Not sure what it means -- a Turing machine is not even capable of storing a single (arbitrary) real number.

        • He meant that an ANN with real numbers is a hypercomputer, which is true.

          The problem is that like most conceivable hypercomputers neural networks with real numbers would violate natural laws, e.g. the laws of thermodynamics.

          • How so? The math of thermodynamics uses real numbers and does not need any "tricks" to make it work.
            • How so? The math of thermodynamics uses real numbers and does not need any "tricks" to make it work.

              I think there is a theoretical minimal entropy production for any computation, so there's a limit to the amount of computation you could do if you used the entire observable universe.

              Of course, you can't have the infinite tape required by a TM either.
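              The minimum the comment is probably thinking of is Landauer's principle. That is an assumption about what was meant, and the bound applies to logically irreversible steps such as erasing a bit; reversible computation can in principle avoid it. The arithmetic for one erased bit at roughly room temperature:

```python
import math

BOLTZMANN_K = 1.380649e-23  # J/K (exact value in the 2019 SI)

def landauer_limit_joules(temperature_k: float) -> float:
    """Minimum heat dissipated when one bit is irreversibly erased: k * T * ln 2."""
    return BOLTZMANN_K * temperature_k * math.log(2)

per_bit = landauer_limit_joules(300.0)    # ~room temperature
print(f"{per_bit:.3e} J per erased bit")  # 2.871e-21 J per erased bit
```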

          • by snarkh (118018)

            Well, real numbers are inherently very problematic from the computational point of view.

    • I think this quote says it all:

      Referring to the rapid deep-learning advances made possible by greater computing power, and especially the rise of graphics processors, he added: “The point about this approach is that it scales beautifully. Basically you just need to keep making it bigger and faster, and it will get better. There’s no looking back now.”

      I'm sure they've come up with a few incremental advances, but it looks primarily like they've just taken advantage of hardware improvements. You can see from the numbers in the article the results are about what you'd expect from improved hardware (as opposed to actually solving the problem):

      [some guy] programmed a cluster of 16,000 computers to train itself to automatically recognize images in a library of 14 million pictures of 20,000 different objects. Although the accuracy rate was low — 15.8 percent — the system did 70 percent better than the most advanced previous one.
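      For scale, the quoted figures imply the size of the previous result. This assumes "70 percent better" means a relative improvement in accuracy, which the quote does not spell out.

```python
new_accuracy = 15.8   # percent, as quoted
relative_gain = 0.70  # "70 percent better", read as a relative improvement

previous_accuracy = new_accuracy / (1 + relative_gain)
print(f"previous: {previous_accuracy:.1f}%")                     # previous: 9.3%
print(f"gain: {new_accuracy - previous_accuracy:.1f} points")   # gain: 6.5 points
```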

      • by timeOday (582209) on Sunday November 25, 2012 @03:26AM (#42085575)

        You can see from the numbers in the article the results are about what you'd expect from improved hardware (as opposed to actually solving the problem)

        "As opposed to actually solving the problem"? Your brain has about 86 billion neurons and around 100 trillion synapses. It accounts for 2% of body weight and 20% of energy consumed. Do you think these numbers would be so large if they didn't need to be?

        I think computer science's near-exclusive focus on polynomial-time algorithms has really stunted it. Maybe most of the tasks essential for staying alive and reproducing don't happen to have efficient solutions, but the constants of proportionality are small enough to brute-force with tens of billions of neurons.

        • Re: (Score:3, Insightful)

          by smallfries (601545)

          The problem comes when you try larger inputs. Regardless of constant factors, if you are playing with O(2^n) algorithms then n will not increase above about 30. If you start looking at really weird stuff (optimal circuit design and layout), then the core algorithms are O(2^(2^n)), and if you are really lucky n will reach 5. Back in the 80s it only went to 4, but that's Moore's law for you.
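          A sketch of why n barely moves. The step budgets below are hypothetical round numbers, not measurements. Going from a 2^16-step budget to 2^32 (65,536 times more compute) lifts a doubly exponential problem from n = 4 to n = 5, while an ordinary exponential one goes from 16 to 32.

```python
def largest_n(cost, budget):
    """Largest n with cost(n) <= budget, assuming cost grows with n."""
    n = 0
    while cost(n + 1) <= budget:
        n += 1
    return n

def exponential(n):
    return 2 ** n

def doubly_exponential(n):
    return 2 ** (2 ** n)

for budget in (2 ** 16, 2 ** 32):  # hypothetical step budgets
    print(budget, largest_n(exponential, budget), largest_n(doubly_exponential, budget))
# 65536 16 4
# 4294967296 32 5
```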

          • by timeOday (582209)
            When you talk about O() you're talking about the worst case for finding an exact solution. Brains don't find exact solutions to anything.
        • I think there are actually a lot of AI researchers who are happy with approximate answers (the guy in the article was ecstatic getting 15% right), so it's probably a deeper problem than that.
        • by LeDopore (898286)

          The big difference is that biology isn't concerned with finding the optimal solution to problems; any very good solution (optimal or not) will let you live to see another day. A lot of math and computer science is dedicated to finding ironclad proofs that under every circumstance, a particular algorithm will deliver the optimal solution. While that's great when it's feasible, sometimes it's OK to go with something that works well even if it isn't optimal.
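          A textbook illustration of "works well even if it isn't optimal", using a 0/1 knapsack toy instance chosen for illustration (it is not from the thread). The greedy value-per-weight heuristic costs one sort. Exhaustive search is guaranteed optimal but examines all 2^n subsets.

```python
from itertools import combinations

# Toy 0/1 knapsack instance: (value, weight) pairs and a weight capacity.
ITEMS = [(60, 10), (100, 20), (120, 30)]
CAPACITY = 50

def greedy(items, capacity):
    """Heuristic: take items by value per unit weight while they still fit."""
    total_value = total_weight = 0
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        if total_weight + weight <= capacity:
            total_value += value
            total_weight += weight
    return total_value

def exact(items, capacity):
    """Exhaustive search over every subset: optimal, but exponential in len(items)."""
    best = 0
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            if sum(w for _, w in subset) <= capacity:
                best = max(best, sum(v for v, _ in subset))
    return best

print(greedy(ITEMS, CAPACITY), exact(ITEMS, CAPACITY))  # 160 220
```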

          The set of good heuristics is a strict superset of t

      • by ceoyoyo (59147)

        No, "deep learning" refers mostly to new training algorithms. More computer power helps of course, but the problem previously was that your training became less efficient the bigger your system got. If that doesn't happen, you can scale things up indefinitely.

    • They haven't done anything that wasn't already being done by others. They're just doing more of it. Essentially, the approach consists of using Bayesian statistics and a hierarchy of patterns. Prof. Hinton pretty much pioneered the use of Bayesian statistics in artificial intelligence. With a rare notable exception (e.g. Judea Pearl [cambridge.org]), the entire AI community has jumped on the Bayesian bandwagon, not unlike the way they jumped on the symbolic bandwagon in the latter half of the 20th century, only to be proven wr

      • Sorry, but those blogposts aren't very convincing. Do you have *actual* arguments comparing Bayesian to these hypothetical alternatives, or should we just take the claims on trust?
        • Do you have *actual* arguments comparing Bayesian to these hypothetical alternatives

          The argument is simple. As Judea Pearl (an early proponent of Bayesian statistics for AI who has since changed his mind) explained, humans are not probability thinkers; they are cause/effect thinkers. If you drop a ball, you know it's going to hit the ground. You don't think that there is a probability that it might not. If you read the word Bayesian in this sentence, you know for certain that you did. There is nothing proba

          • Amazingly (I knew this poster "smelled" like rebel science), you aren't completely wrong here. We do create a model and we predict based on that model, but the basis of the prediction is pattern matching/detection from previous experience. A pattern match isn't guaranteed (hence the connection to probabilities), but it's the best guess based on experience.
      • by Rockoon (1252108)
        In the case of evolutionary optimization algorithms, they jumped onto the Bayesian bandwagon in 1999 but they jumped off it only one year later.. onto the much larger Shannon bandwagon.
    • It's the latter...one could assiduously identify common research buzzwords

      From a neuroscience perspective, it's about transmission of signals continuously in a highly complex network...a **hardware limit**

      The idea that there will be a 'fundamental advance' that allows for 'artificial intelligence' is really just hype.

      All we can ever make is better things to follow our instructions.

      • All we can ever make is better things to follow our instructions.

        What is the basis for that claim?

        In 50 years when we can simulate a brain to any arbitrary level of detail, or build a wet-brain one neuron at a time, why wouldn't it be able to do what naturally occurring intelligence can?

        Is there some Special Ingredient that cannot be simulated, even in principle? Or that cannot be understood well enough to try?

    • It's both (Score:5, Interesting)

      by Anonymous Coward on Sunday November 25, 2012 @03:19AM (#42085551)

      In the past few years, a few things happened almost simultaneously:

      1. New algorithms were invented for training of what previously was considered nearly impossible to train (biologically inspired recurrent neural networks, large, multilayer networks with tons of parameters, sigmoid belief networks, very large stacked restricted Boltzmann machines, etc).
      2. Unlike before, there's now a resurgence of _probabilistic_ neural nets and unsupervised, energy-based models. This means you can have a very large multilayer net (not unlike e.g. visual cortex) figure out the features it needs to use _all on its own_, and then apply discriminative learning on top of those features. This is how Google recognized cats in Youtube videos.
      3. Scientists have learned new ways to apply GPUs and large clusters of conventional computers. By "large" here I mean tens of thousands of cores, and week-long training cycles (during which some of the machines will die, without killing the training procedure).
      4. These new methods do not require as much data as the old, and have far greater expressive power. Unsurprisingly, they are also, as a rule, far more complex and computationally intensive, especially during training.
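      Point 2's recipe (learn features without labels, then train a discriminative model on top) can be sketched in a few lines. The sketch swaps in PCA as a much simpler stand-in for the RBM or autoencoder feature learning the comment describes, uses logistic regression trained by gradient descent as the discriminative stage, and runs on synthetic Gaussian data. All three choices are assumptions for illustration.

```python
import numpy as np

def pca_features(x, k):
    """Unsupervised stage: project onto the top-k principal components (no labels)."""
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    components = vt[:k]
    return (x - mean) @ components.T, mean, components

def train_logistic(features, labels, lr=0.1, epochs=500):
    """Discriminative stage: logistic regression by batch gradient descent."""
    w = np.zeros(features.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(features @ w + b)))
        grad = p - labels  # derivative of the log-loss w.r.t. the logit
        w -= lr * features.T @ grad / len(labels)
        b -= lr * grad.mean()
    return w, b

rng = np.random.default_rng(0)
n, dim = 400, 20
labels = rng.integers(0, 2, size=n).astype(float)
x = rng.standard_normal((n, dim))
x[:, 0] += np.where(labels == 1, 3.0, -3.0)  # classes differ along one direction

feats, mean, comps = pca_features(x, k=2)    # labels are not used here
w, b = train_logistic(feats, labels)         # labels are used only here
accuracy = (((feats @ w + b) > 0) == (labels == 1)).mean()
print(f"accuracy: {accuracy:.3f}")
```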

      As a result of this, HUGE gains were made in such "difficult" areas as object recognition in images, speech recognition, handwritten text (not just digits!) recognition, and in many more. And so far, there's no slowdown in sight. Some of these advances were made in the last month or two, BTW, so we're speaking about very recent events.

      That said, a lot of challenges remain. Even today's large nets don't have the expressive power of even a small fraction of the brain, and moreover, the training at "brain" scale would be prohibitively expensive, and it's not even clear if it would work in the end. That said, neural nets (and DBNs) are again an area of very active research right now, with some brilliant minds trying to find answers to the fundamental questions.

      If this momentum is maintained, and challenges are overcome, we could see machines getting A LOT smarter than they are today, surpassing human accuracy on a lot more of the tasks. They already do handwritten digit recognition and facial recognition better than humans.

    • Re: (Score:3, Interesting)

      by PhamNguyen (2695929)
      I work in this area. It is mainly the latter, that is bigger data sets and faster hardware. At first, people thought (based on fairly reasonable technical arguments) that deep networks could not be trained with backpropagation (which is the way gradient descent is implemented on neural networks). Now it turns out that with enough data, they can.
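      A sketch of what "backpropagation is how gradient descent is implemented" means in practice. Backprop is just the chain rule applied layer by layer, so its gradients should match a brute-force finite-difference estimate; those gradients then drive an ordinary gradient-descent step. The tiny one-hidden-layer tanh network, squared-error loss and random data are assumptions for illustration.

```python
import numpy as np

def loss_and_grads(params, x, y):
    """Squared-error loss of a 1-hidden-layer tanh net, plus backprop gradients."""
    h = np.tanh(x @ params["W1"] + params["b1"])
    out = h @ params["w2"] + params["b2"]
    err = out - y
    loss = 0.5 * np.mean(err ** 2)
    # Backward pass: chain rule from the output back toward the input.
    d_out = err / len(y)                    # dL/d(out)
    d_z1 = np.outer(d_out, params["w2"]) * (1.0 - h ** 2)  # tanh' = 1 - tanh^2
    grads = {"w2": h.T @ d_out, "b2": np.array([d_out.sum()]),
             "W1": x.T @ d_z1, "b1": d_z1.sum(axis=0)}
    return loss, grads

def numerical_grads(params, x, y, eps=1e-6):
    """Central finite differences, nudging one parameter entry at a time."""
    num = {}
    for name, value in params.items():
        g = np.zeros_like(value)
        for idx in np.ndindex(value.shape):
            old = value[idx]
            value[idx] = old + eps
            plus, _ = loss_and_grads(params, x, y)
            value[idx] = old - eps
            minus, _ = loss_and_grads(params, x, y)
            value[idx] = old
            g[idx] = (plus - minus) / (2 * eps)
        num[name] = g
    return num

rng = np.random.default_rng(0)
params = {"W1": rng.standard_normal((3, 4)), "b1": rng.standard_normal(4),
          "w2": rng.standard_normal(4), "b2": rng.standard_normal(1)}
x, y = rng.standard_normal((8, 3)), rng.standard_normal(8)

before, analytic = loss_and_grads(params, x, y)
numeric = numerical_grads(params, x, y)
max_diff = max(np.max(np.abs(analytic[k] - numeric[k])) for k in params)
print(f"max |backprop - finite difference| = {max_diff:.2e}")

for name in params:  # one plain gradient-descent step using the backprop gradients
    params[name] -= 0.01 * analytic[name]
after, _ = loss_and_grads(params, x, y)
print(before, after)
```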

      On the other hand there have been some theoretical advances by Hinton and others where networks can be trained on unsupervised data (e.g. the Google cats thing)

    • I wonder how much of these improvements in accuracy are due to fundamental advances, vs. the capacity of available hardware to implement larger models and (especially?) the availability of vastly larger and better training sets...

      There are limits to what you can achieve with that. I was once surprised to discover how often I actually mishear words (when watching, e.g., episodes of US TV series) and no amount of repeating helps me. After thinking about it for a while, it became apparent to me that I actually interpolate based on the context. This, however, requires understanding what the particular speech is about. The same goes for reading badly printed or (more often) badly scanned text - quite often I reconstruct the word based on

      • by mikael (484)

        I used to do some transcription work to make a bit of spare cash. At the beginning of the tape, I really wouldn't understand the accent, not recognising some words, but after going through the tape once and replaying it, I would immediately recognise the words. It's almost as if there were a set of mask images for every word, and these didn't quite fit at first, but after 20-30 minutes they were scaled, rotated, and transformed in some way until they made a better match. Each word would also have a limited

      • Yes, we often interpolate from knowing what is being discussed. We can have algorithms to stand in to some extent but there is a limitation when the inference we make is from a representation of things out there in the world and knowledge about how those things work. We can sometimes get a sense of a conversation from very lossy understanding of what is being said.

      • by Rockoon (1252108)
        This BBC video on the McGurk Effect [youtube.com] will knock your socks off. What you 'see' affects what you 'hear.'
        • I still have my socks on. This has never worked on me. In noisy environments, I can study your mouth with a microscope and I will still have problems hearing you correctly. I once had someone on a noisy tram repeat the same sentence to me five or six times, and then I gave up. That just happens to me every now and then, visual cues or not. Well, everyone's brain is different, I guess.
    • by slashmydots (2189826) on Sunday November 25, 2012 @01:56AM (#42085381)
      Humans can't even solve those, lol.
      • by swillden (191260)
        That's the point. blue trane was hoping for an automated captcha-solving assistant so he wouldn't be frustrated by them.
    • by ceoyoyo (59147)

      There have been several stories about captchas being broken, to the point where secure ones today have to be barely decipherable by humans. That suggests the character recognition algorithms are performing very similarly to humans.

      • I've really wondered about that though, I've seen the stories, but I've never seen the evidence. Were they really broken? Or was it just a claim that was never verified?
        • by ceoyoyo (59147)

          Here's one you can try out yourself: http://code.google.com/p/captchacker/ [google.com]

          CAPTCHAs now are harder than they used to be, but I have no doubt that if you ran a few hundred through a breaker you'd get a few hits. Not quite human level, but impressively close considering where we were five years ago. Someone with serious computing power to put behind it could probably do significantly better.

          AI got a bad name because of the promises it made in the 60s and 80s, and there are lots of mystics who are cri

  • Can You Imagine a Beowulf Cluster of These?

  • by Anonymous Coward on Sunday November 25, 2012 @02:32AM (#42085443)

    I'm doing Prof. Hinton's course on Neural Networks on Coursera this semester. It covers the old-school stuff plus the latest and greatest. From what I gather from the lectures, training neural networks with lots of layers hasn't been practical in the past and was plagued with numerical and computational difficulties. Nowadays, we have better algorithms and much faster hardware. As a result, we now have the ability to use more complex networks for modelling data. However, they need a lot of computational power thrown at them to learn, compared to other machine learning algorithms (e.g. random forests). The lecture quotes training taking days on an Nvidia GTX 295 GPU to learn the MNIST handwritten digit dataset. Despite this, the big names are already using this technology for applications like speech recognition (Microsoft, Siri) and object recognition (the Google cat video; okay, that's not a real application yet).

  • Old News (Score:2, Interesting)

    by Dr_Ish (639005)
    While there have been advances since the 1980s, as best I can tell most of this report is yet more A.I. vaporware. It is easy to put out a press release. It is much harder to do the science to back it up. How did this even get posted on the /. front page? If this stuff were true, I'd be happy, as most of my career has been spent working with so-called 'neural nets'. However, they are not neural; that is just a terminological ploy to get grants (has anyone ever heard of the credit assignment problem with bp?) Also, the
  • ... wake up people..... it's the fucking drug industry looking for any excuse it can to sell you another one of their drugs...

    And pot remains, for the most part, illegal.....

    I think we already have achieved artificial intelligence... in humans...

  • While neural networks do amazingly well for a certain type of problem, they do have their limitations. Neural networks are good for designing reflex machines that react to their current environment. They aren't efficient when they have to learn in the field or plan ahead.

  • From a data set describing the chemical structure of 15 different molecules, they used deep-learning software to determine which molecule was most likely to be an effective drug agent."

    So the AI is going to turn some molecules into an FBI undercover snitch? That's some serious DNA-FU there!

  • by Fnord666 (889225) on Sunday November 25, 2012 @11:38AM (#42087057) Journal
    Here [youtube.com] is a good video of a talk given by Dr. Hinton about Restricted Boltzmann Machines. It is a very promising technique for deep learning strategies.

