DeepMind's Game-Playing AI Has Beaten a 50-Year-Old Record In Computer Science (technologyreview.com) 91

Posted by BeauHD on Wednesday October 05, 2022 @10:30PM from the matrix-multiplication dept.

An anonymous reader quotes a report from MIT Technology Review: DeepMind has used its board-game playing AI AlphaZero to discover a faster way to solve a fundamental math problem in computer science, beating a record that has stood for more than 50 years. A year after it took biologists by surprise, AlphaFold has changed how researchers work and set DeepMind on a new course. The problem, matrix multiplication, is a crucial type of calculation at the heart of many different applications, from displaying images on a screen to simulating complex physics. It is also fundamental to machine learning itself. Speeding up this calculation could have a big impact on thousands of everyday computer tasks, cutting costs and saving energy.

Despite the calculation's ubiquity, it is still not well understood. A matrix is simply a grid of numbers, representing anything you want. Multiplying two matrices together typically involves multiplying the rows of one with the columns of the other. The basic technique for solving the problem is taught in high school. But things get complicated when you try to find a faster method. This is because there are more ways to multiply two matrices together than there are atoms in the universe (10 to the power of 33, for some of the cases the researchers looked at).

The trick was to turn the problem into a kind of three-dimensional board game, called TensorGame. The board represents the multiplication problem to be solved, and each move represents the next step in solving that problem. The series of moves made in a game therefore represents an algorithm. The researchers trained a new version of AlphaZero, called AlphaTensor, to play this game. Instead of learning the best series of moves to make in Go or chess, AlphaTensor learned the best series of steps to make when multiplying matrices. It was rewarded for winning the game in as few moves as possible. [...] The researchers describe their work in a paper published in Nature today. The headline result is that AlphaTensor discovered a way to multiply together two four-by-four matrices that is faster than a method devised in 1969 by the German mathematician Volker Strassen, which nobody had been able to improve on since. The basic high school method takes 64 steps; Strassen's takes 49 steps. AlphaTensor found a way to do it in 47 steps. "Overall, AlphaTensor beat the best existing algorithms for more than 70 different sizes of matrix," concludes the report. "It reduced the number of steps needed to multiply two nine-by-nine matrices from 511 to 498, and the number required for multiplying two 11-by-11 matrices from 919 to 896. In many other cases, AlphaTensor rediscovered the best existing algorithm."

DeepMind's Game-Playing AI Has Beaten a 50-Year-Old Record In Computer Science

This discussion has been archived. No new comments can be posted.

Load All Comments

Search 91 Comments Log In/Create an Account

Comments Filter:

- Punching Nazis is Legal (Score:1)
  
  by Anonymous Coward writes:
  
  You can always tell the tough guy Nazi who hasn't been punched in the mouth. Follow in the great American patriotic tradition of WWII and punch a Nazi in the mouth.
- Re: (Score:2)
  
  by UnresolvedExternal ( 665288 ) writes:
  
  Sigh ... all the good trolls have left ... never thought I would miss APK
Meaningless (Score:1)

by gweihir ( 88907 ) writes:

It is a constant very small speed-up (about 4%) for a very specific size and it really depends on a lot of details whether it is faster at all.
This is just the Artificial Ignorance people trying to deny what everybody already knows: Their stuff is not nearly as great as they like to claim.
- Re:Meaningless (Score:5, Insightful)
  
  by The Evil Atheist ( 2484676 ) writes: on Wednesday October 05, 2022 @11:01PM (#62942823)
  
  4% is huge if you're talking about supercomputers doing simulations. Imagine the power budget that 4% can save.
  
  Facebook can save millions of dollars for every (less than) 1% speedup they save.
  
  The bigger point is that AI can discover these techniques now for arbitrary problems, instead of having to rely on possibly decades of mathematicians working on a problem. If they can solve these problems, then there's research potential to solve even larger problems of major import today.
  
  eg, DeepMind is also working on AI to help control fusion. A "4%", or smaller even, speed up is important if you want to react to physical conditions in realtime. Every little percentage helps.
  
  You just have to face the fact that we're no longer in a world where mathematics or physics progresses by huge, but rare, leaps. We're in dire need of these incremental improvements now.
  
  - Re: (Score:2)
    
    by timeOday ( 582209 ) writes:
    
    I think the speedup will be 0%, not 4%, because this would be done in parallel, not using a sequential algorithm with the smallest number of steps. Remember when MMX was cool? (1997) Here's a homework problem to parallelize 4x4 matrix multiplication:
    https://www.codeproject.com/Qu... [codeproject.com]
    I just referenced MMX because it's so old. 3d graphics does zillions of 4x4 matrix multiplies (composing transformations) so they are done by GPUs in parallel, likely as a special case.
    https://gamedev.stackexchange.... [stackexchange.com]
    - Re: (Score:3)
      
      by The Evil Atheist ( 2484676 ) writes:
      
      You realize they can apply this optimization to NEW HARDWARE designs, right?
      
      All the big names are designing their own chips that specialize in this stuff.
      - Re: (Score:2)
        
        by timeOday ( 582209 ) writes:
        
        What I'm saying is the 49 to 47 steps reduction is irrelevant because it's already done in 5 vector operations instead. Or something like that.
        
        Re: (Score:2)
        
        by The Evil Atheist ( 2484676 ) writes:
        
        It's not irrelevant, because vector operations aren't as efficient as you think they are.
        
        Just ask Linus.
        
        Re: (Score:2)
        
        by Locke2005 ( 849178 ) writes:
        
        In fact, in many cases you're bound by I/O speed, not CPU speed, and locality of data (being able to the operands out of cache) is more important than the actual number of operations being performed.
        
        Re: (Score:2)
        
        by Locke2005 ( 849178 ) writes:
        
        In most HPC problem sets, the size data is far larger than the cache. I'm just saying that an algorithm that takes maximum advantage of locality might be faster in practice than one that involves few steps. Of course, the thing to do is select for an algorithm that minimizes run time on the hardware you're actually using, rather than an algorithm that minimizes steps.
      - Re:Meaningless (Score:4, Insightful)
        
        by CaptQuark ( 2706165 ) writes: on Thursday October 06, 2022 @12:50AM (#62942941)
        
        AlphaTensor can also optimize matrix multiplication for specific hardware. The team trained the agent on two different processors, reinforcing it not only when it took fewer actions but also when it reduced runtime. In many cases, the AI sped up matrix multiplications by several per cent compared with previous algorithms. And sometimes the fastest algorithms on one processor were not the fastest on the other.
        This type of information can also help compilers speed up code targeted to specific processors. A 4% speed improvement on a code block that will be run millions of times such as graphics programs that use FFT across an entire image can be significant.
        A 4% improvement on a process that takes 24 hours saves 58 minutes. I know that this single matrix-multiplication improvement won't apply evenly to the entire process, but 4% here and 4% there and you start to get noticeable differences.
        
        
        Re: (Score:2)
        
        by ConceptJunkie ( 24823 ) writes:
        
        You're absolutely right. Improvements in computer science and software performance in general are made up of a bunch of these kinds of improvements. Every bit counts.
    - Re: (Score:2)
      
      by Locke2005 ( 849178 ) writes:
      
      Some problem sets can not be solved in parallel on sequential computers. But maybe quantum computers would make these speed ups irrelevant.
  - Re:Meaningless (Score:5, Interesting)
    
    by LostMyBeaver ( 1226054 ) writes: on Thursday October 06, 2022 @12:05AM (#62942911)
    
    The solution provides would save approximately 4% of computation clock cycles but not memory read or store.
    
    Also, due to SIMD instructions, the saves aren't actually 4% because scatter gather takes space and SIMD operations don't like working in columns of registers only rows (or other way around depending on your perspective).
    
    A SIMD multiple poorly written uses 1/4 the clock cycles of the operation they're competing with. A well written SIMD multiple, especially one employing CPU multiply and accumulate instructions which can multiple and accumulate up to 16 parallel registers in a single operation... that's ... up to 16 times faster than the benchmark they're using.
    
    They solution they came up with actually would not be parallelizable. Though maybe they could use the same logic to optimize a proper matrix multiply operation instead. Matrix transpositions are stinking expensive and I've never been convinced it's entirely necessary to keep everything aligned.
    
    The vast majority of the time in HPC is in load and store operations. We use almost no CPU cycles on calculation itself. The vast majority of CPU calculations on HPC are due to poorly written code from scientists who were more interested in the science than the code. Almost every piece of HPC code I've worked on I've been able to optimize to run faster on a laptop GPU than the old code on a hundred thousand cores. Most of my optimizations are simply aligning memory operations, recalculating rather than loading, keeping the CPU pipeline full, reducing cache line misses when possible. Another huge one is coding for the bus topology of the CPU.
    
    The problem is, when I'm done with the code, you can run it on an Apple watch, but the scientist has no ability to recognize the science anymore. So, instead of a laptop, we waste hundreds of millions of dollars on super computers because the science is more valuable than the money,
    
    - Re: (Score:2)
      
      by The Evil Atheist ( 2484676 ) writes:
      
      If you reduce the number of operations, you reduce the number of loads and stores...
      
      Fewer operations means fewer incidental bits of data you have to keep around.
      
      As I said to another commenter, this kind of improvement probably makes the most sense in the custom hardware these giants are creating for such purposes.
      - Re: (Score:2)
        
        by vyvepe ( 809573 ) writes:
        
        If you reduce the number of operations, you reduce the number of loads and stores...
        It is not only about the number of loads/stores. It is also about their ordering so that the loads stores are cache & prefetch friendly. If you reduce the number of operations by 4% and because of that you get 1% cache misses more then you are most likely worse overall. One cache miss costs you somewhere around hundred of ALU operations.
    - Data Cache law #1 (Score:2)
      
      by mveloso ( 325617 ) writes:
      
      I learned data cache management almost 20 years ago, so here is the first law of cache: the cache is a linear object. You load in the direction of the cache, not across the cache. The former allows the cache to prefetch perfectly. The latter leads to continuous loads and blown cache, which fucks stuff up tremendously.
      I mean jesus, I think this was in Byte.
      Maybe the "premature optimization" bullshit has gone too far?
    - Re:Meaningless (Score:4, Insightful)
      
      by laughing_badger ( 628416 ) writes: on Thursday October 06, 2022 @03:52AM (#62943109) Homepage
      
      The only purpose of the code is to do some science though, and evolving the code is part of that. It's typically a lot easier and cheaper to throw CPUs and disk at a problem than it is to train the science team in code optimisation. Especially as the code may only ever be run once in a given configuration. The happy middle ground is to optimise the code just enough that they can still work on it.
      "This is, by far, the worst code I've ever seen running!" "But it _is_ running."
      
    - Optimizing costs too (Score:2)
      
      by vyvepe ( 809573 ) writes:
      
      Well, properly optimizing code takes time. That time costs money. It may be well possible that the supercomputer is cheap compared to the time of all the coders optimizing the code to run on "Apple watch".
      - An Apple watch (Score:2)
        
        by Latent Heat ( 558884 ) writes:
        
        has the computing power of a super computer from a small number of decades ago?
        An Apple watch, meh, I would like to see the problem optimized to run on a 1970's era Altair micro computer. Forget that, try running it on a Babbage Difference Engine.
        That all goes to show that saving a small percentage on one important but still narrow part of a numerical code gets swamped by Moore's Law.
    - Re: (Score:2)
      
      by DamnOregonian ( 963763 ) writes:
      
      They solution they came up with actually would not be parallelizable. Though maybe they could use the same logic to optimize a proper matrix multiply operation instead. Matrix transpositions are stinking expensive and I've never been convinced it's entirely necessary to keep everything aligned.
      Bingo.
      
      These days, HPC GPUs (Volta+) all have wicked fast matrix multiplication blocks in them.
      The point of this is to improve those blocks, not improve software that will never ever run efficiently on a SIMD vector engine.
    - Re: (Score:3)
      
      by dargaud ( 518470 ) writes:
      
      I feel your pain because I've done the same (month-long optimization) job only to be told: "Oh, but that means I would need to organize my data differently. So no, we'll keep doing it like before"...
    - Re: (Score:1)
      
      by LionKimbro ( 200000 ) writes:
      
      Off-topic question.
      How do you optimize things these days? I've been programming for almost 40 years now. I cut my teeth on an 8088 Compaq computer. I started in BASIC, went to Pascal, then to Assembly, and then to C. I remember studying Mode X graphics for VGA monitors, writing TSR programs in DOS, and writing assembly code to optimize loops.
      I can't tell for the life of me what's going on inside these processors, and more to the point, I don't know *how* to tell what's going on inside these processors.
      I
  - Not Meaningful (Score:2)
    
    by Roger W Moore ( 538166 ) writes:
    
    4% is huge if you're talking about supercomputers doing simulations. Imagine the power budget that 4% can save.
    4% is not huge because all it would mean is that you'll get 4% more simulation - nobody I know shuts down their facilities when they do not have jobs and it's pretty rare not to have jobs. Also, it is not 4% in total, it is only 4% when multiplying two matrices where one of them is asymmetric.
    
    It definitely is interesting that they have managed to use AI to find a more efficient algorithm even if it is one that is not a major use of CPU. Hopefully, this is just a start and they can use this technique to
- Re: (Score:2)
  
  by willy_me ( 212994 ) writes:
  
  It actually is great because it represents a new way of generating algorithms. The gains from matrix multiplication are nice but not that significant - as you mentioned. However, the technique could also be applied to other problems which is where it gets interesting.
  - Re: (Score:1)
    
    by gweihir ( 88907 ) writes:
    
    Not really. This approach only works for a specific input size.
    - Re: (Score:3)
      
      by jvkjvk ( 102057 ) writes:
      
      >Not really. This approach only works for a specific input size.
      Yes, "Overall, AlphaTensor beat the best existing algorithms for more than 70 different sizes of matrix"
      So "Only" 70 different sized matrices so far.
      And the overall approach of using AI in this manner, as a proper tool of discovery? Priceless.
- Re: (Score:1)
  
  by Arethan ( 223197 ) writes:
  
  Cmon now... The great Elon has promised me flying car blowjobs built and serviced by this very same Artificial Ignorance. It is clearly all that is left that could save humanity. How can you possibly detract from that. How dare you. /end-greta-scene /s
  Actually though, yea. "AI" is a shitty moniker for a complex pre-weighted statistical analysis system, regardless of how many layers of math equations are stacked up to make it seem "smart". It does some tasks that are normally hard to define really well, like
- Re: (Score:2)
  
  by DamnOregonian ( 963763 ) writes:
  
  I've noticed that you have a few triggers that really set you off.
  Any good news from Intel, and any good news from ML research.
  You immediately get defensive as if you yourself were insulted. It really is bizarre.
  
  This method provides a 10-20% speedup over the methodology currently used in today's GPU matrix multiplier blocks (NV Volta+, various TPUs)
  You let your triggering blind you.
  - Re: (Score:2)
    
    by gweihir ( 88907 ) writes:
    
    I've noticed that you have a few triggers that really set you off.
    Any good news from Intel, and any good news from ML research.
    Because I can see it is not actually good news but fake news. In particular ML, I have been following for around 35 years now, and it has consistently been grand promises and very little delivery. An engineering PhD in the CS field also helps to see what is actually there. I am just tired of the crap these people claim time and again. And for Intel, they have been screwing their customers over again and again and continue to do so.
    This method provides a 10-20% speedup over the methodology currently used in today's GPU matrix multiplier blocks (NV Volta+, various TPUs)
    You let your triggering blind you.
    The numbers in the article indicate 2-4% based on the "steps". The 10-20% are
    - Re: (Score:2)
      
      by DamnOregonian ( 963763 ) writes:
      
      Because I can see it is not actually good news but fake news. In particular ML, I have been following for around 35 years now, and it has consistently been grand promises and very little delivery. An engineering PhD in the CS field also helps to see what is actually there. I am just tired of the crap these people claim time and again. And for Intel, they have been screwing their customers over again and again and continue to do so.
      Oh I'll grant you there have been nutballs who have made grand claims about ML. But to say it has had little delivery is flat out denying a rationally undeniable truth.
      The numbers in the article indicate 2-4% based on the "steps". The 10-20% are bogus. They just result from optimization specifically for the hardware used. A manual optimization by a competent human or an optimizing compiler that targets specific hardware not just the general architecture could have done something there as well, but nobody found it worthwhile to invest the time.
      The numbers in the article do not. Your interpretation of what "steps" mean does.
      You should read the article again, but this time for real, since it's pretty clear what you really did was glance over it looking for something you could bitch about on slashdot because you were highly triggered by it.
      
      In case that proves too burdensome, I'll c
- Re: (Score:2)
  
  by jvkjvk ( 102057 ) writes:
  
  >It is a constant very small speed-up (about 4%) for a very specific size and it really depends on a lot of details whether it is faster at all.
  Do you have any idea how many matrix multiplications are made by supercomputer AI's? I'll give you a hint - most of their computations are matrix multiplications. If you read the summary (I know, wild idea) they found speed ups for a number of different sized matrices. (That's the "lots of details whether it is faster at all", they found different number of ste
- Re: (Score:2)
  
  by The Evil Atheist ( 2484676 ) writes:
  
  Yes. Almost every actual professional maths software uses all the algorithmic speedup tricks there are.
  
  "Entry-level" textbooks on finite element stuff discuss plenty of special techniques for common patterns that crop up.
- Re: (Score:2)
  
  by necro81 ( 917438 ) writes:
  
  I recall in a computer algorithms course we discussed matrix multiplication. We concluded pretty quickly that it was O(n^3), where n is the size of the (2D) matrix. So, for instance, a 4x4 matrix multiple takes 4^3 = 64 operations.
  
  Then the professor told us Strassen's Algorithm [wikipedia.org], which gets that down to O(n^2.86), and my head just about exploded. (2.86 = log2(7), if you were wondering where that number comes from.) Even that has been improved upon over the years [wikipedia.org], although often requiring more specializ
50 years to optimize 4x4 matrix multiplication? (Score:1)

by kmoser ( 1469707 ) writes:

The headline result is that AlphaTensor discovered a way to multiply together two four-by-four matrices that is faster than a method devised in 1969 by the German mathematician Volker Strassen, which nobody had been able to improve on since. The basic high school method takes 64 steps; Strassen's takes 49 steps. AlphaTensor found a way to do it in 47 steps.
Seriously? It took 50 years and an AI to figure out a faster way to multiply two 4x4 matrices? For matrices that small, how many variations could there possibly be?
- Re: (Score:2)
  
  by The Evil Atheist ( 2484676 ) writes:
  
  Okay, so why didn't YOU discover this new method first, if it's so simple?
  - Re: (Score:2)
    
    by znrt ( 2424692 ) writes:
    
    give him some time. he doesn't even understand the problem yet, and for that he first needs to work on his reading skills to decrypt the abstract, since it's codified in written english.
  - Re: (Score:2)
    
    by kmoser ( 1469707 ) writes:
    
    I haven't been looking, nor is it my field of research. I'm just surprised that for a 4x4 matrix, conventional algorithms couldn't find a faster algorithm and we had to rely on AI.
    
    The abstract claims "there are more ways to multiply two matrices together than there are atoms in the universe (10 to the power of 33, for some of the cases the researchers looked at)" but that's obviously worst case scenario. How many ways are there for a 4x4 matrix? The abstract doesn't say.
    - Re: (Score:2)
      
      by The Evil Atheist ( 2484676 ) writes:
      
      conventional algorithms couldn't find a faster algorithm and we had to rely on AI.
      Those conventional algorithms were human mathematicians, and they have limits.
      
      AI is just really good at looking for patterns in large amounts of data that humans cannot, so who knows what other "simple" mathematics can be optimized in the future simply because we can generate billions of examples for AI to learn patterns from?
      - Re: (Score:2)
        
        by kmoser ( 1469707 ) writes:
        
        Those conventional algorithms were human mathematicians, and they have limits.
        
        Everything has limits, even AI, so that's a meaningless statement. Those mathematicians have had computers with conventional (non-AI) algorithms at their disposal for over 50 years, and in recent years those computers have included ultra fast hardware that can test countless permutations in very little time. Despite that, no conventional algorithms seem to have found a solution.
        
        Re: (Score:1)
        
        by suarezvictor77 ( 9507745 ) writes:
        
        any math expert here that can count the number of combinations for the 4x4 case?
        
        Re: 50 years to optimize 4x4 matrix multiplication (Score:3)
        
        by Jeremi ( 14640 ) writes:
        
        Iâ(TM)m no math expert, but I do have some expertise in reading the linked article, which includes the following:
        âoeItâ(TM)s mind-boggling to see that there are at least 14,000 ways of multiplying four-by-four matrices,â says Hussein Fawzi, a research scientist at DeepMind.
        
        Re: 50 years to optimize 4x4 matrix multiplicatio (Score:2)
        
        by reanjr ( 588767 ) writes:
        
        You don't need machine learning to find the best algorithm out of 14k. You can just run an exhaustive search.
        I'm with GP. If this is actually a relevant boost, why has no one found it before now?
        Seems likely that this algorithm actually doesn't help anyone accomplish anything any better than they could have before.
        
        Re: (Score:1)
        
        by iggymanz ( 596061 ) writes:
        
        The AT LEAST 14K ways aren't all known, that';s why "exhaustive search" can't be done. Amazing the number of people here who learned one simple way of multiplying matrices and all of a sudden they are experts. Of course what they're really doing is talking out of their ass in total ignorance. Maybe PhDs in math and computer science know a bit more than you all?
        
        Re: (Score:2)
        
        by The Evil Atheist ( 2484676 ) writes:
        
        So what are you even complaining about then?
        
        AI discovered something conventional algorithms (mathematicians) didn't. Fact. It happened.
    - Re: (Score:2)
      
      by paradigm82 ( 959074 ) writes:
      
      There's essentially an infinite number if you count even the slower ways. I'm not an expert on matrix mul or optimizations so here's my native take: A 4x4 matrix has 16 entries. So you have two matrices each with 16 entries. In every operation there's a lot of choices: You can mul/add/subtract/divide both entries across matrices or within a matrix. That gives an intermediary result which you can then use when making the choice of the next calculation etc. Essentially you start out with 32 numbers on a list.
      - Re: (Score:2)
        
        by pjt33 ( 739471 ) writes:
        
        You're never going to want to divide. Also, since addition and multiplication are commutative you don't really have 32 x 32 choices: allowing for selecting the same number twice there are 33 x 32 / 2 distinct choices. Also, it's never useful to subtract a number from itself. So the first step is chosen from 2048 options rather than 4096. Then you can impose an order on the (operation, entry, entry) tuples and break symmetry, massively reducing the number of sequences of operations to consider. It's still to
- Re: (Score:1)
  
  by Rod.Dorman ( 458034 ) writes:
  
  Seriously? It took 50 years and an AI to figure out a faster way to multiply two 4x4 matrices? For matrices that small, how many variations could there possibly be?
  According to the article "there are at least 14,000 ways of multiplying four-by-four matrices"
- Re: (Score:2)
  
  by dvice ( 6309704 ) writes:
  
  This video is about normal multiplication, not about matrices, which is a harder problem. But I think that if you watch it, you have much better understanding about how algorithms in computers are used and you most likely understand why it is so hard to invent new methods:
  This link takes you straight to the action, but feel free to watch from the start if you have time as it might be easier to understand:
  https://www.youtube.com/watch?... [youtube.com]
Can it play Doom? (Score:2)

by thesjaakspoiler ( 4782965 ) writes:

Ow... wait :
https://www.cmu.edu/news/stori... [cmu.edu]
- no let's play global thermonuclear war (Score:2)
  
  by Joe_Dragon ( 2206452 ) writes:
  
  What side do you want
  1. USSR
  2. USA
  3. North Korea
- Re: Can it play Doom? (Score:2)
  
  by PinchDuck ( 199974 ) writes:
  
  Can it play doom on a tractor?
what kind of reward do they give an AI? (Score:1)

by bonedonut ( 4687707 ) writes:

a day at a virtual beach?
Reward & Punishment inspire Intelligence - (Score:2)

by swell ( 195815 ) writes:

- and ultimately, misdeeds. All intelligence works this way. The plant is rewarded when it can make its leaves face the sun. The rabbit is punished when it can't evade the hunter. The human learns social graces or suffers social punishments. But what reward was given to this AI so that it was inspired to find an improved algorithm?
If an AI can be motivated by perceived rewards or by threats, then it can be as misguided as we irrational beings. And thus dangerous to themselves and us.
There is a science fict
- - Re: (Score:2)
    
    by vyvepe ( 809573 ) writes:
    
    I bet he is talking about dangers of too simplistic fitness function for a general AI not about a dumb computer programs.
    Though I grant you that it was formulated like for a dumb general audience :)
- Re: (Score:2)
  
  by Junta ( 36770 ) writes:
  
  It was not 'rewarded', it simply was given the parameter to do a logical compare and select the better scare.
  I hate how these articles tend to anthropomorphize software that isn't given anything vaguely resembling the characteristics they ascribe.
Shitty clickbait title, come on! (Score:2)

by fleeped ( 1945926 ) writes:

My android news feed, usually rife with shit clickbait articles, picked up this one from Nature:
"Artificial intelligence finds faster algorithms for multiplying matrices"
Notice the difference to Slashdot's turd of a title?...
trades for addition and subtraction (Score:5, Interesting)

by Orgasmatron ( 8103 ) writes: on Thursday October 06, 2022 @01:17AM (#62942969)

For anyone who was wondering how this is possible, there are a bunch of schemes for multiplying matrices (and complex numbers, and presumably quaternions and such) that all involve trading out expensive* multiplication operations for cheap* addition and subtraction operations. (* Yeah, I know, hold your horses, I'll get there.)
Basically, it finds points where multiple cells in the result can be represented by sums or differences of intermediate products. A different example (and there are several others) is Ungar's algorithm for multiplying complex numbers. His system uses 3 multiplications, 3 additions and 2 subtractions instead of 4 multiplications, 1 addition and 1 subtraction.
The articles don't mention the trade off. They don't mention addition or subtraction at all actually, which gives the incorrect impression that the AI has managed to banish a few percent of the work into a black hole. What really happened is that the AI turned some of the work-that-was-hard-in-the-90s into work-that-was-easy-in-the-90s. (These are very nearly equal cost operations in modern hardware.)
Personally, I'd be much more excited if the AI could figure out why some multiplications can be traded for additions and subtractions, or at least gave us some heuristics for evaluating candidates without doing all the work. That would revolutionize a bunch of related fields in ways that this, yawn, Yet Another Cost Optimization Search never will. But AI is hot right now, and this story was about a minor achievement in one of those fields that no one but journalists considers to be AI, so here we are.

- Please, have a heart! (Score:3)
  
  by Gravis Zero ( 934156 ) writes:
  
  Sir, I have read your post. May I please release my horses? I've been holding them for hours and they are getting restless. :(
- Re: (Score:2)
  
  by mesterha ( 110796 ) writes:
  
  The articles don't mention the trade off. They don't mention addition or subtraction at all actually, which gives the incorrect impression that the AI has managed to banish a few percent of the work into a black hole.
  
  What you write is a bit misleading. Strassen style algorithms work by dividing up the matrices into blocks and doing matrix multiplies and matrix additions with the blocks. Given that matrix addition (n^2) is so cheap compared to naive matrix multiply (n^3), this gives a real savings in com
Deceptive. (Score:1)

by X_DARK_X ( 1881648 ) writes:

The majority of application for this is in graphics. Which uses 3x3 matrix which can be represented as a quaternion and solved in 4 operations.
- Re: (Score:2)
  
  by Gibgezr ( 2025238 ) writes:
  
  That works for rotations. It falls apart when you want to do translations and scaling as well. 3D computer graphics uses much more 4x4 transformation matrices than quaternions or 3x3 matrices. Quaternions are used primarily for storing rotations for bone systems for hierarchical animation, and camera rotations. The bone systems typically only need to store rotations of bones relative to their parent bone, so the 1x4 quaternion is sort-of efficient (although manipulating quaternions involves more complex ope
Probably not actually useful (Score:5, Insightful)

by tap ( 18562 ) writes: on Thursday October 06, 2022 @01:45AM (#62942999) Homepage

It's a bit misleading to say it beat a 50 year old record. Strassen's method from 1969 has been bested 11 times according to Wikipedia (Sorry, I didn't search for every paper on fast matrix multiplication myself) since then, the latest in 2020.
Strassen can multiply two 2x2 matrices with just 7 multiplications instead of the 8 it takes using the straightforward method taught in high school. But it's not used to multiply 2x2 matrices in practice, because saving that one multiply costs extra additions. And it processes the data in a slower, out of order way. And there are a bunch of extra temporary values to keep track of. So it's slower. Even without vector instructions it's slower.
But it can work in a divide and conquer way too. Think of a 4x4 matrix as 2x2 matrix made of four 2x2 matrices. We can multiply it the normal way by doing 8 multiplies, but each multiply is of 2x2 matrices this time (for a total of 8*8 = 64 scalar multiplies). Strassen's method lets us save one multiply, but this time it's not just one scalar multiply operation, it's a whole 2x2 matrix multiply that gets to be saved.
It's still slower! But eventually with big enough matrices, it gets to be faster. Like 512x512 big at least.
So they found something better than Strassen for small matrices. But it's not actually useful for those sizes. And it's unlikely their method will be either. Now if they had something for large matrices, it might actually be faster than what's used now.

- Please mod up (Score:2)
  
  by vyvepe ( 809573 ) writes:
  
  Please mod up
- Re: (Score:2)
  
  by jvkjvk ( 102057 ) writes:
  
  >So they found something better than Strassen for small matrices. But it's not actually useful for those sizes. And it's unlikely their method will be either. Now if they had something for large matrices, it might actually be faster than what's used now.
  "Overall, AlphaTensor beat the best existing algorithms for more than 70 different sizes of matrix,"
  So, it wasn't just for small matrices.
Genetic algorithms? (Score:2)

by Viol8 ( 599362 ) writes:

Isn't this sort of guided trial and error approach what genetic algorithms are supposed to be good at and were used a lot for until ANNs suddenly became flavour of the month? Seems they've rather fallen by the wayside in the last decade or so but I wonder if they could do just as well if someone used them in the same problem spaces.
- Re: Genetic algorithms? (Score:2)
  
  by Dantu ( 840928 ) writes:
  
  It is. However I did an AI course recently as a refresher. In practice genetic algorithms just don't work that well. This is too bad, I thought they were really cool and spent a lot of time trying to understand and Implement toy versions when I was younger. The overall population and mutation is basically a flavour of parallel gradient descent. The crossover, which is what makes them unique, is very hard to do in a way that is beneficial for most problems. Consider that if you could slam the first half of o
  - Re: (Score:2)
    
    by Viol8 ( 599362 ) writes:
    
    Quite possibly, but decomposing the problem in the first place is the hard part otherwise we wouldn't need any kind of automated search system for problem domains.
  - - Re: (Score:2)
      
      by Dantu ( 840928 ) writes:
      
      The No Free Lunch theorem (NFL) only applies if the target function is chosen from a uniform distribution of all possible functions.
      However, the set of target functions that humanity might want to optimize are not chosen from a uniform distribution of all possible functions. It's easy to demonstrate that assuming this precondition leads to absurd outcomes. In particular, assuming NFL applies to the distribution of real problems humans care about lead to the conclusion that Occam's razor is false and
Number of atoms (Score:3)

by TonyJohn ( 69266 ) writes: on Thursday October 06, 2022 @05:15AM (#62943181) Homepage

Is it just me, or is 10^33 not even close to the number of atoms in the atmosphere, let alone the universe? A typo, surely.

- Re: (Score:2)
  
  by Ormy ( 1430821 ) writes:
  
  A quick google search of "how many moles of gas in earth's atmosphere" gives a new scientist article as the first result which states ~10^44 atoms in Earth's atmosphere, which sounds correct to me. And I seem to remember reading years ago that there are around 10^70 protons (not atoms) in the visible universe. So yeah 10^33 atoms in the universe is obviously way wrong.
- Re: (Score:2)
  
  by Dr. Tom ( 23206 ) writes:
  
  Came here for this. Do people enjoy just glossing over glaring inaccuracies? Not questioning random claims? What else in the article is just a made up number?
It's been done better before (Score:4, Informative)

by MarkWegman ( 2553338 ) writes: on Thursday October 06, 2022 @06:59AM (#62943307)

Alpha-fold did better than the original Strassen algorithm. But Coppersmith-Winograd(CW) were able to improve matrix multiply by more than Strassen did using some very advanced techniques.
Stassen was the first that found that you could make progress in terms of faster matrix multiply. This inspired other humans explored whether you could push his techniques further. I think there were tile based algorithms that beat Strassen's original as alphafold has done, but don't remember because then people switched to the CW based ideas, which are not simple tiles. I don't remember in what ways CW varied from Strassen, it's a hard read. Others have bettered even the CW algorithm. Alphafold restricted itself, I guess to the Strassen like algorithms. No human would bother to restrict themselves that way after CW, because the end result is not as good as CW (and improvements on CW). So implying that alphafold has beaten humans is pretty dubious.
Most matrix multiplies are of real values. All of these faster algorithms suffer from some forms of numeric stability, and so are of no use for most matrix multiply. They can be used in the rare case where the matrices are over finite fields.

AI Backed Search is a good future (Score:1)

by Strong_Duality ( 10186333 ) writes:

A world where AI can rediscover or learn new search strategies/heuristics will be a game changer. Imagine something like this designed for Mixed Integer Programming (MIP). Instead of hand-crafting strategies to search, AI can make much better decisions. Think of the potential savings $.
From your friendly misunderstanding department (Score:2)

by ZiggyZiggyZig ( 5490070 ) writes:

I didn't know there was a computer science competition, and that inanimate objects such as 50-year-old records could participate! Fascinating!
Beating one does not seem that hard, but then, those old records probably can do better programming than most managers anyway.
So, kudos to Deep Mind, I guess?
Magic mushrooms. Is it worth trying? (Score:1)

by jimmygal ( 9340383 ) writes:

Magic mushrooms. Is it worth trying? Recently, I found this internet store and I was interested in various products with magic mushrooms https://www.shrooms-online.net... [shrooms-online.net]. I would be very interested to try their effect on me, but I don't know how my brain will perceive it. Does anyone have any experience with magic mushrooms?

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

Punching Nazis is Legal (Score:1)

Re: (Score:2)

Meaningless (Score:1)

Re:Meaningless (Score:5, Insightful)

Re: (Score:2)

Re: (Score:3)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re:Meaningless (Score:4, Insightful)

Re: (Score:2)

Re: (Score:2)

Re:Meaningless (Score:5, Interesting)

Re: (Score:2)

Re: (Score:2)

Data Cache law #1 (Score:2)

Re:Meaningless (Score:4, Insightful)

Optimizing costs too (Score:2)

An Apple watch (Score:2)

Re: (Score:2)

Re: (Score:3)

Re: (Score:1)

Not Meaningful (Score:2)

Re: (Score:2)

Re: (Score:1)

Re: (Score:3)

Re: (Score:1)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

50 years to optimize 4x4 matrix multiplication? (Score:1)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:1)

Re: 50 years to optimize 4x4 matrix multiplication (Score:3)

Re: 50 years to optimize 4x4 matrix multiplicatio (Score:2)

Re: (Score:1)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:1)

Re: (Score:2)

Can it play Doom? (Score:2)

no let's play global thermonuclear war (Score:2)

Re: Can it play Doom? (Score:2)

what kind of reward do they give an AI? (Score:1)

Reward & Punishment inspire Intelligence - (Score:2)

Re: (Score:2)

Re: (Score:2)

Shitty clickbait title, come on! (Score:2)

trades for addition and subtraction (Score:5, Interesting)

Please, have a heart! (Score:3)

Re: (Score:2)

Deceptive. (Score:1)

Re: (Score:2)

Probably not actually useful (Score:5, Insightful)

Please mod up (Score:2)

Re: (Score:2)

Genetic algorithms? (Score:2)

Re: Genetic algorithms? (Score:2)

Re: (Score:2)

Re: (Score:2)

Number of atoms (Score:3)

Re: (Score:2)

Re: (Score:2)

It's been done better before (Score:4, Informative)

AI Backed Search is a good future (Score:1)

From your friendly misunderstanding department (Score:2)

Magic mushrooms. Is it worth trying? (Score:1)

Related Links Top of the: day, week, month.

Slashdot Top Deals