Follow Slashdot stories on Twitter

 



Forgot your password?
typodupeerror
×
Math AI

DeepMind Breaks 50-Year Math Record Using AI; New Record Falls a Week Later (arstechnica.com) 30

Last week, DeepMind announced it discovered a more efficient way to perform matrix multiplication, conquering a 50-year-old record. This week, two Austrian researchers at Johannes Kepler University Linz claim they have bested that new record by one step. Ars Technica reports: In 1969, a German mathematician named Volker Strassen discovered the previous-best algorithm for multiplying 4x4 matrices, which reduces the number of steps necessary to perform a matrix calculation. For example, multiplying two 4x4 matrices together using a traditional schoolroom method would take 64 multiplications, while Strassen's algorithm can perform the same feat in 49 multiplications. Using a neural network called AlphaTensor, DeepMind discovered a way to reduce that count to 47 multiplications, and its researchers published a paper about the achievement in Nature last week.

To discover more efficient matrix math algorithms, DeepMind set up the problem like a single-player game. The company wrote about the process in more detail in a blog post last week. DeepMind then trained AlphaTensor using reinforcement learning to play this fictional math game -- similar to how AlphaGo learned to play Go -- and it gradually improved over time. Eventually, it rediscovered Strassen's work and those of other human mathematicians, then it surpassed them, according to DeepMind. In a more complicated example, AlphaTensor discovered a new way to perform 5x5 matrix multiplication in 96 steps (versus 98 for the older method).

This week, Manuel Kauers and Jakob Moosbauer of Johannes Kepler University in Linz, Austria, published a paper claiming they have reduced that count by one, down to 95 multiplications. It's no coincidence that this apparently record-breaking new algorithm came so quickly because it built off of DeepMind's work. In their paper, Kauers and Moosbauer write, "This solution was obtained from the scheme of [DeepMind's researchers] by applying a sequence of transformations leading to a scheme from which one multiplication could be eliminated."

This discussion has been archived. No new comments can be posted.

DeepMind Breaks 50-Year Math Record Using AI; New Record Falls a Week Later

Comments Filter:
  • by RightwingNutjob ( 1302813 ) on Thursday October 13, 2022 @09:53PM (#62965023)

    Everyone knows that in $HighLevelLanguage, a matrix multiply is only one line of code!

    • You need to include 12 libraries.
      And add a lot of tabs to this single line of code.
      Or was it spaces? =/
      Never trust a programming language made by a Dutch bureacrat playing with Dutch taxpayers money during worktime.

  • The 'FBHHRBNRSSSHK' Algorithm. Rolls of the tongue.
    • by haruchai ( 17472 )

      The 'FBHHRBNRSSSHK' Algorithm. Rolls of the tongue.

      It sure does, if you're Bill the Cat, lead tongue for Deathtongue

  • Cue the nerds here saying how this is nothing and how they could have solved it in their sleep.
    • I could have solved this in my sleep.

      It was solved by an AI. It doesn't matter who presses the "start" button, or what they're doing while the software is running.

      You want to be smarter than all the nerds, but you didn't even figure out the context successfully.

  • Kasparov (Score:5, Interesting)

    by backslashdot ( 95548 ) on Thursday October 13, 2022 @10:47PM (#62965063)

    Kasparov was right. I remember some years ago, when he spoke at DEFCON, he said that computer+human assistance is better then either one alone (I guess that is fairly obvious). He also said said naive human + AI is better than smart human + AI. i suppose the smart human second guesses the AI in the wrong instances. Maybe a non-math/CS person ought to take a crack at this algorithm too.

  • Been done before (Score:5, Interesting)

    by MarkWegman ( 2553338 ) on Thursday October 13, 2022 @11:58PM (#62965159)
    Strassen showed that Matrix Multiply could be done in n^3 time. Coppersmith-Winograd using slightly different techniques showed that you could do even better than Strassen. DeepMind showed that they could beat Strassen's algorithm but not CW (and there've been improvements to CW). A human went back and beat the DeepMind algorithm using Strassen like techniques. But why bother when you can use CW techniques?

    BTW this is all theoretical and not practical. All of these techniques are numerically unstable so you wouldn't use them for reals, but only for matrix multiply over finite fields and they only really pay for humongous matrices.

    • Thereâ(TM)s value in working on enabling AI to assist in developing and optimizing algorithms. This shows a step towards that.

    • by Entrope ( 68843 )

      The CW matrix multiplication algorithm and its relatives have horrible constant factors that make them slower than Strassen for any matrix that is feasible to process on existing hardware. The algorithms presented in these papers have lower constant factors, so they're actually feasible to use -- when your matrix size has the right prime factors. (Strassen's algorithm has a smaller constant factor for its big-O time, so these all represent different points on a Pareto frontier. But if you read any of the

      • I thought about this... Say you have a 192 x 192 matrix. You can do 3x3 in the conventional way, then use the improved methods for 12x12 split into 4x4 smaller matrices, and for 48x48, and for 192x192. If you have a 191 x 191 matrix, you add an extra row and column with zeroes. Having a method for 5x5 will help because suddenly you can do a 100x100 matrix.

        You should probably measure the time for n x n matrices, using the classical method, using a faster method for 2 x 2, 4 x 4, or 5 x 5 (having a 3 x 3 m
    • I haven't tried it personally, but Coppersmith-Winograd seems to have extremely large overheads for "small" sizes - with "small" being so large that it doesn't beat Strassen for any problems that can realistically be solved at all.

      The methods for 4x4 and 5x5 should be easily integratable into Strassen's method. Having a non-power of two improves things for matrix sizes that are not a power of two, for example 100 x 100 which can be split into 96 20x20, each 20x20 into 96 4x4, and each 4x4 can be done wit
    • Re:Been done before (Score:4, Informative)

      by mesterha ( 110796 ) <<moc.liamg> <ta> <mrahretsem.sirhc>> on Friday October 14, 2022 @09:20AM (#62965847) Homepage

      Strassen showed that Matrix Multiply could be done in n^3 time.

      Naive matrix multiply is n^3. Strassen showed roughly O(n^2.8074). Currently the best algorithm has O(n^2.3728596) complexity but is impractical for realistic matrix sizes. All of these faster algorithms are based on Strassen's approach. See https://en.wikipedia.org/wiki/... [wikipedia.org]

      DeepMind showed that they could beat Strassen's algorithm but not CW (and there've been improvements to CW).

      I don't think DeepMind made this claim. All they showed is that they could solve some rather small matrices "faster" than current Strassen style decompositions. Their direct approach does not apply to arbitrary matrix multiply. Given the recursive nature of Strassen algorithms, I would guess these techniques might help, but I have not seen it reported.

      A human went back and beat the DeepMind algorithm using Strassen like techniques. But why bother when you can use CW techniques?

      It sounds like they previously developed a trick to shave off a multiply. They just applied their trick to the DeepMind solution. It probably required an insight that was not in the search space of the algorithm.

      BTW this is all theoretical and not practical. All of these techniques are numerically unstable so you wouldn't use them for reals, but only for matrix multiply over finite fields and they only really pay for humongous matrices.

      The naive multiply wins because it can be effectively parallelized with good cache performance.

  • while this is really a nice result, it is not surprising that a computer can fine-tune an idea by intelligently searching. Optimizing matrix multiplication required a genius initial idea (mostly Strassen) but once the idea is known, it is natural to fine-tune it with the help of a machine rather than trying to improve the algorithm by hand by modifying the idea. The optimal strategy for matrix multiplication is still not known. It is likely that a computer will get there. It will then however need an other
  • Can't this simply be brute forced for small matrixes?
    • by Shimbo ( 100005 )

      Can't this simply be brute forced for small matrixes?

      It blows up incredibly fast, such that the answer isn't definitively known for n > 2.

  • I know the point is the accomplishment itself, but is there any significant improvement to be had by reducing 96 multiplications to merely 95? Is there any GPU that's going to process 95 multiplies in a cycle but the 96th one just can't be shoehorned in? The process may be interesting for the mathematical insight, but it may not be worth the extra time to code if you can get within one or two multiplies of ideal while remaining "pretty" and easy to maintain.

    • by Voltara ( 6334 )

      The improvement works recursively. For example, you can multiply a 25x25 matrix by treating it as a 5x5 matrix of 5x5 matrices. So, instead of 96*96 (9216) integer multiplications, you have 95*95 (9025).

      The savings compound, so what was a ~1% improvement for 5x5 becomes ~5% for 3125x3125, and so on.

      • by Mal-2 ( 675116 )

        Thanks for the explanation. While you ideally never say no to extra speed, I don't think a 5% difference is going to be that important in most cases. By the time the savings compound that much, the system is going to run into other bottlenecks, unless the hardware is closely tuned to the load. I think for most people, the hardware is going to be one or more GPUs, meaning they're pretty far from tuned to the load although still immensely better than trying to do it on the CPU.

  • Does it use less clocks when parallelized?

    The number of linear (pun intended) steps is meaningless. The number of clocks matters.

He who has but four and spends five has no need for a wallet.

Working...