Researchers Claim New Technique Slashes AI Energy Use By 95% (decrypt.co) 24

Researchers at BitEnergy AI, Inc. have developed Linear-Complexity Multiplication (L-Mul), a technique that reduces AI model power consumption by up to 95% by replacing energy-intensive floating-point multiplications with simpler integer additions. This method promises significant energy savings without compromising accuracy, but it requires specialized hardware to fully realize its benefits. Decrypt reports: L-Mul tackles the AI energy problem head-on by reimagining how AI models handle calculations. Instead of complex floating-point multiplications, L-Mul approximates these operations using integer additions. So, for example, instead of multiplying 123.45 by 67.89, L-Mul breaks it down into smaller, easier steps using addition. This makes the calculations faster and uses less energy, while still maintaining accuracy. The results seem promising. "Applying the L-Mul operation in tensor processing hardware can potentially reduce 95% energy cost by element wise floating point tensor multiplications and 80% energy cost of dot products," the researchers claim. Without getting overly complicated, what that means is simply this: If a model used this technique, it would require 95% less energy to think, and 80% less energy to come up with new ideas, according to this research.
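
To make that concrete, here is a rough sketch (not the researchers' code) of the general idea using a toy 4-bit-mantissa format: each factor is stored as an integer mantissa field and an integer exponent field, and their product is approximated by simply adding those fields, plus the small constant correction discussed in the comments below, so no multiplication is performed at all.

    import math

    # Toy 4-bit-mantissa format, chosen only for illustration: a value v > 0
    # is stored as integer fields (x, e) with v ~= (1 + x/M) * 2**e.
    M = 16          # mantissa denominator (4 mantissa bits)
    OFFSET = 1      # constant correction term in integer form: 1/M = 0.0625

    def to_fields(v):
        # Encoding/decoding is only for demonstration and printing.
        m, e = math.frexp(v)                  # v = m * 2**e, with m in [0.5, 1)
        return round((2 * m - 1) * M), e - 1  # rewrite as (1 + x/M) * 2**e

    def from_fields(x, e):
        return (1 + x / M) * 2 ** e

    def l_mul_fields(x1, e1, x2, e2):
        # The entire "multiplication" is two integer additions:
        # mantissa fields are added (plus the constant correction),
        # and exponent fields are added. No multiplier is involved.
        return x1 + x2 + OFFSET, e1 + e2

    a, b = 123.45, 67.89
    approx = from_fields(*l_mul_fields(*to_fields(a), *to_fields(b)))
    print(f"exact: {a * b:.2f}  approx: {approx:.2f}  "
          f"relative error: {abs(a * b - approx) / (a * b):.2%}")

For the 123.45 x 67.89 example above, this lands within about one percent of the exact product; normalization, rounding, and sign handling are omitted, and the actual method is defined over standard low-bit tensor formats rather than this toy encoding.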

The algorithm's impact extends beyond energy savings. L-Mul outperforms current 8-bit standards in some cases, achieving higher precision while using significantly less bit-level computation. Tests across natural language processing, vision tasks, and symbolic reasoning showed an average performance drop of just 0.07% -- a negligible tradeoff for the potential energy savings. Transformer-based models, the backbone of large language models like GPT, could benefit greatly from L-Mul. The algorithm seamlessly integrates into the attention mechanism, a computationally intensive part of these models. Tests on popular models such as Llama, Mistral, and Gemma even revealed some accuracy gain on certain vision tasks.

At an operational level, L-Mul's advantages become even clearer. The research shows that multiplying two float8 numbers (the way AI models would operate today) requires 325 operations, while L-Mul uses only 157 -- less than half. "To summarize the error and complexity analysis, L-Mul is both more efficient and more accurate than fp8 multiplication," the study concludes. But nothing is perfect, and this technique has a major Achilles' heel: it requires a special type of hardware, so current hardware isn't optimized to take full advantage of it. Plans for specialized hardware that natively supports L-Mul calculations may already be in motion. "To unlock the full potential of our proposed method, we will implement the L-Mul and L-Matmul kernel algorithms on hardware level and develop programming APIs for high-level model design," the researchers say.

Comments Filter:
  • by JamesTRexx ( 675890 ) on Wednesday October 09, 2024 @12:04AM (#64850295) Journal

    I thought CPUs were already made to do floating-point operations as effectively as possible, so what makes it different for LLMs?

    Makes it sound as if, instead of writing float multiplications in C code, it's faster to translate them into integer calculations.
    Yes, I do know it is sometimes faster to convert to whole numbers and then convert the result back to float at the end.

    • by gweihir ( 88907 )

      Simple: float is still more effort than integer, and that is true in multiple dimensions. That will never change.

    • Re:I'm surprised (Score:5, Interesting)

      by jonsmirl ( 114798 ) on Wednesday October 09, 2024 @12:16AM (#64850309) Homepage

      L-Mul works by defining a new floating-point format that is less capable than the existing, standard one. This works because the new format has been specifically tailored to match the needs of current AI tensor math; the existing standard float format has more features than AI requires. Since this new format is less complex, it can be implemented in fewer logic gates, which is where the power savings come from.

      I suspect it would be more fruitful to increase efforts on converting existing models to quantized integer models and just stick with the existing hardware. This also massively lowers the power consumption by replacing floating point instructions with integer ones.

      • I suspect the real purpose of all this is an attempt to prime the pump for specialized hardware sales - hardware someone affiliated with the researchers will soon release.

        • I suspect the real purpose of all this is an attempt to prime the pump for specialized hardware sales

          Big tech will only buy it if it works.

          If it works as promised, they'll make billions.

      • converting existing models to quantized integer models and just stick with the existing hardware.

        That's how Google's TPU already does it. It has 64k 8-bit integer multipliers.

        I don't know how this new technique differs.

        • by pjt33 ( 739471 )

          Using the implicit leading 1, a floating point number is stored as (s, m, e) representing (-1)^s (1+m) 2^e. The interesting part of multiplication is (1+m1)(1+m2)=(1+m1+m2+m1 m2). They approximate this as (1.0625+m1+m2). I assume that they work on the basis that the system is robust enough that the error doesn't matter, because naively I would think that the way to optimise it would be to do the multiplication m1 m2 in lower precision (e.g. using the leading 4 bits of each mantissa).
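
          As a rough check of how large that error can be (an illustrative sketch, not the paper's own analysis), one can compare the approximation against the exact mantissa product for uniformly distributed mantissa fractions:

            import random

            random.seed(0)
            errors = []
            for _ in range(100_000):
                m1, m2 = random.random(), random.random()  # mantissa fractions in [0, 1)
                exact = (1 + m1) * (1 + m2)                # true mantissa product
                approx = 1.0625 + m1 + m2                  # approximation described above
                errors.append(abs(exact - approx) / exact)

            print(f"mean relative error:  {sum(errors) / len(errors):.2%}")
            print(f"worst relative error: {max(errors):.2%}")

          Whether that per-multiplication error matters for end-to-end model accuracy is exactly what the paper's benchmarks (the reported 0.07% average drop) are meant to measure.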

        • Well I've skimmed the paper. From what I can tell, the method approximates floating point multiplication, using almost only addition. This is possible because floating point is a semi-logarithmic format, so adding the exponents is a good part of multiplication.

          Basically, an fp number is (1+x/M)*2^e

          x is an integer (0 ≤ x < M), M is a constant (a power of 2), e is an integer. If you multiply 2 fp numbers (x,e and y,f) you get:

          (1 + x/M + y/M + x*y/M^2) * 2^(e+f)

          there's some sign bits and extra fiddling so if the fir

    • I thought CPUs were already made to do floating-point operations as effectively as possible

      These calculations are not run on CPUs.

      AI has been using GPUs but is increasingly using custom silicon.

      so what makes it different for LLMs?

      Much lower precision for starters. CPUs don't support FP16, FP12, or FP8.

    • I thought CPUs were already made to do floating-point operations as effectively as possible

      I think it depends on the CPU.

      I remember that PowerPC used to be faster with floating point operations than integer ones and Apple would occasionally suggest converting to floats from ints for various array operations. Conversely, when Apple switched to Intel, the opposite was true and Apple changed their suggestion.

  • it reduces the complexity of integers even more and thus saves 4000% more energy.
    You could do your training on a TI-84 running on a battery.

    • by Anonymous Coward

      Oh look, it's mister fancy pants over here with his TI-84. In my day we did it the real way, raw-dogging it with a TI-83.

      • Oh look, it's mister fancy pants over here with his TI-84. In my day we did it the real way, raw-dogging it with a TI-83.

        Amateurs. The only real way is a Casio calculator watch!

  • Of course my factory is more efficient on power. It uses 4 times the space!
  • I read the preprint. I didn't quite catch whether this algorithm would deliver speed, efficiency or energy savings if implemented on existing hardware, like consumer CPUs and GPUs. The paper ends with "we will implement hardware and API". Do they mean that existing hardware beats it unless hardware is built specifically with their algorithm in mind?

    • by gweihir ( 88907 )

      It is hype. Expect most of the claims to be empty and the rest to be misleading.

    • I didn't quite catch whether this algorithm would deliver speed, efficiency or energy savings if implemented on existing hardware

      No. It requires new silicon.

      like consumer CPUs and GPUs.

      Nobody uses consumer CPUs and GPUs for AI anymore. H100 tensor core GPUs are used, but most big tech companies are developing their own custom silicon designs.

  • Using integer maths instead of floating-point maths where you want a similar result but don't need as much precision has been done for decades already and is a common thing to do in, e.g., resource-constrained embedded devices. Applying it to AI models doesn't magically make it a new thing, and I seem to recall having seen several articles over the years from other groups doing it for AI models/training before this.

    This just smells like an attempt at hyping things up in the hopes of grant money or

  • It only reduces compute energy use, but memory access is the real bottleneck. It's a misleading title.
