
Researchers Claim New Technique Slashes AI Energy Use By 95% (decrypt.co) 17

Researchers at BitEnergy AI, Inc. have developed Linear-Complexity Multiplication (L-Mul), a technique that reduces AI model power consumption by up to 95% by replacing energy-intensive floating-point multiplications with simpler integer additions. This method promises significant energy savings without compromising accuracy, but it requires specialized hardware to fully realize its benefits. Decrypt reports: L-Mul tackles the AI energy problem head-on by reimagining how AI models handle calculations. Instead of complex floating-point multiplications, L-Mul approximates these operations using integer additions. So, for example, instead of multiplying 123.45 by 67.89, L-Mul breaks it down into smaller, easier steps using addition. This makes the calculations faster and uses less energy, while still maintaining accuracy. The results seem promising. "Applying the L-Mul operation in tensor processing hardware can potentially reduce 95% energy cost by element wise floating point tensor multiplications and 80% energy cost of dot products," the researchers claim. Without getting overly complicated, what that means is simply this: If a model used this technique, it would require 95% less energy to think, and 80% less energy to come up with new ideas, according to this research.
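The "smaller, easier steps" description above is loose; the core idea is that adding the raw bit patterns of two floating-point numbers adds their exponents exactly and their mantissas approximately, turning a multiply into an integer add. The following Python sketch illustrates that approximation on float32 values. It is an illustration of the principle only, not the paper's fp8 hardware algorithm: `approx_mul` and `F32_BIAS` are names of my own, sign handling is omitted, and the paper's extra mantissa correction term is left out.

```python
import struct

F32_BIAS = 127 << 23  # bit pattern of 1.0 as a float32 (0x3F800000)

def f2i(x: float) -> int:
    """Reinterpret a float32's bits as an unsigned 32-bit integer."""
    return struct.unpack('<I', struct.pack('<f', x))[0]

def i2f(i: int) -> float:
    """Reinterpret an unsigned 32-bit integer's bits as a float32."""
    return struct.unpack('<f', struct.pack('<I', i & 0xFFFFFFFF))[0]

def approx_mul(x: float, y: float) -> float:
    """Approximate x * y for positive normal floats with one integer add.

    Adding the bit patterns sums the exponents exactly and the mantissas
    approximately: (1 + m1) * (1 + m2) is replaced by 1 + m1 + m2.
    """
    return i2f(f2i(x) + f2i(y) - F32_BIAS)

# The article's example: the approximation lands within a few percent.
print(approx_mul(123.45, 67.89), 123.45 * 67.89)
```

For positive normal floats this approximation's worst-case relative error is around 11%; the paper's L-Mul adds a small correction offset to the mantissa sum and targets the low-precision fp8 formats AI models already use, which is how it keeps accuracy loss negligible.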

The algorithm's impact extends beyond energy savings. L-Mul outperforms current 8-bit standards in some cases, achieving higher precision while using significantly less bit-level computation. Tests across natural language processing, vision tasks, and symbolic reasoning showed an average performance drop of just 0.07% -- a negligible tradeoff for the potential energy savings. Transformer-based models, the backbone of large language models like GPT, could benefit greatly from L-Mul. The algorithm seamlessly integrates into the attention mechanism, a computationally intensive part of these models. Tests on popular models such as Llama, Mistral, and Gemma even revealed some accuracy gain on certain vision tasks.

At an operational level, L-Mul's advantages become even clearer. The research shows that multiplying two float8 numbers (the way AI models would operate today) requires 325 operations, while L-Mul uses only 157 -- less than half. "To summarize the error and complexity analysis, L-Mul is both more efficient and more accurate than fp8 multiplication," the study concludes. But nothing is perfect, and this technique has a major Achilles' heel: it requires a special type of hardware, so current hardware isn't optimized to take full advantage of it. Plans for specialized hardware that natively supports L-Mul calculations may already be in motion. "To unlock the full potential of our proposed method, we will implement the L-Mul and L-Matmul kernel algorithms on hardware level and develop programming APIs for high-level model design," the researchers say.

Comments Filter:
  • by JamesTRexx ( 675890 ) on Wednesday October 09, 2024 @12:04AM (#64850295) Journal

    I thought CPUs were already made to do floating-point operations as effectively as possible, so what makes it different for LLMs?

    Makes it sound as if instead of writing float multiplications in C code, it's faster to translate them into integer calculations.
    Yes, I do know it is sometimes faster to convert to whole numbers and then convert the result back to float at the end.

    • by gweihir ( 88907 )

      Simple: float is still more effort than integer, and that is true in multiple dimensions. That will never change.

    • Re:I'm surprised (Score:4, Interesting)

      by jonsmirl ( 114798 ) on Wednesday October 09, 2024 @12:16AM (#64850309) Homepage

      L-Mul works by defining a new floating-point format that is less capable than the existing standard one. This works because the new format has been specifically tailored to the needs of current AI tensor math; the standard float format has more features than AI requires. Since the new format is less complex, it can be implemented in fewer logic gates, which is where the power savings come from.

      I suspect it would be more fruitful to increase efforts on converting existing models to quantized integer models and just stick with the existing hardware. This also massively lowers the power consumption by replacing floating point instructions with integer ones.

      • I suspect the real purpose of all this is an attempt to prime the pump for specialized hardware sales - hardware someone affiliated with the researchers will soon release.

        • I suspect the real purpose of all this is an attempt to prime the pump for specialized hardware sales

          Big tech will only buy it if it works.

          If it works as promised, they'll make billions.

      • converting existing models to quantized integer models and just stick with the existing hardware.

        That's how Google's TPU already does it. It has 64k 8-bit integer multipliers.

        I don't know how this new technique differs.
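The quantized-integer approach discussed in this thread -- convert the model's floats to integers and run the matmuls on existing integer hardware -- can be sketched in a few lines. This is a minimal illustration of symmetric per-tensor int8 quantization with toy sizes and variable names of my own, not any particular framework's API:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8)).astype(np.float32)  # stand-in weight matrix
x = rng.normal(size=8).astype(np.float32)       # stand-in activation vector

def quantize(a):
    """Symmetric per-tensor int8 quantization: max |value| maps to 127."""
    scale = np.abs(a).max() / 127.0
    q = np.round(a / scale).astype(np.int8)
    return q, scale

Wq, w_scale = quantize(W)
xq, x_scale = quantize(x)

# The matmul itself runs entirely in integer arithmetic (int32 accumulators);
# the two float scales are folded back in with one multiply per output.
y_int = Wq.astype(np.int32) @ xq.astype(np.int32)
y = y_int * (w_scale * x_scale)

print(np.max(np.abs(y - W @ x)))  # quantization error vs. the float result
```

Because the inner loop is pure integer multiply-accumulate, this is the workload shape that int8 accelerators like the TPU's matrix unit are built around.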

    • I thought CPUs were already made to do floating operations as effectively as possible

      These calculations are not run on CPUs.

      AI has been using GPUs but is increasingly using custom silicon.

      so what makes it different for LLM?

      Much lower precision, for starters. CPUs generally don't support FP16, FP12, or FP8 natively.

    • I thought CPUs were already made to do floating operations as effectively as possible

      I think it depends on the CPU.

      I remember that PowerPC used to be faster with floating point operations than integer ones and Apple would occasionally suggest converting to floats from ints for various array operations. Conversely, when Apple switched to Intel, the opposite was true and Apple changed their suggestion.

  • it reduces the complexity of integers even more and thus saves 4000% more energy.
    You could do your training on a TI-84 running on a battery.

  • Of course my factory is more efficient on power. It uses 4 times the space!
  • I read the preprint. I didn't quite catch whether this algorithm would deliver speed, efficiency or energy savings if implemented on existing hardware, like consumer CPUs and GPUs. The paper ends with "we will implement hardware and API". Do they mean that existing hardware beats it unless hardware is built specifically with their algorithm in mind?
