ZeroPoint compression boosts AI memory by 50%

February 24, 2025

ZeroPoint Technologies AB, based in Gothenburg, Sweden, has unveiled a groundbreaking hardware-assisted compression technology designed to enhance the effective memory of foundational AI models, such as large language models (LLMs), by up to 50 percent. The company, specializing in compression solutions, was established in 2015 by Professor Per Stenström and Angelos Arelakis as a spin-off from Chalmers University of Technology.

The product, named AI-MX, compresses and decompresses deployed foundational models in memory. ZeroPoint plans to make it available to customers and partners in the second half of 2025, a significant advance in memory optimization for AI applications.

By leveraging AI-MX, enterprise and hyperscale datacenters can enhance various performance metrics, such as effective addressable memory, memory bandwidth, and tokens served per second, according to ZeroPoint. The company's proprietary hardware-accelerated compression, compaction, and memory management technologies deliver operations at nanosecond latencies, surpassing the speed of traditional compression algorithms by a factor of 1,000.
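The headline figures follow from simple arithmetic: a 1.5x compression ratio corresponds to the "up to 50 percent" gain in effective memory, and the same ratio applies to effective bandwidth when transfers carry compressed data. The sketch below is purely illustrative; the capacity and bandwidth numbers are hypothetical examples, not ZeroPoint specifications.

```python
# Illustrative back-of-envelope model of how a fixed compression ratio
# translates into effective-memory gains. The 1.5x ratio matches the
# "up to 50 percent" figure in the announcement; the 96 GB / 3352 GB/s
# inputs are hypothetical example values.

def effective_capacity(physical_gb: float, ratio: float) -> float:
    """Capacity the system appears to have when data is stored compressed."""
    return physical_gb * ratio

def effective_bandwidth(raw_gbps: float, ratio: float) -> float:
    """Useful data moved per second when transfers carry compressed data."""
    return raw_gbps * ratio

RATIO = 1.5  # up to 50% more effective memory, per the announcement

print(effective_capacity(96, RATIO))     # 96 GB of HBM behaves like 144 GB
print(effective_bandwidth(3352, RATIO))  # raw 3352 GB/s yields 5028 GB/s effective
```

The same multiplier feeds through to tokens served per second to the extent that inference is memory-bandwidth-bound, which is why the company frames the gain in those terms.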

AI-MX is compatible with a wide range of memory types, including HBM, LPDDR, GDDR, and DDR, ensuring that the benefits of memory optimization extend to most AI acceleration use cases. Klas Moreau, CEO of ZeroPoint Technologies, emphasized the significance of the new solution, stating, "With today's announcement, we introduce a first-of-its-kind memory optimization solution that has the potential to save companies billions of dollars per year related to building and operating large-scale datacenters for AI applications."

ZeroPoint Technologies plans to further improve the capacity and performance of AI-MX, aiming to exceed the current 1.5x gain in subsequent generations of the product.
