ARM has unveiled the first port of its KleidiAI software, which enables AI workloads to run on CPUs instead of GPUs. KleidiAI is an open source set of microkernels optimized specifically for ARM processor cores and designed for easy integration into C or C++ machine learning (ML) and AI frameworks. Developers include the standalone .c and .h files associated with a specific microkernel, along with a common header file. The code has no dependencies on external libraries and performs no dynamic memory allocation or memory management.
Moreover, the microkernels provided by Kleidi offer specialized fusion patterns and a stateless, stable, and consistent API. According to Ronan Naughton, director of Product Management at ARM, the launch of Kleidi in May 2024 marked a significant milestone in accelerating AI adoption across the developer ecosystem. The Kleidi Libraries for popular AI frameworks, featuring KleidiAI, are part of this initiative.
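To make the idea of a stateless, allocation-free microkernel with a fused operation more concrete, here is a minimal sketch in plain C. The function name `example_matmul_clamp_f32` and its signature are hypothetical and are not taken from KleidiAI; the sketch only illustrates the general pattern of a kernel that keeps no global state, works entirely on caller-provided buffers, and fuses a clamp into the matrix multiply.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical micro-kernel for illustration only, not KleidiAI's API.
 * Computes dst = clamp(lhs * rhs, clamp_min, clamp_max).
 * lhs is m x k, rhs is k x n, dst is m x n, all row-major.
 * No global state and no heap allocation: the caller owns every buffer. */
void example_matmul_clamp_f32(size_t m, size_t n, size_t k,
                              const float *lhs, const float *rhs, float *dst,
                              float clamp_min, float clamp_max)
{
    assert(lhs != NULL && rhs != NULL && dst != NULL);

    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p) {
                acc += lhs[i * k + p] * rhs[p * n + j];
            }
            /* Fused step: clamp in the same pass instead of a second loop
             * over dst. */
            if (acc < clamp_min) acc = clamp_min;
            if (acc > clamp_max) acc = clamp_max;
            dst[i * n + j] = acc;
        }
    }
}
```

Because the function depends only on its arguments, a framework can call it from any thread and manage memory however it likes. A production microkernel would replace the scalar inner loop with architecture-specific instructions.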
The integration of KleidiAI with frameworks such as Google's MediaPipe, which uses XNNPACK on CPUs, has shown promising results. For instance, it improved the time to first token by 30% for the 2-billion-parameter Gemma model, achieving 250 tokens/s for text summarization on a Samsung S24 smartphone equipped with the Exynos 2400 SoC.
This advancement is crucial for deploying large language models on smartphones and at the network edge, a point ARM's Paul Williamson emphasized in discussions about the Ethos-U85 accelerator core. KleidiAI is expected to play a pivotal role in ARM's AI strategy following the acquisition of AI chip designer Graphcore by SoftBank, ARM's parent company.
ARM is actively collaborating with various AI frameworks to seamlessly integrate KleidiAI, eliminating the need for developers to acquire additional tools or skills. This streamlined approach empowers developers to enhance the performance of AI-based applications efficiently. Companies like Google, Meta, and Samsung Mobile are exploring the potential of KleidiAI across diverse markets.
The demonstration leveraging MediaPipe APIs and the XNNPACK CPU backend, accelerated by KleidiAI integration, is a significant milestone. With over 7 billion third-party installs, XNNPACK offers a vast market reach for KleidiAI integration. Matthias Grundmann, Google AI Edge Lead, expressed excitement about supporting KleidiAI in Google AI Edge's XNNPACK to boost AI workloads on current and future Arm CPUs.
KleidiAI is compatible with ARM CPUs that implement architectural features such as Neon, SVE2, and the Scalable Matrix Extension (SME), found in A- and X-class devices. This compatibility lets application developers write portable software, improving the accessibility and performance of AI applications.
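One common way for portable software to take advantage of such features is to detect them at run time and dispatch to the best available kernel variant. The sketch below shows that general pattern in C and is not KleidiAI's own mechanism. The `FEAT_*` flags, the `kernel_variant` enum, and both function names are hypothetical, and the preference order (SME, then SVE2, then Neon) is an assumption. Detection relies on the Linux `getauxval` call on AArch64 and reports no features on other platforms.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
#endif

/* Hypothetical feature flags and variants, for illustration only. */
enum { FEAT_NEON = 1, FEAT_SVE2 = 2, FEAT_SME = 4 };

typedef enum {
    VARIANT_SCALAR,
    VARIANT_NEON,
    VARIANT_SVE2,
    VARIANT_SME
} kernel_variant;

/* Query the OS for CPU features. Returns 0 (no features) when not on
 * AArch64 Linux or when the headers lack the relevant HWCAP bits. */
unsigned detect_features(void)
{
    unsigned features = 0;
#if defined(__linux__) && defined(__aarch64__)
    unsigned long hwcap = getauxval(AT_HWCAP);
    unsigned long hwcap2 = getauxval(AT_HWCAP2);
    (void)hwcap;
    (void)hwcap2;
#ifdef HWCAP_ASIMD
    if (hwcap & HWCAP_ASIMD) features |= FEAT_NEON;
#endif
#ifdef HWCAP2_SVE2
    if (hwcap2 & HWCAP2_SVE2) features |= FEAT_SVE2;
#endif
#ifdef HWCAP2_SME
    if (hwcap2 & HWCAP2_SME) features |= FEAT_SME;
#endif
#endif
    return features;
}

/* Pick the most capable variant; the ordering is an assumption. */
kernel_variant pick_variant(unsigned features)
{
    /* Reject bits this sketch does not know about. */
    assert((features & ~(unsigned)(FEAT_NEON | FEAT_SVE2 | FEAT_SME)) == 0);

    if (features & FEAT_SME)  return VARIANT_SME;
    if (features & FEAT_SVE2) return VARIANT_SVE2;
    if (features & FEAT_NEON) return VARIANT_NEON;
    return VARIANT_SCALAR;
}
```

An application would typically call `pick_variant(detect_features())` once at startup and cache the result, so the same binary runs on devices with and without the newer extensions.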
The technical demo of KleidiAI is available on GitLab. For more information, visit www.arm.com.