Intel has recently unveiled new benchmark results showcasing the latest advancements in its AI inference capabilities across CPUs and GPUs. These results, which are part of the MLPerf Inference v6.0 suite from MLCommons, shed light on the performance of Intel Xeon 6 processors when paired with Intel Arc Pro B-Series GPUs for deployment in workstations, datacenters, and edge environments.
For engineers and developers closely monitoring the evolution of AI hardware platforms, these results provide valuable insights into how Intel is positioning itself in a market that is currently dominated by proprietary GPU stacks.
The MLPerf Inference v6.0 results include four key benchmarks for Intel GPU systems that pair Intel Xeon 6 CPUs with Intel Arc Pro B70 graphics. Intel reports that a system equipped with four Arc Pro B70 or B65 GPUs offers 128GB of VRAM, enough to run large AI models of up to 120 billion parameters with high concurrency.
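A back-of-the-envelope calculation shows why roughly 120 billion parameters is the practical ceiling for a 128GB system: at 8-bit precision, weights alone consume about one byte per parameter. The Python sketch below is illustrative only; the quantization widths and the even 32GB-per-card split are assumptions, not Intel-published figures.

```python
# Rough VRAM budget for a hypothetical 4-GPU, 128GB configuration.
# Assumption: 128 GB total split evenly across four 32 GB cards.
GPUS = 4
VRAM_PER_GPU_GB = 32

def weights_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate footprint of model weights alone, in GB."""
    # 1e9 params * bytes / 1e9 bytes-per-GB == billions * bytes
    return params_billions * bytes_per_param

total_vram = GPUS * VRAM_PER_GPU_GB
for bits, label in [(16, "FP16/BF16"), (8, "INT8/FP8"), (4, "INT4")]:
    gb = weights_gb(120, bits / 8)
    print(f"120B @ {label:9s}: {gb:5.0f} GB weights, "
          f"{total_vram - gb:+5.0f} GB left for KV cache and activations")
```

At 16-bit precision a 120B model overshoots the budget by more than 100GB, while at 8-bit the weights land at 120GB, leaving only a few gigabytes of headroom, which is consistent with 120B being quoted as the upper bound.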
According to the company, the Arc Pro B70 GPU delivers up to 1.8× higher inference performance compared to the previous-generation Arc Pro B60.
Intel has also emphasized the enhancements achieved through software optimization. By leveraging an open, containerized software stack, inference performance can be scaled from a single-node system to enterprise multi-GPU deployments. The company claims up to 1.18× performance gains on the same Arc Pro B60 hardware compared to MLPerf v5.1.
“The combination of Intel Xeon 6 and Intel’s Arc Pro B-Series GPUs represents our commitment to expanding customer choice and value, providing real-world solutions that cater to both LLM models and traditional machine learning workloads. This offering delivers leading performance and exceptional value for graphics professionals and AI developers worldwide,” stated Anil Nanduri, Intel vice president, AI Products and GTM, Intel Data Center Group.
Intel’s GPU systems featuring Arc Pro B70 and B65 GPUs are designed as integrated inference platforms with validated hardware and software. They are intended to streamline AI deployment through a Linux-based containerized environment with multi-GPU scaling and PCIe peer-to-peer data transfers.
The GPUs also incorporate enterprise features such as ECC memory, SR-IOV virtualization support, telemetry, and remote firmware updates.
Memory capacity plays a crucial role in running large language models, and Intel asserts that the Arc Pro B70 is capable of handling larger models and context windows in multi-GPU configurations. In comparison to similar competing GPUs, the company claims that the B70 can support up to 1.6× greater key-value cache capacity when running large models.
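Key-value cache capacity matters because every concurrent user's context occupies VRAM alongside the model weights, so spare capacity directly caps context length and concurrency. As a rough illustration, the standard sizing formula is sketched below; the model dimensions used are generic assumptions for a 70B-class model with grouped-query attention, not specs of any system Intel benchmarked.

```python
# Rough KV-cache sizing: cache grows linearly with context length.
def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                bytes_per_elem: int, tokens: int) -> float:
    # Factor of 2 = one tensor for keys, one for values, per layer.
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens / 1e9

# Assumed dimensions: 80 layers, 8 KV heads (GQA), head_dim 128, FP16.
per_user = kv_cache_gb(80, 8, 128, 2, tokens=8192)
print(f"{per_user:.1f} GB of KV cache per 8K-token sequence")
# -> ~2.7 GB per sequence; VRAM left after weights caps concurrent users.
```

Under these assumptions, each 8K-token session consumes about 2.7GB, which is why a 1.6× larger cache budget translates fairly directly into longer contexts or more simultaneous users.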
Despite the focus on GPUs for AI acceleration, Intel underscores the significance of CPUs in inference infrastructure. The host CPU is responsible for memory management, workload orchestration, and overall cluster efficiency, all of which directly impact system cost and performance.
Intel stands out as the sole server processor vendor submitting stand-alone CPU results to the MLPerf inference benchmarks. More than half of the submissions for MLPerf Inference v6.0 featured Xeon processors as the host CPU.
The company also highlights the generational improvements in its CPU roadmap. Intel Xeon 6 processors equipped with performance cores (P-cores) achieved up to a 1.9× performance enhancement in MLPerf Inference v5.1 compared to the previous generation. Built-in AI acceleration technologies like AMX and AVX-512 enable workloads, including LLM inference, fine-tuning, and classical machine learning, to run efficiently even without dedicated accelerator hardware.
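For readers who want to confirm these instruction-set extensions on their own hardware, the Linux kernel exposes them as CPU feature flags. The short check below uses the standard /proc/cpuinfo flag names; it only detects support and does not measure performance.

```python
# Check a Linux host for the AMX and AVX-512 extensions mentioned above.
def cpu_flags() -> set[str]:
    """Return the CPU feature flags reported by the kernel."""
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

flags = cpu_flags()
for feature in ("avx512f", "avx512_bf16", "amx_tile", "amx_bf16", "amx_int8"):
    print(f"{feature:12s} {'yes' if feature in flags else 'no'}")
```

Frameworks such as PyTorch and ONNX Runtime dispatch to these instructions automatically when present, which is how the CPU-only inference numbers Intel cites are achieved without accelerator hardware.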
With the increasing demand for AI inference across various platforms, including edge devices, workstations, and datacenters, vendors are increasingly focusing on scalable architectures that combine CPUs, GPUs, and optimized software stacks. Intel’s latest MLPerf results underscore its strategy to compete through open platforms and system-level performance rather than relying on proprietary AI infrastructure.