Intel has shared its MLPerf Inference v6.0 results, and the numbers show clear progress in AI workloads. The new Arc Pro B70 and B65 GPUs deliver strong gains over the previous Arc Pro B60, while software updates also improve performance on older hardware.
MLCommons published the MLPerf Inference v6.0 benchmarks, and Intel used the results to highlight its newest Battlemage-based Arc Pro GPUs alongside Xeon 6 CPUs, positioning the two as a complete AI inference platform that combines GPU and CPU performance.
Arc Pro B70 shows major performance gains
Intel tested four-GPU setups using Arc Pro B70 and B65 cards with up to 128 GB of VRAM. These configurations handled models as large as 120 billion parameters while delivering up to 80% higher inference performance than the Arc Pro B60.
Here is how the GPUs performed in the GPT-OSS-120B benchmark:
| GPU Config | Offline (Tokens/s) | Server (Tokens/s) |
|---|---|---|
| 4 x Arc Pro B70 (128 GB) | 1536.90 | 951.67 |
| 4 x Arc Pro B60 Dual (192 GB) | 1601.91 | 884.24 |
| 4 x Arc Pro B60 (96 GB) | 841.04 | 452.19 |
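As a quick sanity check, the headline uplift can be reproduced directly from the table above. The snippet below is a simple back-of-the-envelope calculation against the published figures, not an official MLPerf tool:

```python
# Reproduce the B70-vs-B60 uplift from the GPT-OSS-120B table above.
b70 = {"offline": 1536.90, "server": 951.67}   # 4x Arc Pro B70 (128 GB)
b60 = {"offline": 841.04, "server": 452.19}    # 4x Arc Pro B60 (96 GB)

for scenario in ("offline", "server"):
    uplift = (b70[scenario] / b60[scenario] - 1) * 100
    print(f"{scenario}: {uplift:.0f}% higher than Arc Pro B60")
# offline: 83% higher, server: 110% higher -- consistent with the
# roughly 80% offline gain Intel cites, with server doing even better.
```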
The Arc Pro B70 delivers strong server performance, which matters for real-time inference workloads where both latency and throughput affect results.
Performance across LLM benchmarks
Intel also tested other popular models, including Llama 2 and Llama 3.1, and the results show how different configurations scale depending on memory capacity and workload type.
Llama2-70B-99 benchmark
| GPU Config | Offline (Tokens/s) | Server (Tokens/s) |
|---|---|---|
| 4 x Arc Pro B70 (128 GB) | 2459.18 | 1698.57 |
| 4 x Arc Pro B60 Dual (192 GB) | 3270.66 | 2199.50 |
| 4 x Arc Pro B60 (96 GB) | 1697.66 | 1106.26 |
Llama3.1 8B benchmark
| GPU Config | Offline (Tokens/s) | Server (Tokens/s) |
|---|---|---|
| 4 x Arc Pro B60 Dual (192 GB) | 52.83 | 49.17 |
| 4 x Arc Pro B70 (128 GB) | 36.07 | 32.58 |
| 4 x Arc Pro B60 (96 GB) | 26.15 | 24.57 |
| 4 x Arc Pro B50 (64 GB) | 13.45 | 9.27 |
| 2 x Xeon 6 (128 Cores) | 9.61 | 3.68 |
These results show that higher-memory configurations still play a key role in some workloads, especially for larger models where VRAM capacity directly affects performance.
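To see why capacity matters so much for the 70B run, a rough weight-memory estimate helps. The figures below are general rules of thumb for dense LLMs, not numbers from Intel's submission:

```python
# Rough weight-memory estimate for dense LLMs (weights only; the
# KV cache and activations add more on top of this).
def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * 1e9 * bytes_per_param / 1024**3

for params in (8, 70, 120):
    fp16 = weights_gb(params, 2.0)   # FP16/BF16 weights
    int8 = weights_gb(params, 1.0)   # 8-bit quantized weights
    print(f"{params}B params: ~{fp16:.0f} GB @ FP16, ~{int8:.0f} GB @ INT8")
# A 70B model needs roughly 130 GB at FP16, so the 192 GB B60 Dual
# setup leaves far more headroom for KV cache than the 128 GB B70
# setup, which tracks with its lead in the Llama2-70B table above.
```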
Software optimizations improve existing GPUs
Intel also improved performance on existing GPUs: the company reports an 18% uplift on the Arc Pro B60 from software optimizations alone, meaning users already running these cards can see gains without a hardware upgrade.
This focus on optimization also reflects how AI inference performance depends on both hardware and software working together, especially when running large language models.
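As an illustration of what a four-GPU deployment like the ones benchmarked here looks like in practice, below is a minimal tensor-parallel serving sketch using the vLLM API. The model ID is a hypothetical stand-in, and whether a given vLLM build supports your specific GPUs is an assumption; this is not Intel's submission code:

```python
# Minimal sketch of multi-GPU LLM serving with vLLM tensor
# parallelism. Assumes a vLLM build with support for your GPUs;
# the model ID below is illustrative only.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-2-70b-chat-hf",  # hypothetical model choice
    tensor_parallel_size=4,                  # shard weights across 4 GPUs
    dtype="float16",
)
params = SamplingParams(max_tokens=128, temperature=0.7)
outputs = llm.generate(["Summarize MLPerf Inference in one line."], params)
print(outputs[0].outputs[0].text)
```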
Xeon 6 CPUs still play a key role
Intel also submitted Xeon 6 CPU results, showing up to a 90% generation-over-generation improvement, helped by features such as AMX and AVX-512 that accelerate AI workloads even without a dedicated GPU.
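On Linux, you can confirm whether a given Xeon actually exposes these instruction sets before relying on them. This is a generic /proc/cpuinfo check, not part of Intel's MLPerf submission:

```python
# Check /proc/cpuinfo (Linux) for the CPU features Intel credits
# with the Xeon 6 gains: AMX tile instructions and AVX-512.
wanted = {"amx_tile", "amx_int8", "amx_bf16", "avx512f"}

flags = set()
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            break

for feature in sorted(wanted):
    print(f"{feature}: {'yes' if feature in flags else 'no'}")
```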
The company also points out that CPUs handle critical tasks such as memory management, workload scheduling, and system reliability, all of which directly affect overall AI performance and total cost of ownership.
Intel positions the Arc Pro B70 as a strong option for AI inference, with high VRAM capacity, multi-GPU scaling, and enterprise features such as ECC and remote management, while keeping single-GPU pricing competitive at under $1,000.
These results show steady progress on both the hardware and software fronts, and they confirm that Intel continues to improve its AI stack across GPUs and CPUs as demand for large-model inference keeps growing.