Running AFM-4.5B on Intel CPUs with OpenVINO
Unlock the full potential of your Intel server CPU! In this demo, we turn an Amazon EC2 r8i instance, powered by the Intel Xeon 6 (Granite Rapids) processor, into a local AI powerhouse, running cutting-edge language models such as AFM-4.5B by Arcee AI with excellent performance and efficiency.
First, we optimize the AFM-4.5B model with the OpenVINO toolkit, quantizing its weights to 4 bits and 8 bits, and run inference with a simple Python example built on Hugging Face Transformers and Optimum-Intel. Next, we install OpenVINO Model Server to serve the optimized model, and invoke it with both curl and the OpenAI Python client.
