Running AFM-4.5B on Intel CPUs with OpenVINO
Unlock the full potential of your Intel server CPU! In this demo, we turn an Amazon EC2 r8i instance, powered by the Intel Xeon 6 (Granite Rapids) processor, into a local AI powerhouse, running cutting-edge language models such as AFM-4.5B by Arcee AI with excellent performance and efficiency.
First, we optimize the AFM-4.5B model with the OpenVINO toolkit, quantizing its weights to 4 bits and 8 bits, and run inference with a simple Python example built on Hugging Face Transformers and Optimum-Intel. Next, we install OpenVINO Model Server to serve the optimized model, and invoke it with both curl and the OpenAI Python client.
