Description
Bringing AI inference closer to the data source offers significant advantages in cost, privacy, and performance. Recent advances in lightweight GenAI models (e.g., 1-8B parameters) provide a disruptive opportunity to shift GenAI deployment from the cloud to the edge, provided those alternatives to cloud-based GenAI are practical and efficient. This white paper outlines a strategic approach to moving GenAI deployments from cloud-native (i.e., GPU-based) solutions to edge solutions that use the built-in compute acceleration of CPUs, GPUs, and NPUs (e.g., Intel® Core™ Ultra processors, Intel® Arc™ GPUs) together with open-source GenAI models. On-device deployment offers low total cost of ownership (TCO), offline capability, data sovereignty, and reduced latency, making powerful GenAI models accessible across regions and sectors that may previously have faced barriers to deployment.