
Graphics Processing Units (GPUs) have played a critical role in accelerating computational tasks in the rapidly evolving fields of machine learning and artificial intelligence, especially model training and inference. Although NVIDIA GPUs have historically dominated this market, Intel has become a strong contender with a range of GPUs designed for a variety of workloads, including machine learning.

The rise of Intel GPUs in machine learning

Intel GPUs are a key component of Intel's strategy to provide comprehensive solutions for AI workloads. Designed to complement traditional Central Processing Units (CPUs), they bring parallel processing power to the table, enhancing the performance of machine learning tasks and paving the way for efficient model serving.

Key objectives of Intel GPU integration

The integration of Intel GPUs into the machine learning ecosystem aims to address several critical objectives:

  • Performance enhancement: Intel GPUs are engineered to deliver accelerated parallel processing, significantly boosting the speed and efficiency of model inference, a crucial aspect of deploying machine learning models in real-world applications.
  • Energy efficiency: With an emphasis on optimizing power consumption, Intel GPUs offer a balance between performance and energy efficiency, making them well-suited for a wide range of deployment scenarios, from cloud-based solutions to edge computing devices.
  • Diversification of options: As the demand for machine learning accelerates across industries, the availability of diverse GPU options becomes essential. Intel's presence in the GPU market provides users with alternatives, fostering competition and innovation.

Benefits of using Intel GPUs in OpenVINO Model Server

OpenVINO Model Server (OVMS) is designed to optimize and accelerate the deployment of deep learning models on Intel hardware, including Intel CPUs, GPUs, and accelerators.

Integrating Intel GPUs with OVMS brings notable advantages to deep learning and artificial intelligence workloads. By harnessing the capabilities of Intel GPUs, OVMS optimizes the inference process, unlocking accelerated model performance, low latency for real-time applications, efficient parallel processing, and support for distributed inference across multiple GPUs. This synergy improves the overall efficiency and throughput of deep learning workloads, making the combination of Intel GPUs and OVMS a powerful solution for AI practitioners and developers.
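
Before serving anything on a GPU, it helps to confirm that the OpenVINO runtime underpinning OVMS can actually see one. The following is a minimal Python sketch, assuming the openvino package and Intel GPU drivers are installed on the node:

```python
from openvino.runtime import Core

# Create an OpenVINO runtime core and query the devices it can see.
core = Core()

# On a machine with a supported Intel GPU and drivers installed,
# this typically prints something like ['CPU', 'GPU'].
print("Available devices:", core.available_devices)

# Each device also exposes a human-readable name.
for device in core.available_devices:
    print(device, "->", core.get_property(device, "FULL_DEVICE_NAME"))
```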

Use of Intel GPUs in OpenVINO Model Server

Accelerated model inference

OVMS leverages Intel GPUs to accelerate inference for deep learning models. While this acceleration is generally beneficial for real-time applications, the impact on latency depends on the size of the model. For larger models, GPU acceleration significantly improves performance; for smaller models, the time required to copy data to and from the GPU can offset the gains. Notably, Intel's Neural Compressor can compress models, for example through quantization, shrinking their compute and memory requirements; for models below a certain size, this can make CPU inference the lower-latency option.
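
This tradeoff is easy to observe directly with the OpenVINO runtime. The sketch below, which assumes a local IR model at model.xml (a hypothetical path), times a single inference on CPU and GPU; for small models the copy overhead can make the CPU the faster option:

```python
import time

import numpy as np
from openvino.runtime import Core

core = Core()
# Hypothetical model path; substitute your own OpenVINO IR model.
model = core.read_model("model.xml")

for device in ("CPU", "GPU"):
    compiled = core.compile_model(model, device)
    infer_request = compiled.create_infer_request()

    # Random input matching the model's first input shape.
    shape = list(compiled.input(0).shape)
    input_tensor = np.random.rand(*shape).astype(np.float32)

    # Warm-up run, then a timed run; for small models the host-to-GPU
    # copy can dominate, so the GPU is not always faster.
    infer_request.infer({0: input_tensor})
    start = time.perf_counter()
    infer_request.infer({0: input_tensor})
    print(f"{device}: {(time.perf_counter() - start) * 1000:.2f} ms")
```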

Parallel processing power

Intel GPUs excel in parallel processing, specifically accelerating matrix math operations essential for deep learning models. OVMS efficiently utilizes this parallelism to process multiple inference requests concurrently, enhancing overall throughput.
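
A client can exercise this parallelism simply by issuing requests concurrently. The sketch below uses the ovmsclient package against a hypothetical OVMS endpoint at localhost:9000 serving a model named resnet; the input name and shape are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
from ovmsclient import make_grpc_client

# Hypothetical endpoint and model name; adjust to your deployment.
client = make_grpc_client("localhost:9000")

def run_inference(request_id: int):
    # One random image-like input per request (NCHW layout assumed).
    data = np.random.rand(1, 3, 224, 224).astype(np.float32)
    result = client.predict(inputs={"0": data}, model_name="resnet")
    return request_id, result

# Issue requests concurrently; the server schedules them in parallel
# on the GPU, improving overall throughput.
with ThreadPoolExecutor(max_workers=8) as pool:
    for request_id, _ in pool.map(run_inference, range(32)):
        print(f"request {request_id} completed")
```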

Compatibility and integration

OVMS is designed to seamlessly integrate with Intel GPUs, ensuring compatibility and optimized performance. This integration allows users to deploy models on Intel GPU infrastructure without significant modifications to their existing workflows.
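
In practice, targeting a GPU is a server-side configuration change rather than a code change. A minimal sketch of an OVMS configuration file, with an illustrative model name and path, sets target_device accordingly:

```json
{
    "model_config_list": [
        {
            "config": {
                "name": "resnet",
                "base_path": "/models/resnet",
                "target_device": "GPU"
            }
        }
    ]
}
```

Existing gRPC or REST clients continue to work unchanged, since device placement is decided on the server.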

Distributed inference

A single OVMS instance is limited to the hardware of one node, and GPUs do not communicate directly across the network. To distribute inference, you run multiple OVMS instances on different nodes and spread requests across them, pooling the GPU capacity of the whole cluster. This approach scales out large inference workloads by making multiple instances, rather than one larger instance, the unit of growth.
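
A production deployment would typically put a load balancer or service mesh in front of the instances, but the idea can be sketched with simple client-side round-robin. The endpoints, input name, and model name below are illustrative:

```python
import itertools

import numpy as np
from ovmsclient import make_grpc_client

# Hypothetical OVMS endpoints, one per node; each node has its own GPU.
ENDPOINTS = ["node-a:9000", "node-b:9000", "node-c:9000"]
clients = [make_grpc_client(endpoint) for endpoint in ENDPOINTS]

# Round-robin iterator over the clients.
next_client = itertools.cycle(clients)

def predict(data: np.ndarray):
    # Each request lands on the next instance, pooling GPU capacity
    # across the cluster without the GPUs talking to each other.
    client = next(next_client)
    return client.predict(inputs={"0": data}, model_name="resnet")

result = predict(np.random.rand(1, 3, 224, 224).astype(np.float32))
```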

Energy efficiency

Intel GPUs often provide a balance between high-performance computing and energy efficiency. When deploying models with OVMS on Intel GPU hardware, organizations can benefit from improved performance per watt, contributing to overall energy efficiency in data center environments.

Last updated: February 21, 2024