About the Project
This article explores how Intel®'s Converged Edge Media Platform (Reference Architecture), combined with Varnish Enterprise and Kubernetes, leverages Intel® hardware optimizations to maximize efficiency and performance. Specifically, we demonstrate how SR-IOV can deliver comparable high-performance results with significantly less CPU. Through performance testing, we compare baseline and optimized setups to highlight the tangible benefits of Intel® hardware and Kubernetes optimizations.
Intel®’s Converged Edge Media Platform Architecture is a media-focused implementation of Intel®’s Edge Platform. It provides container-based, cloud-native capabilities that enable Communication Service Providers (CoSPs) to quickly, efficiently, and cost-effectively deploy multiple media services. This architecture capitalizes on the fast-growing edge computing market by offering a platform tailored to media workloads. Built on Kubernetes (K8s) and leveraging Intel® hardware, it is optimized for the network edge and capable of hosting multiple services.
Kubernetes Cluster Configuration
The Kubernetes cluster is enhanced with the following key features:
- Device Manager: Enables direct access to Intel® GPUs from Pods.
- SR-IOV Plugin: Enhances network performance by enabling virtual NICs on a single physical NIC.
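As a rough illustration of how a Pod consumes these two plugins, the sketch below requests one Intel® GPU and one SR-IOV virtual function. The resource names follow the plugins' documented conventions and may be configured differently in a given cluster; the Pod and image names are placeholders.

```yaml
# Hypothetical Pod requesting one Intel GPU and one SR-IOV VF through
# the two device plugins; resource names vary per cluster configuration.
apiVersion: v1
kind: Pod
metadata:
  name: media-worker
spec:
  containers:
    - name: app
      image: example/media-app:latest   # placeholder image
      resources:
        limits:
          gpu.intel.com/i915: "1"               # Intel GPU device plugin
          intel.com/intel_sriov_netdevice: "1"  # SR-IOV device plugin VF
```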
What is SR-IOV?
SR-IOV (Single Root I/O Virtualization) is a technology that enhances network performance in virtualized environments. It allows a single physical NIC to appear as multiple virtual NICs, giving virtual machines or containers direct access to NIC hardware. This reduces latency and increases throughput, making it ideal for high-performance workloads.
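On Linux, for instance, the kernel's SR-IOV support can split a physical function into VFs directly through sysfs (the interface name `ens785f0` is an example):

```bash
# Create 8 virtual functions on the physical NIC (example interface name);
# each VF appears as its own PCI device that can be handed to a Pod or VM.
echo 8 | sudo tee /sys/class/net/ens785f0/device/sriov_numvfs

# List the VFs now attached to the physical function
ip link show ens785f0
```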
Cluster Specifications
- Kubernetes Cluster
- cluster1-master
- CPU: Intel® Xeon® Platinum 8358 (32c/64t)
- NIC: Intel® Ethernet E810-C QSFP (100Gbps)
- SSD: Kingston 240GB SSD
- cluster1-node
- CPU: Intel® Xeon® Platinum 8380 (40c/80t)
- NIC: Intel® Ethernet E810-C QSFP (100Gbps)
- SSD: Kingston 240GB SSD
- SSD: 3x Intel® NVMe 1.8TB
- cluster1-gpu
- CPU: Intel® Xeon® Gold 6346 (16c/32t)
- NIC: Intel® Ethernet E810-C QSFP (100Gbps)
- SSD: 4x Intel® NVMe 1.8TB
- GPU: 2x Intel® Data Center GPU Flex 170
In a Kubernetes (K8s) cluster, master nodes and worker nodes serve distinct roles, each critical to the operation of the cluster.
Master Node: Responsible for managing the Kubernetes cluster and orchestrating all activities across the worker nodes.
Worker Node: Executes workloads (pods) assigned by the master node.
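These roles are visible directly from the cluster; a quick check against the nodes listed in the specifications above might look like this:

```bash
# Show each node and its role: cluster1-master carries the control-plane
# role, while cluster1-node and cluster1-gpu run the workloads.
kubectl get nodes -o wide
```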
The Tests
Test Setup
The testing process is divided into two phases:
- Baseline Phase: Tests without acceleration to establish baseline performance metrics.
- Optimized Phase: Tests with SR-IOV and other optimizations to measure performance improvements.
Key elements shared across both phases:
- Varnish Configuration: Origin servers in the cluster are configured for performance testing.
- Load Testing: Conducted using WRK from a high-performance node provided by Intel®.
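A typical WRK invocation for one row of the result tables might look like the following; the URL and object path are placeholders:

```bash
# 16 threads, 60 connections, 60-second run; --latency reports the
# latency distribution behind the 90%/99% columns below.
wrk -t16 -c60 -d60s --latency http://<varnish-service>/object-1mb
```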
Phase 1: Baseline
- Varnish Enterprise is deployed on the cluster using a Helm chart without any acceleration.
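A minimal sketch of such a deployment, with placeholder release, chart, and repository names:

```bash
# Deploy Varnish Enterprise with default (non-accelerated) settings;
# the chart and repo names here are placeholders.
helm install varnish-baseline varnish/varnish-enterprise -f values.yaml
```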
Phase 2: Optimized
- Intel® configures SR-IOV to provide the Varnish Pod with direct NIC access.
- NIC affinity is set via Intel's edge media plugins so the Pod runs on the NUMA node closest to the NIC for optimal performance (a quick locality check is sketched after this list).
- Varnish Enterprise is redeployed using a Helm chart, now with SR-IOV annotations.
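Checking NIC NUMA locality is straightforward on the node itself (interface name is an example); the reported node is the one the Varnish Pod should be pinned to:

```bash
# Print the NUMA node the NIC's PCI device hangs off; pinning Varnish to
# CPUs on this node avoids cross-socket memory traffic.
cat /sys/class/net/ens785f0/device/numa_node
```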
Origin Setup
For the tests, the prng VMOD is used to generate 1MB synthetic responses directly from VCL.
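A minimal sketch of such a VCL configuration, assuming a `prng.string()` helper for the response body (the exact function name and signature in the prng VMOD are assumptions):

```vcl
vcl 4.1;

import prng;  # Varnish Enterprise PRNG VMOD

# No real origin: every backend fetch is answered synthetically.
backend default none;

sub vcl_backend_fetch {
    return (error(200, "OK"));
}

sub vcl_backend_error {
    set beresp.status = 200;
    set beresp.http.Content-Type = "application/octet-stream";
    set beresp.ttl = 1h;
    # Hypothetical call: fill the body with 1MB of pseudo-random data.
    synthetic(prng.string(1048576));
    return (deliver);
}
```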
Test Plan
Load generation is performed using WRK to simulate traffic and measure performance with and without SR-IOV. We limited the number of CPUs in the `values.yaml` file to test how few CPUs are needed to saturate the NIC.
The `values.yaml` file in a Helm chart contains the chart's default configuration values. It lets you customize the behavior and resources of a Kubernetes application when deploying it with Helm. These values can be overridden by supplying your own `values.yaml` file or by passing `--set` flags to the `helm install` or `helm upgrade` commands.
To learn more about the Varnish Helm chart, see its documentation.
In the `values.yaml`, we added CPU limits and, for the optimized phase, the SR-IOV network annotation.
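A sketch of those additions, under assumed key paths (the actual Varnish Enterprise chart keys may differ):

```yaml
# Illustrative values.yaml additions; key paths are assumptions.
server:
  resources:
    requests:
      cpu: "4"          # cap Varnish at 4 CPUs for the comparison runs
      memory: 16Gi
    limits:
      cpu: "4"
      memory: 16Gi
  podAnnotations:
    # Phase 2 only: attach the Pod to the SR-IOV VF network defined by a
    # NetworkAttachmentDefinition named sriov-net (example name).
    k8s.v1.cni.cncf.io/networks: sriov-net
```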
Test Results Summary
Phase 1: Without SR-IOV
Using 8 CPUs, the NIC throughput reached its maximum capacity (100Gbps).
To ensure a fair comparison between the two phases, we are now limiting the setup to 4 CPUs.
| Threads | Connections | Req/sec | Transfer/sec (GB/s) | 90% Latency (ms) | 99% Latency (ms) | Hit Rate | Errors (Timeout) |
|---------|-------------|---------|---------------------|------------------|------------------|----------|------------------|
| 4       | 20          | 5849.92 | 5.71                | 11.42            | 23.57            | 100%     | 0                |
| 8       | 40          | 5393.02 | 5.27                | 40.76            | 53.63            | 100%     | 0                |
| 16      | 60          | 4692.99 | 4.69                | 53.30            | 68.73            | 100%     | 0                |
| 32      | 120         | 3416.72 | 3.34                | 84.26            | 107.04           | 100%     | 0                |
| 64      | 240         | 1593.56 | 1.56                | 290.28           | 485.39           | 100%     | 0                |
| 128     | 480         | 1716.83 | 1.68                | 514.99           | 929.11           | 100%     | 0                |
As concurrency rises on only 4 CPUs, requests per second decline and latency climbs: CPU congestion builds and client performance degrades.
Phase 2: With SR-IOV
With SR-IOV enabled, a NIC throughput of 10.94GB/s and a request rate of 11195.59 req/sec were achieved, comparable to the Phase 1 results (10.24GB/s and 10485.96 req/sec) but with significantly fewer CPUs (5 versus 8). This demonstrates that SR-IOV lets us do the same work with less CPU.
Limiting the setup to 4 CPUs:
| Threads | Connections | Req/sec | Transfer/sec (GB/s) | 90% Latency (ms) | 99% Latency (ms) | Hit Rate | Errors (Timeout) |
|---------|-------------|---------|---------------------|------------------|------------------|----------|------------------|
| 4       | 20          | 7566.83 | 7.39                | 3.99             | 8.31             | 100%     | 0                |
| 8       | 40          | 8232.02 | 8.04                | 21.25            | 33.79            | 100%     | 0                |
| 16      | 60          | 8386.16 | 8.19                | 22.39            | 36.26            | 100%     | 0                |
| 32      | 120         | 7700.7  | 7.52                | 40.69            | 62.71            | 100%     | 0                |
| 64      | 240         | 6393.45 | 6.25                | 73.59            | 127.94           | 100%     | 0                |
| 128     | 480         | 5557.95 | 5.43                | 124.02           | 265.08           | 100%     | 0                |
Observations and Comparison
- Performance Efficiency: With SR-IOV, all metrics (requests per second, transfer rate, and latency) improve significantly across all configurations.
- Latency Reduction: 90% and 99% latency metrics are consistently lower in the SR-IOV setup.
- Throughput Improvement: Transfer rates in the SR-IOV setup increased by roughly 30% at the lowest concurrency and by 3x or more under heavy load, compared to the baseline.
Conclusion
The results underscore the substantial performance gains achieved by integrating SR-IOV and Intel® hardware optimizations into the Kubernetes infrastructure. This approach not only boosts throughput but also significantly reduces CPU usage, making it an optimal solution for edge computing workloads. It is particularly suited for use cases that demand low latency or involve processing large datasets that are unsuitable for public cloud environments. When tailored for the network edge, these use cases are often video-related. With Varnish, the CPU savings can be repurposed to further enhance edge computing capabilities.
Authors
Gang Shen, Software Architect, Intel
Brandon Gavino, Product Solution Architect, Intel
Arindam Saha, Product Owner, Intel