This article shows what Intel® Dynamic Load Balancer can do for a service mesh, using Linkerd2 as the example.

Intel® Dynamic Load Balancer (Intel® DLB) is a hardware-managed system of queues and arbiters that connects producers and consumers. It is a PCI device designed to reside in the server CPU uncore, where it can interact with software running on cores and potentially with other devices.

Loong Dai

Intel DLB implements the following load balancing features:

Offloads queue management from software:
- Improves multi-producer / multi-consumer scenarios and enqueue batching to multiple destinations.
- Implements lockless access to shared queues, which removes the locking overhead that software otherwise incurs when accessing shared queues.
Dynamic, flow aware load balancing and reordering:
- Ensures equal distribution of tasks and better CPU core utilization. Can provide flow-based atomicity if required.
- Distributes high bandwidth flows across many cores without loss of packet order.
- Improves determinism and avoids excessive queuing latencies.
- Reduces the I/O memory footprint and saves DDR bandwidth.
Priority queuing (up to 8 levels), which allows for QoS:
- Lower latency for traffic that is latency sensitive.
- Optional delay measurements in the packets.
Scalability:
- Allows dynamic sizing of applications, with seamless scaling up/down.
- Power aware: an application can drop workers to a lower power state under lighter loads.
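
To make the priority queuing feature listed above more concrete, here is a minimal software sketch of an 8-level priority queue. It is an illustration only, not the Intel DLB interface: the type `PriorityQueues`, the convention that level 0 is served first, and the strict-priority dequeue policy are all assumptions made for this example.

```rust
use std::collections::VecDeque;

/// Number of priority levels (the article states "up to 8 levels").
pub const LEVELS: usize = 8;

/// Hypothetical software model: one FIFO per priority level.
pub struct PriorityQueues<T> {
    levels: [VecDeque<T>; LEVELS],
}

impl<T> PriorityQueues<T> {
    pub fn new() -> Self {
        PriorityQueues {
            levels: std::array::from_fn(|_| VecDeque::new()),
        }
    }

    /// Enqueue at the given level; out-of-range levels are clamped to the last one.
    /// Assumption for this sketch: level 0 is the most urgent.
    pub fn enqueue(&mut self, priority: usize, item: T) {
        let level = priority.min(LEVELS - 1);
        self.levels[level].push_back(item);
    }

    /// Dequeue from the most urgent non-empty level, FIFO within a level.
    pub fn dequeue(&mut self) -> Option<T> {
        self.levels.iter_mut().find_map(|q| q.pop_front())
    }
}
```

With this model, latency-sensitive traffic enqueued at level 0 is always dequeued before bulk traffic waiting at a higher-numbered level, which is the QoS effect the list describes.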

There are three types of load balancing queues:

Unordered: For multiple producers and consumers, where the order of tasks is not important. Each task is assigned to the processor core with the lowest current load.
Ordered: For multiple producers and consumers, where the order of tasks is important. When tasks are processed in parallel by multiple processor cores, their results must be restored to the original order.
Atomic: For multiple producers and consumers, where tasks are grouped according to certain rules. These tasks are processed using the same set of resources and the order of tasks within the same group is important.
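
The three queue types can be contrasted with a small software model. This is a sketch under stated assumptions, not how the hardware is programmed: the `Task` struct, its `flow` and `seq` fields, and the three function names are hypothetical, and the atomic policy shown (pin each flow to one worker) is just one simple way to get per-group ordering.

```rust
use std::collections::HashMap;

/// Hypothetical task: `flow` groups related tasks, `seq` records arrival order.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Task {
    pub flow: u32,
    pub seq: u64,
}

/// Unordered: send each task to the worker with the lowest current load.
pub fn schedule_unordered(tasks: &[Task], workers: usize) -> Vec<Vec<Task>> {
    assert!(workers > 0);
    let mut queues: Vec<Vec<Task>> = vec![Vec::new(); workers];
    for &t in tasks {
        // Ties go to the lowest worker index.
        let target = (0..workers).min_by_key(|&w| queues[w].len()).unwrap();
        queues[target].push(t);
    }
    queues
}

/// Atomic: pin every flow to a single worker, so tasks within a flow
/// never run concurrently and keep their relative order.
pub fn schedule_atomic(tasks: &[Task], workers: usize) -> Vec<Vec<Task>> {
    assert!(workers > 0);
    let mut queues: Vec<Vec<Task>> = vec![Vec::new(); workers];
    let mut owner: HashMap<u32, usize> = HashMap::new();
    for &t in tasks {
        // A flow seen for the first time is assigned round-robin.
        let next = owner.len() % workers;
        let target = *owner.entry(t.flow).or_insert(next);
        queues[target].push(t);
    }
    queues
}

/// Ordered: work may complete in any order on any worker; the original
/// order is restored by sequence number before results are forwarded.
pub fn restore_order(mut completed: Vec<Task>) -> Vec<Task> {
    completed.sort_by_key(|t| t.seq);
    completed
}
```

For six tasks alternating between two flows and three workers, `schedule_unordered` spreads two tasks onto each worker, while `schedule_atomic` leaves the third worker idle because each flow stays on its own worker. That is the trade-off between even distribution and per-group ordering.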

How Intel DLB accelerates Linkerd2

Intel DLB accelerates Linkerd2 by accelerating Tokio, which is Linkerd2's async runtime written in Rust.

Rust currently provides only the essentials for writing async code. Because Rust has very strict backward compatibility requirements, no specific async runtime has been chosen for the Rust standard library. Tokio fills this gap: it has the broadest community support and many sponsors.

Tokio is generic, reliable, easy to use, and flexible enough for most cases, but not all, because of how its scheduler works.

How Tokio implements its scheduler

Tokio’s scheduler is modeled on a work-stealing scheduler.

As shown in Figure 1, in a work-stealing scheduler, (1) each processor spawns tasks, puts them in its own queue, and runs them. If its queue is empty, (2) the processor tries to steal tasks from the queues of other processors.
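
The two steps can be sketched as a single-threaded simulation. This is not Tokio's implementation, which uses concurrent queues and a more refined stealing policy; the `Workers` type and its methods are hypothetical names chosen for this example.

```rust
use std::collections::VecDeque;

/// One local run queue per worker; tasks are plain `u32` ids in this sketch.
pub struct Workers {
    queues: Vec<VecDeque<u32>>,
}

impl Workers {
    pub fn new(n: usize) -> Self {
        Workers { queues: vec![VecDeque::new(); n] }
    }

    /// Step (1): a worker spawns a task into its own queue.
    pub fn spawn(&mut self, worker: usize, task: u32) {
        self.queues[worker].push_back(task);
    }

    /// Returns the next task for `worker` together with the index of the
    /// queue it was taken from.
    pub fn next_task(&mut self, worker: usize) -> Option<(u32, usize)> {
        // Fast path: take work from the local queue.
        if let Some(t) = self.queues[worker].pop_front() {
            return Some((t, worker));
        }
        // Step (2): the local queue is empty, so try to steal from the
        // back of the other workers' queues.
        let n = self.queues.len();
        for offset in 1..n {
            let victim = (worker + offset) % n;
            if let Some(t) = self.queues[victim].pop_back() {
                return Some((t, victim));
            }
        }
        None
    }
}
```

If worker 0 spawns three tasks and the other workers spawn none, workers 1 and 2 only get work by stealing from worker 0. In a real multi-threaded scheduler, that steal path is where cross-thread synchronization is needed.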

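The scheduling overhead comes from synchronization between threads. A common way to reduce this cost is to use CAS (compare-and-swap) instead of locks, but CAS does not scale perfectly with core count, because failed attempts must be retried under contention.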
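
A minimal sketch of a CAS retry loop shows where that cost appears. The functions `claim_next` and `drain` are hypothetical helpers written for this example, not part of Tokio; they use the standard library's `AtomicUsize::compare_exchange_weak` to let several threads claim task indices from a shared counter.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Claims the next task index from a shared head counter with a CAS loop.
/// Returns the claimed index and the number of failed attempts.
pub fn claim_next(head: &AtomicUsize, len: usize) -> Option<(usize, usize)> {
    let mut retries = 0;
    let mut current = head.load(Ordering::Acquire);
    loop {
        if current >= len {
            return None; // every index has been claimed
        }
        match head.compare_exchange_weak(
            current,
            current + 1,
            Ordering::AcqRel,
            Ordering::Acquire,
        ) {
            Ok(_) => return Some((current, retries)),
            Err(observed) => {
                // Another thread moved the head first (or the weak CAS
                // failed spuriously); retry with the value just observed.
                retries += 1;
                current = observed;
            }
        }
    }
}

/// Drains `len` indices with `threads` workers.
/// Returns (indices claimed, total failed CAS attempts).
pub fn drain(len: usize, threads: usize) -> (usize, usize) {
    let head = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let head = Arc::clone(&head);
            thread::spawn(move || {
                let (mut claimed, mut failed) = (0usize, 0usize);
                while let Some((_, retries)) = claim_next(&head, len) {
                    claimed += 1;
                    failed += retries;
                }
                (claimed, failed)
            })
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .fold((0, 0), |acc, r| (acc.0 + r.0, acc.1 + r.1))
}
```

Every index is claimed exactly once regardless of thread count, but the number of failed attempts returned by `drain` depends on how many threads contend for the same counter. Those retries are wasted work, and they are the scaling limit described above.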

Although this overhead only occurs when a processor tries to steal, work stealing makes it hard to balance the workload across all processors, which leads to high tail latency under heavy traffic.

How Intel Dynamic Load Balancer helps Tokio

Intel DLB can act as a lockless multiple-producer, multiple-consumer queue. In this scenario, we replaced the Tokio scheduler with Intel DLB.

Figure 2 above shows:

Threads spawn tasks.
Threads send tasks to Intel DLB.
Threads are notified by Intel DLB to get tasks. Each thread then puts the tasks it receives into its own queue and runs them.

In this way, Intel DLB balances the workload across all threads, and the scheduling scales with core count.
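
The data flow in the three steps above can be modeled in software with a single shared queue. This sketch does not use the Intel DLB driver or any Tokio internals. The function `run_central_queue` is a hypothetical name, and a standard-library channel guarded by a mutex stands in for the hardware queue. That mutex is exactly the kind of software locking the hardware is meant to remove, so the sketch illustrates only the flow of tasks, not the performance.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Runs `producers` threads that each submit `per_producer` tasks to one
/// central queue, and `consumers` threads that pull tasks from it.
/// Returns how many tasks each consumer ran.
pub fn run_central_queue(
    producers: usize,
    per_producer: usize,
    consumers: usize,
) -> Vec<usize> {
    // Software stand-in for the hardware queue (assumption for illustration).
    let (tx, rx) = mpsc::channel::<u64>();
    let rx = Arc::new(Mutex::new(rx));

    // Steps 1 and 2: threads spawn tasks and send them to the central queue.
    let senders: Vec<_> = (0..producers)
        .map(|p| {
            let tx = tx.clone();
            thread::spawn(move || {
                for i in 0..per_producer {
                    tx.send((p * per_producer + i) as u64).unwrap();
                }
            })
        })
        .collect();
    drop(tx); // the queue closes once every producer has finished

    // Step 3: each consumer takes the next available task and runs it.
    let workers: Vec<_> = (0..consumers)
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || {
                let mut ran = 0usize;
                loop {
                    // The lock is held only while dequeuing.
                    let task = rx.lock().unwrap().recv();
                    match task {
                        Ok(_task) => ran += 1, // "run" the task
                        Err(_) => break,       // queue closed and drained
                    }
                }
                ran
            })
        })
        .collect();

    for s in senders {
        s.join().unwrap();
    }
    workers.into_iter().map(|w| w.join().unwrap()).collect()
}
```

Calling `run_central_queue(4, 250, 3)` returns three per-consumer counts that sum to 1000. No consumer ever has to inspect another consumer's queue, because idle consumers simply take the next task from the central queue.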

How to deploy the benchmark

Intel DLB-enabled Tokio benefits most under high traffic, such as at the ingress. Since Linkerd2 should work with existing ingress solutions such as Nginx Ingress, we deploy the benchmark as shown below:

*Figure 3 Deploying the Kubernetes Benchmark*

With the deployment in Figure 3, we compared a baseline of pure Linkerd2-Proxy against a target of Linkerd2-Proxy plus Intel DLB.

The benchmark shows a significant increase in requests per second and a reduction in latency.

Summary

In this article, we showed how Intel DLB helps balance scheduling loads. With this feature, we can get a significant performance improvement in high-traffic scenarios in the cloud native service mesh world. It also shows the power of combining hardware and software.

Related resources

Intel DLB driver: https://www.intel.com/content/www/us/en/download/686372/intel-dynamic-load-balancer.html
