Deep Dive Into the Architecture of a Pipeline Stage

What good is a scheduler if it has only one task and that one task is waiting? If you hit a dependency with the current working set, you need to be able to work with other “ready” sets. For that, you first have to have other sets in the hopper. For DPDK packet processing, the key takeaway from this scheduler analogy of tasks to packets is that we need to fetch a bunch of packets rather than just one. The goal is to achieve parallelism with potentially many working sets, which also amortizes the overhead of I/O across multiple packets. As with a scheduler, the packet pipeline keeps going until it hits a “waiting” or “dependency” condition. The need for data from memory can be considered the “heavyweight hammer”. Learn how to optimize packet processing with DPDK, using the “thirsty dinosaur” analogy.

Check out the companion video: How Developers Can Benefit from the DPDK Packet Framework

For more information:

Visit DPDK.org to learn more about DPDK

Packet Framework section of DPDK Programmer’s Guide

Subscribe to the Intel® Software YouTube channel


Hi, I'm MJ from Intel. In this video, we will take a deep dive into the architecture of the pipeline stage.

To take advantage of current superscalar processor architectures, data-level parallelism is key. I'm sure you will agree that with only one packet at hand, you can hardly exercise any parallelism. With only one packet, you will sequentially build up dependencies and end up waiting. This is no different from any effective scheduler. What good is a scheduler if it has only one task and that one task is waiting?

Similarly, if you hit a dependency with the current working set, you need to be able to work with other ready sets. For that, you first have to have other sets in the hopper. To ensure you have multiple ready sets, you want to go and get a bunch of packets from the input queue.

The key takeaway from the scheduler analogy of tasks to packets is that we need to fetch a bunch of packets. Thus, as you see here, we have packets 0 to 9 all fetched at once. Not only have you achieved parallelism with potentially many working sets; as a very important byproduct, you have also amortized the overhead of I/O across multiple packets. So it has a double benefit.
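The amortization point can be sketched in plain C. This is a minimal illustration, not DPDK code: `struct packet`, `struct input_queue`, `queue_rx_burst`, and `drain` are hypothetical names invented for this example, and the `polls` counter stands in for the fixed per-fetch I/O overhead. In a real DPDK application the burst fetch would typically be done with `rte_eth_rx_burst`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BURST_SIZE 10  /* packets 0 to 9, as in the picture */

/* Hypothetical stand-ins for a packet and an input queue; not DPDK types. */
struct packet { uint32_t id; };

struct input_queue {
    struct packet *pkts;   /* backing array of packets */
    size_t         count;  /* total number of packets in the queue */
    size_t         head;   /* index of the next packet to hand out */
    size_t         polls;  /* how many times the queue was polled */
};

/* Fetch up to `max` packets in a single poll; return how many were fetched. */
size_t queue_rx_burst(struct input_queue *q, struct packet **out, size_t max)
{
    size_t n = 0;
    q->polls++;  /* the fixed per-poll cost is paid once per burst */
    while (n < max && q->head < q->count)
        out[n++] = &q->pkts[q->head++];
    return n;
}

/* Drain the queue `burst` packets at a time; return the packets processed. */
size_t drain(struct input_queue *q, size_t burst)
{
    struct packet *batch[BURST_SIZE];
    size_t total = 0, n;

    assert(burst >= 1 && burst <= BURST_SIZE);
    while ((n = queue_rx_burst(q, batch, burst)) > 0)
        total += n;  /* real per-packet work would go here */
    return total;
}
```

Draining 20 packets one at a time costs 21 polls in this sketch (20 that return a packet plus the final empty one), while draining them in bursts of 10 costs 3. The per-packet work is the same; only the fixed overhead is spread over more packets.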

Same as the scheduler: here, the packet pipeline keeps going until it gets to a waiting or dependency condition. What is that here? The need for data from memory can be considered that heavyweight hammer. So how do you avoid waiting and pausing for data?

Let's demonstrate with the analogy of the drinking dinosaur. Assume that it takes 20 minutes for water to get from the dinosaur's mouth to its stomach. In this case, the 20 minutes can be referred to as the stall. If it starts drinking water only after it already feels thirsty, the dinosaur is going to stay thirsty for at least 20 minutes. What's the best way to keep a dinosaur from getting thirsty at all? You have to plan ahead. The dinosaur should start drinking water 20 minutes before it starts getting thirsty. That way there is no stall. Now that's a scheduling point, or the breaking of the pipeline stage.
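The "plan ahead" rule can be put into numbers. The sketch below is an illustration under stated assumptions, not something from the video: `prefetch_lead` is a hypothetical helper, and it assumes every working set takes the same amount of processing time. It answers how many working sets ahead the request must be issued so that the data arrives before it is needed.

```c
#include <assert.h>
#include <stdint.h>

/* Number of working sets to stay ahead so that a stall of `stall` time units
 * is fully hidden behind other work, when each working set takes
 * `work_per_set` time units to process: ceil(stall / work_per_set).
 * Both arguments must use the same unit; work_per_set must be nonzero. */
uint32_t prefetch_lead(uint32_t stall, uint32_t work_per_set)
{
    assert(work_per_set > 0);
    return stall / work_per_set + (stall % work_per_set != 0 ? 1u : 0u);
}
```

With the dinosaur's 20-minute stall and, say, 5 minutes of other activity per step, `prefetch_lead(20, 5)` gives 4: the dinosaur has to start drinking four steps before it expects to be thirsty.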

Instead of issuing that read, which would introduce a stall while waiting for data, the pipeline issues a heads-up to the memory controller: not a real read, just a prefetch hint.

Thus, for example, when the packets being processed need packet metadata, say packets 8 and 9 in the picture, the pipeline issues a prefetch hint for it rather than waiting. It then moves on to another of the working sets, say packets 2 and 3, for which it has already issued a similar prefetch, say for their lookup bucket. Since it had already primed the pipe for packets 2 and 3, that data had time to travel while processing for packets 8 and 9 was going on. Now, after switching to the new working set, packets 2 and 3, it issues a real read, thus getting their data without waiting.
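The hint-first, read-later pattern described here can be sketched in plain C. This is a simplified illustration, not the DPDK Packet Framework implementation: `struct bucket`, `struct flow_packet`, `bucket_for`, `hint_set`, `read_set`, and `lookup_pipelined` are hypothetical names, the "hash" is just the low bits of the key, and the hint uses `__builtin_prefetch`, a GCC/Clang builtin. DPDK code would normally use its own prefetch wrapper, `rte_prefetch0`, instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_BUCKETS 64u  /* hypothetical table size; must be a power of two */
#define SET_SIZE    2u   /* packets per working set, e.g. packets 2 and 3 */

struct bucket      { uint32_t key; uint32_t value; };
struct flow_packet { uint32_t key; uint32_t result; };

/* Bucket a key maps to (toy hash: the low bits of the key). */
const struct bucket *bucket_for(const struct bucket *table, uint32_t key)
{
    return &table[key & (NUM_BUCKETS - 1u)];
}

/* Heads-up only: request the buckets for the working set starting at `first`.
 * This is a hint; the program does not wait for the data here. */
void hint_set(const struct bucket *table, const struct flow_packet *pkts,
              size_t first, size_t n)
{
    for (size_t j = first; j < n && j < first + SET_SIZE; j++)
        __builtin_prefetch(bucket_for(table, pkts[j].key), 0, 3);
}

/* Real read: look up the working set starting at `first` (0 means a miss). */
void read_set(const struct bucket *table, struct flow_packet *pkts,
              size_t first, size_t n)
{
    for (size_t j = first; j < n && j < first + SET_SIZE; j++) {
        const struct bucket *b = bucket_for(table, pkts[j].key);
        pkts[j].result = (b->key == pkts[j].key) ? b->value : 0;
    }
}

/* While one working set is being read, the next one is already in flight. */
void lookup_pipelined(const struct bucket *table, struct flow_packet *pkts,
                      size_t n)
{
    assert(table != NULL);
    hint_set(table, pkts, 0, n);                 /* prime the pipe */
    for (size_t i = 0; i < n; i += SET_SIZE) {
        hint_set(table, pkts, i + SET_SIZE, n);  /* hint for the next set */
        read_set(table, pkts, i, n);             /* hinted one step earlier */
    }
}
```

The results are identical to a plain loop that reads each bucket on demand; the only difference is when the memory request is started. Whether the hint arrives early enough depends on how much work sits between the hint and the read, which is why real pipelines tune the number of packets in flight.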

Thanks for watching. To learn more about the DPDK Packet Framework, follow the links provided, and do remember to like this video and subscribe.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.
