Technology with the Environment in Mind
Making USB a More Energy-Efficient Interconnect
INTRODUCTION
To understand USB's power problems, you first need a basic understanding of how USB works. We won't try to make you an expert on USB architecture; rather, we provide just enough detail for you to understand the fundamental problems and how the proposed fixes address them.
The root of most of the power issues is that USB is based on an architecture in which the host constantly polls devices. Although this creates a simple and low-cost device model, it is fundamentally inefficient, especially when the device is idle or has little data to transfer. Specifically, a USB device cannot transfer data or generate an interrupt without being polled by the host. The best it can do is indicate the rate at which it wants to be polled in case activity occurs. This rate is typically assigned statically when the device is first configured, and it is tuned for highly active phases (e.g., to maximize throughput).
We will go into a little more detail about how a USB device is designed to work in this polled environment and then discuss why polling creates power problems.
Figure 1 illustrates the behavior of normal (non-polled) data transfers for PCI devices. In this bus model, devices are generally implemented as fully capable bus masters. When a PCI device needs to transfer data it simply requests control of the bus and initiates one or more cycles to main memory (green line #1), which also results in a snoop cycle to the CPU (green line #2) to ensure data consistency in case the memory contents reside in the CPU cache.
Contrast this with the USB model, where the device must wait until the next time it is polled by the host to transfer data or, more importantly, the host must continually poll a device just to see if it has data to transfer. USB provides two general models for data transfers: synchronous and asynchronous. Synchronous transfers are polled at a guaranteed periodic rate with a maximum frequency of once every microframe (125 microseconds). The Isochronous and Interrupt endpoint types belong to this model. Conversely, asynchronous transfers are not polled at a guaranteed rate, but in most implementations polling occurs quite frequently (many times per microframe) to achieve high data throughput when needed. Bulk and Control endpoints belong to this transfer type.

Figure 1: PCI data transfer (non-polled)
In addition, many USB host controllers rely on main memory for their schedule information. Data structures within the USB schedules inform the host controller of the (active) synchronous and asynchronous endpoints that need to be serviced, the polling frequency for synchronous endpoints, memory locations for data transfers, etc. The host controller must access these structures frequently, both to understand when endpoints need to be serviced (polled) and to initiate each transfer request—regardless of whether data are actually transferred.
Figure 2 illustrates the behavior for a typical USB Bulk IN transfer (read from device, write to main memory). The host controller first reads the transfer descriptor information from its schedule in main memory (red line #1), which in turn causes a snoop cycle to the CPU (red line #2) to maintain cache coherency. Once read, the host controller initiates the transfer to poll the targeted device (red line #3). If the device has no data to transfer it returns a NAK response (tan line #1). Otherwise, an ACK is returned along with whatever data the device needs to transfer (tan line #1), which the host controller then writes to main memory (tan line #2), and again causes a snoop cycle to the CPU (tan line #3).
USB transfers are inherently less efficient than equivalent PCI transfers, requiring a total of six cycles (three being snoops) versus two cycles (one snoop) on PCI. But the bigger issue is that USB endpoints that have no data to move (and therefore constantly NAK) continue to be polled by the host, resulting in a fairly active USB subsystem that generates frequent memory accesses, snoop cycles, and USB transfers. This behavior does not occur on PCI or other non-polled interconnects. Thus, USB works quite hard at doing nothing, which translates into poor energy efficiency.
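The cost of idle polling can be made concrete with a rough cycle-count model. The sketch below is illustrative only: the six-cycle and two-cycle figures come from the text, while the four-cycle cost of a NAKed poll (descriptor read, snoop, poll, NAK response) is an assumption read off the Figure 2 walkthrough, and the function names and workload numbers are hypothetical.

```python
# Illustrative cycle-count model; not a measurement of any real platform.
USB_CYCLES_PER_DATA_TRANSFER = 6  # per the text: Bulk IN transfer that moves data
PCI_CYCLES_PER_DATA_TRANSFER = 2  # per the text: memory cycle plus one snoop
USB_CYCLES_PER_NAK_POLL = 4       # assumption: descriptor read, snoop, poll, NAK


def usb_cycles(data_transfers: int, idle_polls: int) -> int:
    """Total cycles on a polled bus, including polls that are answered with NAK."""
    return (data_transfers * USB_CYCLES_PER_DATA_TRANSFER
            + idle_polls * USB_CYCLES_PER_NAK_POLL)


def pci_cycles(data_transfers: int) -> int:
    """Total cycles for a bus-master device, which generates no traffic when idle."""
    return data_transfers * PCI_CYCLES_PER_DATA_TRANSFER


if __name__ == "__main__":
    # Hypothetical workload: 10 real transfers and 1000 polls that find no data.
    print("USB cycles:", usb_cycles(10, 1000))
    print("PCI cycles:", pci_cycles(10))
```

Under these assumptions the idle polls dominate the total, which is the point the paragraph above makes: the overhead scales with how often the host polls, not with how much data moves.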

Figure 2: USB data transfers (polled)
It is also important to note that most of the power increase occurs upstream of the USB host controller. For example, a host controller polling a single Bulk IN endpoint can generate bursts of activity every 8-16 microseconds (us), which prevents most of the core logic (CPU, memory, backbone buses, clocking, etc.) from entering a low power state. This in turn can have a large impact on platform idle power and drastically decrease battery life.
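To get a feel for what a burst every 8-16 us means, the short calculation below converts a burst interval into bursts per second. It is simple arithmetic on the figures quoted above; the function name is hypothetical.

```python
def bursts_per_second(interval_us: float) -> float:
    """Number of activity bursts per second for a burst interval given in microseconds."""
    if interval_us <= 0:
        raise ValueError("interval_us must be positive")
    return 1_000_000 / interval_us


if __name__ == "__main__":
    for interval_us in (8, 16):
        rate = bursts_per_second(interval_us)
        print(f"{interval_us} us interval -> {rate:,.0f} bursts per second")
```

At these intervals the platform sees between 62,500 and 125,000 bursts per second, so the longest quiet window available to the upstream logic is only 8-16 us.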
Background on USB 2.0
The Universal Serial Bus 2.0 specification [2] is defined by the USB Implementers Forum, Inc. (www.usb.org). It supersedes and is backwards-compatible with the USB 1.1 specification. USB 2.0 encompasses three distinct data rates: low-speed at 1.5 Mbps, full-speed at 12 Mbps, and high-speed at 480 Mbps. USB 2.0 uses a 4-pin bus with two differential signaling lines (D+/D-). Fundamentally, the USB 2.0 bus is a polled bus in that data and control transactions are initiated by the host, not the device. Because polling directly translates to increased power consumption across the platform, device design techniques are especially important. The USB 2.0 bus standard has a low power state known as Suspend, but today the latencies associated with entry and exit make it problematic to use as a dynamic flow control and link power management mechanism.
The USB 2.0 specification defines four distinct traffic classes (control, bulk, interrupt, isochronous) and three data rates (low, full, high). On Intel® Architecture (IA) platforms these are typically managed using two different host controller types: Enhanced Host Controller Interface (EHCI) for high-speed devices and Universal Host Controller Interface (UHCI) for low- and full-speed devices.

Figure 3: Host controller schedules
Figure 3 illustrates the various schedules, traffic classes, and data patterns for transactions associated with low-, full-, and high-speed devices. For low- and full-speed devices serviced by the UHCI controller, the host controller maintains a frame list pointer that references a physical address in main memory. The host controller parses this schedule every frame (1 ms interval) to fetch memory structures (descriptors) that tell it how to poll devices. The operating system (OS) software is responsible for populating the schedule, which specifies the transactions the host controller will attempt during each frame. In the Windows* OS, periodic transfers are layered first, starting with isochronous endpoints, which are allocated a fixed bandwidth. The OS then places interrupt endpoints, which are generally polled at some derived periodicity, typically using a binary tree (poll rates of 1 ms, 2 ms, 4 ms, 8 ms, 16 ms, 32 ms, etc.). Bulk and control endpoints are added next and are typically arranged as a linked list. The host controller typically parses the periodic elements once per frame and spends the rest of its time (until the next frame) processing bulk and control endpoints.
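One way to picture the power-of-two layout of interrupt endpoints is as a frame list in which an endpoint with a period of N frames appears in every Nth slot. The sketch below is a simplified model rather than driver code. The 1024-entry list size matches the UHCI frame list, but the function names, the rule of rounding a requested interval down to a power of two, the 32 ms cap, and the endpoint names are all assumptions made for illustration.

```python
FRAME_LIST_SIZE = 1024  # UHCI frame list entries, one per 1 ms frame
MAX_PERIOD_MS = 32      # assumption: cap the tree depth at 32 ms in this sketch


def round_down_to_power_of_two(period_ms: int) -> int:
    """Round a requested interval down to a power of two, capped at MAX_PERIOD_MS."""
    if period_ms < 1:
        raise ValueError("period_ms must be at least 1")
    capped = min(period_ms, MAX_PERIOD_MS)
    return 1 << (capped.bit_length() - 1)


def build_schedule(endpoints: dict[str, int]) -> list[list[str]]:
    """Return, for each frame slot, the interrupt endpoints polled in that frame."""
    schedule: list[list[str]] = [[] for _ in range(FRAME_LIST_SIZE)]
    for name, requested_ms in endpoints.items():
        period = round_down_to_power_of_two(requested_ms)
        # An endpoint with period N is linked into every Nth frame slot.
        for frame in range(0, FRAME_LIST_SIZE, period):
            schedule[frame].append(name)
    return schedule


if __name__ == "__main__":
    # Hypothetical endpoints with requested intervals in milliseconds.
    sched = build_schedule({"mouse": 8, "keyboard": 10, "hub": 255})
    for frame in (0, 8, 16, 32):
        print(frame, sched[frame])
```

Because every period is a power of two that divides the list size, each endpoint lands in a regular pattern of slots. The host controller still walks one slot per frame whether or not any of the listed endpoints has data.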

Table 1: General UHCI/EHCI power implications
The EHCI controller services USB 2.0 high-speed devices and contains two distinct schedules. The asynchronous schedule consists of bulk and control endpoints that are typically arranged as a linked list. The periodic schedule contains isochronous and interrupt endpoints that are linked at a specific periodicity. The EHCI controller processes periodic transactions once per microframe (125 us), eight times more frequently than UHCI, which works in 1 ms frames. Thus, periodic transfers may be scheduled at a maximum rate of once every microframe.
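The relationship between frames and microframes is simple arithmetic, sketched below using the 1 ms and 125 us figures from the text. The constant and function names are hypothetical.

```python
FRAME_US = 1000      # UHCI frame length: 1 ms
MICROFRAME_US = 125  # EHCI microframe length


def max_periodic_polls_per_second(interval_us: int) -> int:
    """Upper bound on periodic polls per second for one endpoint at this interval."""
    if interval_us <= 0:
        raise ValueError("interval_us must be positive")
    return 1_000_000 // interval_us


if __name__ == "__main__":
    print("microframes per frame:", FRAME_US // MICROFRAME_US)
    print("UHCI max polls/s:", max_periodic_polls_per_second(FRAME_US))
    print("EHCI max polls/s:", max_periodic_polls_per_second(MICROFRAME_US))
```

A single periodic endpoint can therefore be polled up to 8000 times per second under EHCI, against 1000 times per second under UHCI.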
Table 1 summarizes the platform power implications when servicing low-, full-, and high-speed endpoints using traditional UHCI and EHCI host controller designs.
Effect of USB Activity on System Power
When bus master traffic is generated by a USB host controller on an otherwise idle system, the platform immediately transitions out of its low power state to process this traffic. This flow is represented in Figure 4, which loosely depicts a system based on an Intel® Core™2 Duo mobile processor.

Figure 4: System power impact of USB activity
Because this activity is a platform-wide event, the resulting power impact can be large. Figure 5 illustrates a bus master transfer from a WLAN device fielding a keep-alive packet from an 802.11g access point. Although the actual transfer is short-lived, component and platform power scale up dramatically to process this activity.

Figure 5: Platform power impact for WLAN activity
Thus, the general solution for addressing USB's power issues requires that we significantly reduce the amount of activity the host controller generates, especially when USB devices are otherwise idle (no data to transfer).
