Technology with the Environment in Mind
Making USB a More Energy-Efficient Interconnect
ARCHITECTURE
Addressing the power issues associated with USB is challenging. Figure 6 depicts the high-level vision for a truly energy-efficient model for USB. The general idea is to keep the entire path from main memory through the host controller and down to the device completely quiescent until meaningful data needs to be transferred, thereby transforming today's continuously polled architecture into one where devices are polled only when needed.

Figure 6: Energy-efficient USB vision
To maintain compatibility with mainstream OSs, however, it was important to avoid changes to the upper levels of the USB software stack. This focused the scope of our solution on the lower levels (miniport driver and hardware), as illustrated in Figure 7.

Figure 7: Design constraints
Making USB Power Friendly
In the next few sections, we discuss various energy-efficiency optimizations based upon the following criteria:
- If no devices are connected or no work is scheduled then USB hardware should remain in a low-power state.
- Suppress host-side activity (upstream of the host controller, e.g., to main memory) when there is no meaningful work to do.
- Suppress device-side activity (downstream of the host controller, on the USB bus) when there are no data to send to/receive from devices.
Miniport Drivers
Because of the polled architecture, the host controller's interaction with devices is very important, and if it is not done properly it can adversely affect platform power. In the architecture overview we talked about host controller schedules and how these are used to poll devices and to perform data transfers. Proper management of these schedules is absolutely necessary for producing a power-friendly USB subsystem.
For example, suppose no devices are attached to the system. Obviously a power-friendly USB software stack should schedule no work when there are no devices, and it should immediately remove all associated work from the schedules when a device is removed. If this sort of basic "schedule" and "controller" management is not performed well, any additional power-efficient enhancements will have limited impact. Thus, it is critically important to ensure the miniport drivers do effective work scheduling, turn controllers off when not used, and remove all associated descriptors from host controller schedules when devices are unplugged, disabled, or the work has completed.
Several critical changes were identified in Windows XP* SP2 that have resulted in tremendous power savings. Intel worked with Microsoft engineers to develop these changes and make them available for both Windows XP and Windows Vista*. This includes support for the UHCI run/stop bit, EHCI run/stop, and asynchronous/periodic schedule enable bits, as well as aggressive schedule idle detection. These software optimizations have in turn enabled other hardware optimizations, which we discuss in the next section.
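The schedule and controller management described above can be sketched as a toy model. This is a minimal illustration, not the actual miniport driver logic: the class and method names are hypothetical, and the bit positions are the EHCI USBCMD positions as best recalled from the EHCI specification, so they should be checked against the specification before reuse. Real hardware also requires waiting on status bits when enabling or disabling schedules, which is omitted here.

```python
# Assumed EHCI USBCMD bit positions (verify against the EHCI specification).
USBCMD_RUN_STOP = 1 << 0
USBCMD_PERIODIC_ENABLE = 1 << 4
USBCMD_ASYNC_ENABLE = 1 << 5


class HostControllerModel:
    """Toy model of a host controller command register and its two schedules."""

    def __init__(self):
        self.usbcmd = 0
        self.periodic = {}       # device address -> list of queued descriptors
        self.asynchronous = {}   # device address -> list of queued descriptors

    def _sync_enable_bits(self):
        """Enable only schedules that hold work; stop the controller if none do."""
        cmd = 0
        if any(self.periodic.values()):
            cmd |= USBCMD_PERIODIC_ENABLE
        if any(self.asynchronous.values()):
            cmd |= USBCMD_ASYNC_ENABLE
        if cmd:
            cmd |= USBCMD_RUN_STOP
        self.usbcmd = cmd

    def queue(self, address, descriptor, periodic):
        """Add a descriptor to the periodic or asynchronous schedule."""
        schedule = self.periodic if periodic else self.asynchronous
        schedule.setdefault(address, []).append(descriptor)
        self._sync_enable_bits()

    def complete(self, address, descriptor, periodic):
        """Remove a descriptor as soon as its work has completed."""
        schedule = self.periodic if periodic else self.asynchronous
        schedule[address].remove(descriptor)
        if not schedule[address]:
            del schedule[address]
        self._sync_enable_bits()

    def device_removed(self, address):
        """Drop every descriptor that belongs to an unplugged device."""
        self.periodic.pop(address, None)
        self.asynchronous.pop(address, None)
        self._sync_enable_bits()
```

In this model the run/stop bit is set only while at least one schedule holds work, so an empty system leaves the controller stopped.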
Host Controllers
New features for Intel's mobile USB host controllers were identified to allow for power management opportunities when one or more schedules are enabled (endpoints present and active). The enhancements include the following key concepts.
Caching
The Caching technique allows the host controller to store schedule information (descriptors) in controller-local memory in order to significantly reduce accesses to main memory, particularly in the case where devices are relatively idle. These data are typically stored in an abbreviated format where just enough information is provided to generate a transfer request (poll). Figure 8 illustrates this technique.

Figure 8: Caching technique
If the device NAKs the transaction, the host controller remains completely idle (since the information needed to generate the transaction was stored locally). If the device ACKs the transaction, the host controller typically must open the path to memory to move the actual data (to/from the device).
The caching feature is especially helpful for endpoints that observe a high NAK rate (for example, streaming or networking devices with bulk asynchronous endpoints open all of the time).
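The effect of caching on main-memory traffic can be illustrated with a simple counting model. This is a sketch under stated assumptions, not a description of the actual hardware: it assumes one descriptor read per poll when caching is absent, a single cache fill when it is present, and one memory access per ACKed transaction. The function name and parameters are hypothetical.

```python
import random


def memory_fetches(num_polls, nak_rate, cached, seed=0):
    """Count main-memory accesses for num_polls polls of one endpoint.

    Without caching, every poll re-reads the descriptor from main memory.
    With caching, the descriptor is read once, and memory is touched again
    only when the device ACKs and data must be moved.
    """
    rng = random.Random(seed)
    fetches = 1 if cached else 0        # one-time fill of controller-local memory
    for _ in range(num_polls):
        if not cached:
            fetches += 1                # descriptor read for this poll
        acked = rng.random() >= nak_rate
        if acked:
            fetches += 1                # data moved to or from main memory
    return fetches
```

For an endpoint that NAKs every poll (`nak_rate=1.0`), the cached model touches memory once while the uncached model touches it on every poll, which is why the benefit grows with the NAK rate.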
Deferring and Link Power Management
Although caching is fairly good at quiescing host-side activity it does nothing to address downstream (device-side) activity. The USB 2.0 Suspend state was originally intended for this purpose, but it is very difficult to use because of entry and exit latencies and other limitations. It takes considerable time to enter and exit this state (3ms + OS overhead for entry, 30ms + OS overhead for exit), and devices are severely limited in the amount of power they can consume while residing in this state. Additionally, Suspend is coupled with the device D3 state where the OS assumes hardware context is lost (and thus the device needs to be re-initialized and context restored upon exit), which adds significant latency and often interrupts device functionality.
The L1 state is a new Link Power Management (LPM) state that addresses the key deficiencies of the existing Suspend state (herein referred to as L2) by reducing state latencies and decoupling the link state from the device state (allowing the device to remain in D0). The L1 state is intended to be used dynamically when the device is operational (D0), but otherwise idle, and able to quickly enter and exit this low power state without disrupting normal operation. Host controllers can safely negotiate L1 entry with idle devices, progressively decreasing downstream (device-side) activity until all devices reside in L1 (or L2), at which point no downstream activity will occur until either the host or device wakes the link to an active (L0) state.
The L1 transitions have significantly lower entry and exit latencies (10s of µs) than those of L2 (10s of ms). As with L2, both device- and host-initiated wake events are supported from the L1 state, noting that L1 device-initiated wake events play a prominent role in another key technique known as Deferring.
Supporting the L1 state requires modifications to both USB host controllers and devices. The L1 state is a new feature that augments USB 2.0 power management; it does not replace the existing L2 (suspend/resume) mechanism. The proposed L1 definition is backward compatible in that a new host can determine whether a device supports L1. A new device will continue to work properly with legacy hosts (obviously without L1 transitions), and old devices will continue to work on new host controllers. The only time L1 will be used is when a device acknowledges support for this feature on a new host controller.
The policy for using the L1 state is platform and implementation specific and will likely depend on the type of endpoint being served by the host controller. For periodic (interrupt or isochronous) transactions, the host controller would likely implement a policy whereby the device is immediately placed into the L1 state as shown in Figure 9.

Figure 9: Example L1 policy for periodic devices
For asynchronous (bulk or control) transactions, the host controller would likely implement a policy whereby the device is polled some number of microframes or frames at the nominal asynchronous poll rate before attempting to transition the device to L1, as shown in Figure 10. This is done in order to reduce the overhead for devices that stall for short periods between subsequent data phases.

Figure 10: Example L1 policy for asynchronous devices
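The two example policies can be sketched as one small state machine. Since the policy is platform and implementation specific, this is only an illustration: `idle_polls_before_l1` is a hypothetical tuning knob, and the sketch models only the idle (NAK) case, not L1 negotiation with the device.

```python
from dataclasses import dataclass


@dataclass
class L1Policy:
    """Decide when the host asks an idle device to enter L1.

    idle_polls_before_l1 = 0 models the periodic case (request L1 right
    away); a larger value models the asynchronous case (keep polling at
    the nominal rate for a while first).
    """
    idle_polls_before_l1: int
    consecutive_naks: int = 0
    link_state: str = "L0"

    def on_poll_result(self, acked: bool) -> str:
        """Update the link state after one poll and return it."""
        if acked:
            self.consecutive_naks = 0
            self.link_state = "L0"
        else:
            self.consecutive_naks += 1
            if self.consecutive_naks > self.idle_polls_before_l1:
                self.link_state = "L1"
        return self.link_state

    def on_wake(self) -> str:
        """Host- or device-initiated wake returns the link to L0."""
        self.consecutive_naks = 0
        self.link_state = "L0"
        return self.link_state
```

With `L1Policy(0)` the first idle poll moves the link to L1, while `L1Policy(3)` tolerates three idle polls before doing so, which avoids L1 overhead for devices that stall only briefly between data phases.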
The L1 state benefits all types of devices and traffic patterns, and when coupled with the associated host controller enhancements, it can aggressively save power across the entire platform by allowing the entire USB subsystem to enter and remain in a low-power state until some meaningful event occurs.
Devices
When we analyzed the behavior and power impact of many USB 2.0 peripherals currently in the market it became evident that a clear set of device recommendations was required to promote energy-efficient designs [3], both for present-day systems and forward looking to future optimizations. We summarize these recommendations in this next section.
Periodic-Triggered Asynchronous Transfers
In general, we have observed that many devices generate traffic in a continuous stream using bulk (asynch) endpoints, with a high NAK rate (>90%). While this design is simple, it has a key drawback: bandwidth, and hence device buffering and throughput, is highly variable and hard to quantify. A principal recommendation is to use an interrupt (periodic) endpoint to indicate that a device requires service and to use bulk endpoints dynamically for moving data to or from the device. This concept is termed "periodic-triggered asynch" and is illustrated in Figure 11.

Figure 11: Periodic-triggered asynch transfers
By using this scheme, the response time is well defined (namely, the polling interval requested), and streaming bandwidth is more carefully managed for data movement. This is also a more platform-friendly approach in that it preserves bus bandwidth (a shared resource for USB) for use by other devices. The key virtue of course is that this scheme is much more power friendly as illustrated by the idle time between USB poll events.
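A rough count of bus transactions shows why the periodic-triggered scheme leaves more idle time. The arithmetic below is a sketch only: the function names and every input value (bulk polls per millisecond, data events, polls per event) are hypothetical and are not measurements from this work.

```python
def polls_continuous_bulk(duration_ms, bulk_polls_per_ms):
    """Bus polls when a bulk endpoint is kept open and polled continuously."""
    return duration_ms * bulk_polls_per_ms


def polls_periodic_triggered(duration_ms, interrupt_interval_ms,
                             data_events, bulk_polls_per_event):
    """Bus polls when an interrupt endpoint signals that service is needed.

    The interrupt endpoint is polled once per interval; bulk transfers are
    scheduled only for the events that actually carry data.
    """
    interrupt_polls = duration_ms // interrupt_interval_ms
    return interrupt_polls + data_events * bulk_polls_per_event


# Hypothetical one-second window: 8 bulk polls per ms versus a 4 ms
# interrupt interval with 10 data events of 16 bulk polls each.
continuous = polls_continuous_bulk(1000, 8)
triggered = polls_periodic_triggered(1000, 4, 10, 16)
```

Under these assumed numbers the continuous scheme issues 8000 polls and the periodic-triggered scheme issues 410, and the response time of the latter is bounded by the 4 ms interval.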
Minimize Polling Rate for Periodic Endpoints
Whether using the aforementioned periodic-triggered asynch scheme or using periodic interrupt/isochronous endpoints for other purposes, it is important to maximize device buffering so that the poll interval of the device can be as long as possible. A power-friendly device should employ poll intervals of at least 1ms, and preferably 2-4ms or longer. It may also be possible to support endpoints with different periodic rates that are used selectively based on the bandwidth needs of the device. In such a case, if the device has a high-speed connection, device buffering may mandate a 1ms poll interval; when the device has a slower connection, the device may have sufficient buffering to tolerate a much longer (e.g., 4-8ms or higher) poll interval.
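In practice, a device requests its poll interval through the bInterval field of the endpoint descriptor. The helper below is a minimal sketch of that mapping for interrupt endpoints, based on the USB 2.0 encoding as understood here (1 ms frames for full speed, powers of two of 125 µs microframes for high speed). The function name is hypothetical, and the encoding should be confirmed against the USB 2.0 specification.

```python
def interrupt_binterval(desired_interval_us, high_speed):
    """Return (bInterval, actual_interval_us) for an interrupt endpoint.

    desired_interval_us is an integer number of microseconds. The result
    is rounded down to the nearest supported interval and clamped to the
    supported range, so very small requests are raised to the minimum.
    """
    if high_speed:
        # High speed: interval = 2**(bInterval - 1) microframes of 125 us,
        # with bInterval in the range 1..16.
        microframes = max(1, desired_interval_us // 125)
        exponent = microframes.bit_length() - 1   # floor(log2(microframes))
        binterval = min(16, exponent + 1)
        return binterval, 125 * 2 ** (binterval - 1)
    # Full speed: bInterval counts 1 ms frames, in the range 1..255.
    frames = max(1, min(255, desired_interval_us // 1000))
    return frames, frames * 1000
```

For example, a high-speed device that can buffer 4 ms of data would request `interrupt_binterval(4000, high_speed=True)`, which yields a bInterval of 6 under this encoding.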
Use Isochronous Transfers for Streaming Devices
One common characteristic observed for streaming devices is the use of bulk endpoints for data transfers. There are several problems with this approach. First, asynchronous bandwidth is shared across all ports on a given controller, and thus, realized bandwidth may vary dramatically depending on whether other devices are actively consuming bus bandwidth. This can be readily observed with two devices that use asynchronous transfers for streaming content: in many cases the streams become unstable whenever both devices are active on the same host controller at the same time. This is because bandwidth is shared across a single host controller instance, highlighting the fact that USB is fundamentally a broadcast bus where multiple streams are time-sliced rather than served concurrently.
In contrast, the isochronous traffic class is time scheduled, and bandwidth is properly allocated by host software. As such, a device can receive a dedicated amount of bandwidth to service its endpoint, and this traffic effectively runs at a higher priority level than asynchronous transfers. Moreover, since isochronous transfers reside on the periodic schedule, power management techniques are generally more effective than on the asynchronous schedule, at least when the periodicity of these transfers approaches 1-2ms or more.
Use LPM L2 Dynamically (Selective Suspend)
Devices should support and use Suspend (L2) whenever the device is idle and use of this state is possible, occasionally waking to look for activity, incoming connections, or other device state changes. This is important as a device should not continuously post periodic (and certainly not asynchronous) transfers when it is not active or actively connected. For example, in the case where a USB network device is scanning for network connectivity, it should take care to do this very infrequently or provide hardware capabilities in the device to do this without requiring continuous transfers from its function driver. For other classes of devices, inactivity can be easily determined by whether the device is in use or not (for streaming devices such as audio/video, occasional use devices such as fingerprint sensors and GPS). The most difficult class of device to make use of Suspend is typically human interface devices (HID) such as mice and keyboards, where the end-user may perceive the increased latency associated with L2 entry/exit (e.g., choppy mouse movement) when using these devices.
Use LPM L1 Dynamically
As discussed previously, the long-term path to fully addressing the power efficiency limitation of USB 2.0 requires that the device and platform implement a new low-power link state known as LPM L1. For device implementations, it is important to note that entry into the L1 state should not result in any loss of functionality, as it is intended to be used while the system and device are idle between bursts of activity. It is also important that the device pay attention to the Host Initiated Resume Duration (HIRD) field in the host command sent to the device to request entry into the L1 state. This parameter indicates the depth of the low-power state the platform expects to enter. If the platform is semi-active, the field may indicate a short duration (e.g., <200us), whereas if the platform and devices are more deeply idle, the field may indicate a longer duration (~1ms). The device should use this parameter to control the depth of its own power management, for example, by shutting off PLLs only when a "long" (~1ms) L1 entry transaction is identified.
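A device-side use of the HIRD field might look like the sketch below. The decoding (a 4-bit value in 75 µs steps starting at 50 µs) is the encoding as recalled from the USB 2.0 LPM addendum and should be verified there. The 1000 µs threshold, the function names, and the action names are all hypothetical.

```python
# Hypothetical threshold above which the device treats L1 as "long".
LONG_L1_THRESHOLD_US = 1000


def hird_to_microseconds(hird):
    """Decode a 4-bit HIRD value (assumed encoding: 50 us + 75 us per step)."""
    if not 0 <= hird <= 0xF:
        raise ValueError("HIRD is a 4-bit field")
    return 50 + 75 * hird


def l1_power_actions(hird):
    """Pick how deeply the device powers down for this L1 entry request."""
    duration_us = hird_to_microseconds(hird)
    actions = ["gate_link_clocks"]          # cheap to undo, always worthwhile
    if duration_us >= LONG_L1_THRESHOLD_US:
        actions.append("shut_off_pll")      # only when resume time allows relock
    return actions
```

Under the assumed encoding, a HIRD value of 2 decodes to 200 µs and keeps the PLL running, while a value of 13 decodes to 1025 µs and allows the PLL to be shut off.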
Design True Composite Devices
The use of integrated hubs within multifunction devices has been a common practice to streamline and simplify hardware implementations. Although convenient, this approach has a number of power management pitfalls and is therefore strongly discouraged. For example, many Deferring scenarios are not feasible for devices that are attached to a downstream hub rather than directly to one of the host controller's root ports.
The most energy-efficient designs involve true composite devices. Here multiple logical functions (devices) reside behind a single USB 2.0 physical device interface where each independent function is exposed as sets of one or more endpoints.
Application/Driver Synchronization
Many devices, such as streaming devices (media playback, cameras) or occasional-use devices (fingerprint sensors, GPS), are bundled with application software. When the application stream is shut down or becomes inactive (pause, mute, etc.), it is critical that the application and the device function driver clean up outstanding driver requests so that no transactions are left dangling on the device; otherwise, these transactions remain un-serviced or are continually retried.
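One way to make this cleanup hard to forget is to tie outstanding requests to the lifetime of the application session. The sketch below is illustrative only: `StreamSession` and its methods are hypothetical names, and it tracks request identifiers in memory rather than talking to a real function driver.

```python
class StreamSession:
    """Track outstanding driver requests and cancel them on pause or exit."""

    def __init__(self):
        self.pending = set()
        self._next_id = 0

    def submit(self):
        """Post a transfer request and return its identifier."""
        request_id = self._next_id
        self._next_id += 1
        self.pending.add(request_id)
        return request_id

    def complete(self, request_id):
        """Mark a request as serviced by the device."""
        self.pending.discard(request_id)

    def cancel_all(self):
        """Cancel every outstanding request so nothing keeps being retried."""
        self.pending.clear()

    def pause(self):
        """Inactivity (pause, mute) is treated the same as shutdown."""
        self.cancel_all()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.cancel_all()
        return False   # do not suppress exceptions from the application
```

Used as `with StreamSession() as session: ...`, the session cancels its pending requests even if the application exits through an exception.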
Avoid Polling Integrated Buttons
Many devices, such as integrated cameras, support a so-called "Instant On Feature," whereby the device has local buttons that are typically serviced by a periodic interrupt endpoint. The buttons require a continuously running periodic interrupt endpoint to poll them, and this wastes power. It is recommended that devices designed for mobile platforms not support buttons (it is better to enable the feature through applications or traditional keyboard hotkeys). If a device must support functional buttons, the device designer should work with the platform designer to provide platform-level notification mechanisms through sideband signals and Advanced Configuration and Power Interface (ACPI) BIOS modifications. With such a scheme, notifications can be delivered on demand, and the function driver can be the target of these notifications, providing the same net effect for Instant On Features without having to continuously run the periodic schedule.
If the button can't be avoided, then architect a very long poll interval for the button (10s to 100s of milliseconds) to reduce the inevitable platform power impact. Such a long polling interval will give other hardware optimizations a chance to kick in (Caching, Deferring, L1, etc.).
Challenges
Clearly there were and are numerous challenges associated with making USB 2.0 an energy-efficient interconnect. We are quite pleased with the progress thus far, but note that the biggest remaining challenge is the broad and timely adoption of these device, OS, and platform features by the ecosystem.
