|
We begin this section by discussing key issues with today's management models. We then describe how manageability architectures need to
evolve to support autonomic behavior such as monitoring changes in the environment, making autonomous decisions at the system and group
level, and providing policy-driven services such as dynamic provisioning and load balancing. Finally, we present our approach to
achieving the self-management vision, via platform autonomics.
Manageability solutions today
Most commercial management solutions operate by placing software agents in the host execution environment to monitor the health of the
OS and applications and to optionally control their operational states. Such management agents implement sensors and effectors for the
OS, applications, and in some instances, the underlying platform hardware. Agents are designed to communicate with a remote console
(that usually resides at a central management location) to provide information and control interfaces to human operators. In the basic
but typical case, local instrumentation data gathered by agents are sent to a database to be analyzed and, when necessary, used to
trigger a control decision based on appropriate automation policies. Autonomics may be added to this basic structure in the form of
local policy and knowledge engines. These reside in the local management agents and close the control loop in response to changes of the
observed local state. Local action eliminates the latency and overhead of a round-trip communication with the remote console and
possibly the operator. Other benefits of autonomics include a reduced volume of management data traversing the system (since many
stimuli are processed locally near the origin) and increased scalability, due to a reduced load on the management console and its
database.
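The local control loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the design of any particular commercial agent: the names `Policy`, `LocalAgent`, `step`, and the `cpu_load`, `throttle`, and `migrate` identifiers are all hypothetical. The agent reads its sensors, applies local policies, fires local effectors where it can, and escalates to the remote console only when no local effector is available.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Policy:
    """Hypothetical rule: when condition(reading) holds, run the named effector."""
    sensor: str
    condition: Callable[[float], bool]
    effector: str


@dataclass
class LocalAgent:
    """Hypothetical management agent with a local policy engine."""
    sensors: Dict[str, Callable[[], float]]
    effectors: Dict[str, Callable[[], None]]
    policies: List[Policy]
    # Events the agent cannot handle locally and would forward to the console.
    escalated: List[str] = field(default_factory=list)

    def step(self) -> List[str]:
        """Run one pass of the control loop; return the effectors that fired."""
        fired = []
        for policy in self.policies:
            reading = self.sensors[policy.sensor]()
            if not policy.condition(reading):
                continue
            if policy.effector in self.effectors:
                # Close the loop locally: no round trip to the console.
                self.effectors[policy.effector]()
                fired.append(policy.effector)
            else:
                # No local effector: fall back to the remote console.
                self.escalated.append(f"{policy.sensor}={reading}")
        return fired


def demo():
    state = {"cpu_load": 0.95, "throttled": False}

    def throttle() -> None:
        state["throttled"] = True

    agent = LocalAgent(
        sensors={"cpu_load": lambda: state["cpu_load"]},
        effectors={"throttle": throttle},
        policies=[
            Policy("cpu_load", lambda v: v > 0.9, "throttle"),
            Policy("cpu_load", lambda v: v > 0.9, "migrate"),  # not available locally
        ],
    )
    fired = agent.step()
    return fired, state["throttled"], agent.escalated
```

In this sketch only the `migrate` case generates traffic to the console, which reflects the reduced data volume and console load noted above.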
A primary weakness of implementing manageability and autonomics in the same execution environment as the applications and OS that they
monitor is that malfunctions of the monitored environment may impair the agents' operation, because the agents' lifecycle is bounded
by the lifecycle of that environment. Put simply, when the OS crashes, it takes the agent down with it. As a result, no management is
possible when the OS is not running.
Intel® Platform Autonomics Approach
Our platform autonomics approach rectifies the aforementioned problem by providing a separate execution environment for the autonomic
manager. This results in an independent lifecycle for the autonomic manager, which is expected to function in both pre-OS and post-OS
states. In the pre-OS state, it manages configuration and provisioning actions. An obvious benefit in the post-OS state is the ability
to perform forensic analysis by examining the machine state exactly as it was left by the crash. When coupled with event logging within
the manager, this can be a powerful tool in determining the root cause of failures.
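As an illustration of the independent lifecycle, the following Python sketch models a manager that tracks the host through pre-OS, running, and post-OS states and keeps its own event log. It is a sketch under stated assumptions: the class `PlatformManager`, the heartbeat mechanism, and the timeout value are hypothetical and are not taken from any Intel interface. Timestamps are passed in explicitly so that the example is deterministic.

```python
from collections import deque


class PlatformManager:
    """Hypothetical autonomic manager running outside the host OS."""

    def __init__(self, timeout_s: float, log_size: int = 256):
        self.timeout_s = timeout_s
        # Bounded event log kept by the manager, so it survives a host crash.
        self.events = deque(maxlen=log_size)
        self.last_heartbeat = None
        self.host_state = "pre-OS"

    def log(self, now: float, message: str) -> None:
        self.events.append((now, message))

    def heartbeat(self, now: float) -> None:
        """Called (by assumption) whenever the host OS signals that it is alive."""
        if self.host_state != "running":
            self.log(now, "host OS running")
        self.host_state = "running"
        self.last_heartbeat = now

    def poll(self, now: float) -> str:
        """Check the host; declare it post-OS when heartbeats stop arriving."""
        if (self.host_state == "running"
                and now - self.last_heartbeat > self.timeout_s):
            self.host_state = "post-OS"
            self.log(now, "host OS unresponsive; last heartbeat at "
                          f"t={self.last_heartbeat}")
        return self.host_state
```

Because the log lives in the manager rather than in the host, the record of when the host was last seen alive remains available for root-cause analysis after the crash.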
With proper design, a platform autonomic manager can be decoupled from the power states of the host processor, so that it is powered
even when the host processor is off. This is very useful for performing host power on/off operations, performing hardware setup and
configuration, and automating provisioning that facilitates platform self-configuring behavior and attributes. In the next section, we
describe our first step towards achieving our platform autonomics vision.
As of this writing, most autonomic systems and prototypes reported in the literature seem to be implemented in higher software layers,
mostly in user space with perhaps some OS modifications. Our research focus is on platform support for autonomics [10]. Specifically, we
are looking at dedicating platform resources and firmware to implement a set of management and autonomic behaviors that are exposed via
well-defined interfaces. Our long-term vision is to create platforms with on-board support and intelligence that make them discoverable,
configurable, self-managing, self-healing, and self-protecting even when the host OS is not active. Future platforms may provide the
agile infrastructure to support the evolution of dynamic, autonomic, distributed computing as described in [11].
To deliver the autonomic vision, systems and applications need to be built out of autonomized hardware and software components.
The appropriate granularity of a component remains an ongoing research topic. Regardless, an autonomized component will need to do the
following:
- Characterize itself, including introspection, discovery, and self-description, in a machine-readable way.
- Dynamically monitor its surrounding environment.
- React intelligently, at least locally, to changes in the environment.
- Interface with other components for communication, particularly with components responsible for managing the whole system.
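The four responsibilities listed above can be read as an interface that every autonomized component implements. The Python sketch below is one possible rendering, assuming hypothetical names throughout (`AutonomizedComponent`, `describe`, `observe`, `react`, `report`, and the `Fan` example with its 70-degree threshold); it is not an interface defined by the article or by any standard.

```python
import json
from abc import ABC, abstractmethod


class AutonomizedComponent(ABC):
    """Hypothetical interface covering the four responsibilities above."""

    @abstractmethod
    def describe(self) -> dict:
        """Characterize itself in a machine-readable form."""

    @abstractmethod
    def observe(self) -> dict:
        """Monitor the surrounding environment."""

    @abstractmethod
    def react(self, observation: dict) -> list:
        """React locally to a change; return the actions taken."""

    def report(self, manager_inbox: list) -> None:
        """Communicate with the component that manages the whole system."""
        observation = self.observe()
        message = {
            "self": self.describe(),
            "observed": observation,
            "actions": self.react(observation),
        }
        manager_inbox.append(json.dumps(message, sort_keys=True))


class Fan(AutonomizedComponent):
    """Illustrative component: a fan that speeds up when the platform is hot."""

    def __init__(self, read_temp_c):
        self.read_temp_c = read_temp_c  # injected temperature sensor
        self.speed = "low"

    def describe(self) -> dict:
        return {"type": "fan", "capabilities": ["set_speed"]}

    def observe(self) -> dict:
        return {"temp_c": self.read_temp_c()}

    def react(self, observation: dict) -> list:
        wanted = "high" if observation["temp_c"] > 70 else "low"
        if wanted == self.speed:
            return []
        self.speed = wanted
        return [f"set_speed:{wanted}"]
```

A system manager would call `report` on each component and receive a self-description, the current observation, and any local actions in one JSON message.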
Core autonomic platform manager requirements
Overall, for a platform to be effectively managed autonomically, a number of core components are required. We discuss each of these
below.
Standard out-of-band external interfaces
One way for platforms to collect environmental information from their surroundings is to communicate with other platforms through Out-of-Band (OOB) interfaces. OOB interfaces are commonly independent of the host operating environment, which could be an OS or a Virtual
Machine Monitor (VMM) with one or more OSs. In the article "Standards for Autonomic Computing" [12], in this issue of the Intel®
Technology Journal, a more detailed overview is given of the importance of standardization in autonomic computing and the benefits of
using Web services as external interfaces.
Internal interfaces to host operating environments
The interface to the host operating environment is needed to get visibility into core metrics from the host OS that may not be available
on internal buses. In addition, some components of an autonomic solution may be deployed in the host operating environment to gather
additional environmental information not visible otherwise.
Standard internal platform interfaces
These interfaces are needed to dynamically discover and interact with sensors and effectors. Autonomic decisions are based on multiple
sensors within the platform. Since the components vary from one platform configuration to another, the ability for the autonomic manager
to discover and monitor these sensors dynamically is critical to an autonomic design.
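Dynamic discovery can be illustrated with a small registry to which sensors attach and from which they detach at run time. This is a sketch only: `SensorRegistry` and its methods are hypothetical names, and a real platform would back them with a standard internal bus or interface rather than an in-memory dictionary.

```python
from typing import Callable, Dict, List


class SensorRegistry:
    """Hypothetical registry that the autonomic manager queries for sensors."""

    def __init__(self) -> None:
        self._sensors: Dict[str, Callable[[], float]] = {}

    def register(self, name: str, read: Callable[[], float]) -> None:
        """Called when a sensor appears in the platform configuration."""
        self._sensors[name] = read

    def unregister(self, name: str) -> None:
        """Called when a sensor is removed; unknown names are ignored."""
        self._sensors.pop(name, None)

    def discover(self) -> List[str]:
        """Return the names of the sensors that are currently present."""
        return sorted(self._sensors)

    def poll(self) -> Dict[str, float]:
        """Read every sensor that is currently present."""
        return {name: read() for name, read in self._sensors.items()}
```

The manager makes no assumption about which sensors exist; it calls `discover` or `poll` and works with whatever the current platform configuration provides.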
Platform container
A platform autonomic container is required to implement the autonomic functions. It may be provided in a variety of ways such as a
dedicated microcontroller in the chipset or a plug-in option card. In any of these cases, the container is expected to have dedicated
physical or virtual execution resources, such as processor and memory, supporting a software execution environment that is isolated and
possibly different from the OS and user application execution environment of the host platform. Isolation from the host operating
environment and separation of manageability and autonomic functions into a dedicated execution environment provides some fundamental
advantages, resulting primarily in increased availability. While a separate container provides a number of benefits, it is only as
useful as the autonomic functions implemented in it. These must be made available externally for use in various phases of the host
system lifecycle, starting with pre-power, pre-OS states, assisting the OS when it is present, and taking over when it is not.
Inter-platform container interfaces
These interfaces are used by autonomic managers to exchange information as in the aforementioned distributed malware detection and power
management examples. The inter-platform container interfaces may be standardized, but in general they are highly optimized, trustworthy
interfaces.
Given that these core components need to be embodied in platforms, Intel has taken the first steps to implementing its autonomic vision
with Intel® AMT.
|