Intel® Technology Journal
Autonomic Computing
Volume 10, Issue 04, published November 9, 2006
ISSN 1535-864X, DOI: 10.1535/itj.1004.01

Platform support of autonomic computing: an evolution of manageability architecture
Autonomics

Autonomic systems [5, 6] represent an approach to managing complexity by making individual system nodes and components self-managing. Such systems manage themselves by monitoring their operation, detecting anomalies, and adjusting themselves to restore normal operation.

Autonomic Computing (AC), also referred to by other terms such as Organic Computing and Self-Managed Systems, is inspired by the human autonomic nervous system, which handles the complexities and uncertainties of the human body without requiring conscious effort. AC emerged as a new, strategic, and holistic approach to the design of complex distributed computer systems, aiming to realize computing systems and applications capable of managing themselves with minimal human intervention. In [5, 6] IBM researchers defined autonomic managers as being responsible for monitoring a managed element and for creating and executing plans based upon analysis of the data and knowledge they have acquired. Thus, autonomic systems are represented in closed-loop form consisting of four stages: monitor, analyze, plan, and execute, as shown in Figure 1.



Figure 1: A typical abstract autonomic framework

The autonomic manager monitors the managed resources it controls and analyzes the data received. Based upon the data received and analyzed, the autonomic manager constructs and executes plans to achieve the management goals in accordance with the policies and rules in effect. It accumulates and uses knowledge (policies and rules) from observations, past experiences, and updates received from its peers and other components of the management hierarchy, such as management consoles. The manageability interface between the autonomic manager and the managed resources allows the autonomic manager to "sense" data from the managed resources and to "effect" the desired actions.
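The closed loop just described can be sketched in a few lines. This is a minimal illustration of the monitor-analyze-plan-execute cycle, not IBM's or Intel's implementation; the sensor and effector callables, the `temp_c` reading, the `max_temp_c` policy, and the `raise_fan_speed` action are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Sample = Dict[str, float]


@dataclass
class AutonomicManager:
    """Minimal monitor-analyze-plan-execute (MAPE) loop; an illustrative sketch."""
    sense: Callable[[], Sample]        # sensor interface: reads the managed resource
    effect: Callable[[str], None]      # effector interface: applies one action
    policies: Dict[str, float]         # hypothetical policy, e.g. {"max_temp_c": 85.0}
    knowledge: List[Sample] = field(default_factory=list)

    def monitor(self) -> Sample:
        sample = self.sense()
        self.knowledge.append(sample)  # keep observations for later analysis
        return sample

    def analyze(self, sample: Sample) -> List[str]:
        # Compare the observation against the policy to derive symptoms.
        if sample["temp_c"] > self.policies["max_temp_c"]:
            return ["overheat"]
        return []

    def plan(self, symptoms: List[str]) -> List[str]:
        # Map each symptom to a corrective action (hypothetical action names).
        remedies = {"overheat": "raise_fan_speed"}
        return [remedies[s] for s in symptoms]

    def execute(self, actions: List[str]) -> None:
        for action in actions:
            self.effect(action)

    def run_once(self) -> List[str]:
        """One pass around the closed loop; returns the actions taken."""
        actions = self.plan(self.analyze(self.monitor()))
        self.execute(actions)
        return actions
```

A real manager would call `run_once` periodically and would refresh `policies` from peers or a management console; keeping the sensor and effector as injected callables mirrors the manageability interface that separates the manager from the managed resource.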

Self-management attributes

The following four attributes were originally defined as constituting self-management:

  1. Self-configuring is a system's ability to change its configuration automatically in reaction to runtime changes or to assist in self-healing, self-optimization, and/or self-protection. For example, if a hard memory error is detected on a memory bank, a platform can isolate the affected memory bank, allowing the system to continue functioning until the faulty part is replaced.
  2. Self-healing is the ability of a platform to recover effectively when a fault occurs. Self-healing can be either reactive or proactive. A reactive self-healing platform attempts to correct or isolate a fault once it has occurred. A proactive self-healing platform attempts to predict whether a fault may occur and takes appropriate action to keep the system healthy.
  3. Self-optimization is the ability of the system to optimize its operations based on a given operation profile. This involves monitoring its operation and optimizing accordingly, given a set of policies. It may also react to dynamic policy changes within the platform as indicated by a user.
  4. Self-protecting is the ability of a system to defend itself from accidental or malicious external attacks by being aware of potential threats and being able to handle those threats.
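The difference between reactive and proactive self-healing can be made concrete with the memory-bank example. The sketch below assumes that a rising count of corrected (ECC) errors is a usable predictor of an impending hard fault; the `MemoryBank` record and the `correctable_limit` threshold are hypothetical, and isolating the bank is also the self-configuring step described above.

```python
from dataclasses import dataclass


@dataclass
class MemoryBank:
    bank_id: int
    hard_errors: int = 0          # uncorrectable errors observed
    correctable_errors: int = 0   # corrected errors seen in the current window (assumed metric)
    online: bool = True


def reactive_heal(bank: MemoryBank) -> bool:
    """Isolate the bank only after a hard error has already occurred."""
    if bank.online and bank.hard_errors > 0:
        bank.online = False       # self-configuring: take the bank out of service
        return True
    return False


def proactive_heal(bank: MemoryBank, correctable_limit: int = 10) -> bool:
    """Also isolate the bank when its corrected-error count suggests an
    impending hard fault; correctable_limit is an assumed policy value."""
    if bank.online and (bank.hard_errors > 0
                        or bank.correctable_errors >= correctable_limit):
        bank.online = False
        return True
    return False
```

The proactive variant acts earlier but depends on how good the predictor is; a poorly chosen threshold would retire healthy memory.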

The list of self-management attributes has since grown substantially, covering features such as self-anticipating, self-adapting, self-critical, self-destructing, self-diagnosing, and self-recovering. We expect the list to continue to grow.

For a system to meet the self-management objectives, it must be aware of its internal state (self-awareness) and of its current external environment (environment-awareness) [7]. Self-awareness and environment-awareness are achieved through the ability of a platform to collect raw internal data (self-monitoring) and external data (environment monitoring) that is used for the following:

  • Data aggregation. This automatically transforms raw data gathered over time into information upon which predictions, actions, and strategies are based.
  • Data analysis. This is the analysis of raw and aggregated data that is used to aid in self-healing, self-protection, and self-optimization.
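A simple way to picture aggregation followed by analysis is a sliding window that turns raw samples into a few summary values, which a rule then classifies. This is a minimal sketch; the window size, the crude newest-minus-oldest trend, the `limit` value, and the 80% early-warning margin are all assumptions for illustration.

```python
from collections import deque
from statistics import mean


class Aggregator:
    """Aggregates raw samples over a sliding window into summary information."""

    def __init__(self, window: int = 5) -> None:
        self.samples: deque = deque(maxlen=window)  # oldest samples drop out

    def add(self, value: float) -> None:
        self.samples.append(value)

    def summary(self) -> dict:
        if not self.samples:
            raise ValueError("no samples collected yet")
        vals = list(self.samples)
        return {
            "mean": mean(vals),
            "max": max(vals),
            "trend": vals[-1] - vals[0],  # crude trend: newest minus oldest
        }


def analyze(summary: dict, limit: float) -> str:
    """Classify aggregated data; limit and the 80% margin are assumed policy values."""
    if summary["max"] > limit:
        return "fault"       # already out of bounds: candidate for reactive self-healing
    if summary["trend"] > 0 and summary["mean"] > 0.8 * limit:
        return "degrading"   # heading out of bounds: candidate for proactive action
    return "normal"
```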

As a platform monitors its internal and external environment, changes may be detected that require the platform to adjust itself accordingly (self-adjusting).

The self-management features of an autonomic platform are dependent on one another. Returning to the memory example, if memory fails in the platform, the platform will need to take corrective action (self-healing and self-configuration) and then optimize itself in an attempt to continue meeting its service level objectives (SLOs) (self-optimization). To illustrate the dependencies between the self-management features, a more representative taxonomy of an autonomic system is shown in Figure 2. The figure shows the enabling technologies (e.g., virtualization and automation) that are required to build a self-managing system (represented by the disc) with capabilities (represented as petals) such as self-optimizing, self-healing, and self-protection.

The figure also illustrates the environment and resources that an autonomic system needs to comprehend, such as the hardware, software, workload, and business requirements. Note that the petals protrude outside the self-managed disc to indicate that systems cannot internally self-manage all possible scenarios: there will always be a need for outside intervention, since the self-managed object may be part of a larger self-managed construct. Petals also overlap to indicate that a change in one self-management function may affect one or more other self-management functions. For example, a self-healing action may need to be followed by a self-optimizing action for best performance, as in the memory example.
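The dependency between a self-healing action and the self-optimizing action that follows it can be sketched with the memory example. In this hypothetical model, bank sizes and per-workload memory demands are given in GB, the "optimization" is a uniform proportional scale-down standing in for a real optimizer, and the SLO is assumed to be "each workload keeps at least `min_share` of its demand."

```python
from typing import Dict, Tuple


def heal_then_optimize(
    banks_gb: Dict[int, float],
    failed_bank: int,
    demand_gb: Dict[str, float],
    min_share: float,
) -> Tuple[Dict[int, float], Dict[str, float], bool]:
    # Self-healing / self-configuring step: take the faulty bank out of service.
    healthy = {b: gb for b, gb in banks_gb.items() if b != failed_bank}
    capacity = sum(healthy.values())

    # Self-optimizing step: scale every workload's allocation down uniformly
    # so the total fits the remaining capacity (a stand-in for a real optimizer).
    scale = min(1.0, capacity / sum(demand_gb.values()))
    allocation = {w: d * scale for w, d in demand_gb.items()}

    # Check the assumed SLO: each workload keeps at least min_share of its demand.
    slo_met = all(allocation[w] >= min_share * d for w, d in demand_gb.items())
    return healthy, allocation, slo_met
```

When `slo_met` comes back false, the platform has reached the edge of the disc in Figure 2 and would need outside intervention, for example from a management console.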



Figure 2: Taxonomy of an autonomic system

Autonomic computing environment

In an AC environment, there is often an implicit assumption that actions are based purely on prediction. Although predictive technologies are an important evolutionary step toward better autonomic platforms, an autonomic platform can also be reactive: its autonomic behavior is driven by policies that respond to monitoring data that has already been collected. This reactive approach is the most common form of AC found today.

Reactive autonomic platforms are, however, limited in their ability to achieve a high level of self-awareness and therefore may not fully achieve the desired AC operation: their effectiveness is bounded by the policies they are given.

Proactive AC environments rely on Artificial Intelligence (AI) techniques, such as machine learning, to analyze collected data and predict operational anomalies before they occur. This approach is what enables the full AC vision. However, it does not come for free: predictive techniques consume additional resources and increase complexity.

An ideal AC environment would incorporate a combination of both proactive and reactive approaches. The ability for a platform to take proactive actions needs to be controlled by policies that describe not only what autonomic actions may or may not be taken, but also what the desired normal state of the platform should be.
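A policy that governs both kinds of behavior can be sketched as a desired normal state plus a whitelist of actions the platform may take on its own. Everything here is an assumption for illustration: utilization as the state metric, the `max_utilization` bound, and the action names `shed_load`, `add_capacity`, and `notify_console`.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Policy:
    max_utilization: float                       # desired normal state (assumed metric)
    allowed_actions: FrozenSet[str] = field(default_factory=frozenset)


def decide(policy: Policy, observed: float, predicted: float) -> Optional[str]:
    """Pick one action, combining a reactive and a proactive rule under one policy."""
    if observed > policy.max_utilization:
        action = "shed_load"        # reactive: the platform is already outside normal
    elif predicted > policy.max_utilization:
        action = "add_capacity"     # proactive: the platform is predicted to leave normal
    else:
        return None                 # within the desired normal state: do nothing
    # The policy, not the predictor, decides whether the action may be taken autonomously.
    return action if action in policy.allowed_actions else "notify_console"
```

Falling back to `notify_console` when an action is not permitted keeps a human or higher-level manager in the loop for decisions the policy has not delegated.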

When autonomic actions are based on local information and limited environmental information, the ability to make accurate decisions may be limited. This may result in a high number of conditions being misidentified. Consider, for example, how enterprise management solutions work today: information collected by a platform is consolidated in a management console, which can combine data from multiple sources to make a more informed and accurate decision about what has happened, or may have happened, on a particular platform.
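Why consolidated data reduces misidentification can be shown with a toy comparison. A platform on its own can only test a reading against a fixed threshold, while a console that sees every platform can also ask whether that platform stands out from its peers. The fleet-median rule and the `margin` factor below are assumptions chosen for illustration, not how any particular management console works.

```python
from statistics import median
from typing import Dict


def local_verdict(value: float, threshold: float) -> bool:
    """What a platform can conclude on its own: compare against a fixed threshold."""
    return value > threshold


def console_verdict(reports: Dict[str, float], platform: str,
                    threshold: float, margin: float = 1.5) -> bool:
    """What a console can conclude with data from many platforms: flag `platform`
    only if it exceeds the threshold AND stands out from the fleet median."""
    fleet_median = median(reports.values())
    value = reports[platform]
    return value > threshold and value > margin * fleet_median
```

If every platform reports high load at once (say, a legitimate workload surge), each local check fires, but the console sees that no single platform is anomalous and suppresses the alert.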

Group interactions

An autonomic platform must be able to interact with other platforms in its environment, thereby increasing its own knowledge through the collective intelligence of other autonomic platforms. This additional information aids the decision process, allowing the platform to make decisions based on both local data and data collected from other platforms. Two examples illustrate the importance of distributed intelligence.

Data centers rely on Network Intrusion Detection Systems (NIDS) to analyze network traffic for patterns and hints that indicate whether potential threats exist. An NIDS is effective only after it has examined a sufficient portion of the network traffic. A Host-based Intrusion Detection System (HIDS), on the other hand, can base its decisions only on what it sees locally, and so relies on knowledge in the form of signatures and heuristic patterns that it has been given a priori. Extending a HIDS to leverage other HIDSs on the network increases its reliability. This first example is described in the paper "Towards Autonomic Enterprise Security: Self-Defending Platforms, Distributed Detection, and Adaptive Feedback" [8] in this issue of the Intel® Technology Journal. The authors describe the three building blocks for an end-to-end autonomic enterprise, namely detection, self-defense, and adaptive policy management. They explain how Intel technologies such as Intel® AMT and Intel-led standards initiatives are used to create a self-defending platform, and how distributed intelligence can be used to defend the enterprise by corroborating the likelihood of infection.
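One textbook way to corroborate a likelihood across peers is a Bayesian odds update, sketched below. This is not the method of [8]; it assumes each peer HIDS reports independently and that all peers share the same detection rate (`tpr`) and false-positive rate (`fpr`), with default values invented for the example.

```python
from typing import List


def corroborate(prior: float, peer_alerts: List[bool],
                tpr: float = 0.8, fpr: float = 0.1) -> float:
    """Posterior probability that a threat is present, given peer HIDS reports
    (True = peer raised an alert). Assumes conditionally independent peers
    with identical, assumed detection (tpr) and false-positive (fpr) rates."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    odds = prior / (1.0 - prior)
    for alert in peer_alerts:
        if alert:
            odds *= tpr / fpr                    # an alert is evidence for a threat
        else:
            odds *= (1.0 - tpr) / (1.0 - fpr)    # silence is weak evidence against
    return odds / (1.0 + odds)
```

With a 1% prior, two independent peer alerts raise the estimate to roughly 39% under these assumed rates: much more than either host could justify alone, yet still short of certainty, which is why such estimates feed policy rather than trigger drastic action directly.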

The second example deals with self-optimization. Autonomic systems maintain operation under varying conditions by monitoring their state and by creating and executing plans based upon analysis of the data and the knowledge they have acquired. A critical component of autonomic systems is their ability to handle uncertainty in the perceived state and in the effect of their actions. The article "Machine Learning for Adaptive Power Management" [9], also in this issue of the Intel® Technology Journal, describes how Partially Observable Markov Decision Processes (POMDPs) are used for modeling autonomic systems and how a specific POMDP model is used for an adaptive power management system in a laptop.
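The core mechanism that lets a POMDP cope with an uncertain perceived state is the belief update: the system keeps a probability distribution over hidden states and revises it after each action and observation, b'(s') ∝ O(o | s', a) · Σ_s T(s' | s, a) · b(s). The sketch below shows only that update on a two-state user model; it does not reproduce the model in [9], and the state names, probabilities, and the threshold rule in `choose_action` are assumptions (a full POMDP policy would maximize expected reward instead).

```python
from typing import Dict

Belief = Dict[str, float]
# Model tables indexed as T[action][s][s_next] and O[action][s_next][observation].
Model = Dict[str, Dict[str, Dict[str, float]]]

# Illustrative two-state model of a laptop user (all numbers are assumptions).
T: Model = {
    "stay_on": {
        "user_active": {"user_active": 0.9, "user_away": 0.1},
        "user_away": {"user_active": 0.2, "user_away": 0.8},
    }
}
O: Model = {
    "stay_on": {
        "user_active": {"input": 0.7, "no_input": 0.3},
        "user_away": {"input": 0.05, "no_input": 0.95},
    }
}


def belief_update(b: Belief, action: str, obs: str,
                  trans: Model, obs_model: Model) -> Belief:
    """Bayes-filter step: predict with the transition model, then weight each
    predicted state by how likely the observation is, then normalize."""
    unnormalized = {
        s_next: obs_model[action][s_next][obs]
        * sum(trans[action][s][s_next] * p for s, p in b.items())
        for s_next in b
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation is impossible under the model")
    return {s: v / total for s, v in unnormalized.items()}


def choose_action(b: Belief, sleep_threshold: float = 0.85) -> str:
    # Simplification: sleep once we are confident enough that the user is away.
    return "sleep" if b["user_away"] > sleep_threshold else "stay_on"
```

Starting from an even belief, each `"no_input"` observation shifts probability toward `"user_away"`, so the manager powers down only after accumulating evidence rather than on a single idle reading.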

