|
As presented in the previous section, a secure platform is the best way to secure the enterprise, but it may take a long time for
enterprises to upgrade all end systems. Therefore it is still important to have a reliable detection system throughout the enterprise.
In this section, we describe a framework to combat the increasingly urgent problem of intrusion attempts within an enterprise.
Traditional defenses have relied on perimeter mechanisms such as firewalls to protect the inside of an enterprise from external threats.
However, the modern enterprise has very loosely defined boundaries and hosts are generally free to move in and out. Once infected hosts
return into the enterprise, the infecting malware is free to spread relatively unchecked. Conventional detection schemes are based on
observing traffic entering and leaving aggregation points around the enterprise. These schemes, while moderately successful, have
several limitations, the most severe of which is that they are not very good at detecting slowly spreading worms that try to blend in
with normal background traffic. The usual counter-measure for this type of worm is to place a variety of sensors around the
network that measure and track different traffic features, and to correlate the information they produce.
Our framework extends this idea to the logical extreme: we consider each end-host in the enterprise to be a potential sensor (or Local
Detector) and we allow the end-hosts to exchange information and corroborate the state of the network, i.e., whether it is infected or
not. As we will show, such a system is able to detect slowly spreading network anomalies at a very low false positive (FP) rate, much
lower than the rates associated with conventional methods and tools. Briefly, several intuitions motivate a collaborative, host-based
framework:
- IDSs deployed selectively might not see any worm traffic for a long time and perhaps see it only when it is too late. Collaboration
  is seen as a way to remedy this; systems that allow multiple IDSs to share information have been shown to provide greater "coverage" in
  detection [9, 10, 11, 12].
- Analysis of network traffic at the host level allows the weak signal to be compared to a much smaller background noise-level, so the
  signal-to-noise ratio can be boosted by orders of magnitude compared to an IDS that operates within the network.
- Host-based detectors can make use of a richer set of data, possibly using application data from the host as input into the local
  classifier.
The detection and inference framework we describe here is quite simple: end-hosts contain Local Detectors (LDs) that are meant to detect
anomalous behavior at the end-host. This could be by means of system integrity checks, watching outgoing network traffic, looking for
anomalous behavior, etc. Periodically, the LDs gossip their local state (whether an anomaly was detected in some preceding window, or
not) to other hosts. Some (or perhaps all) of the nodes in the network also contain Global Detectors (GDs). Their function is to
aggregate the signals received from LDs (each GD receives signals from some number of LDs in the network). From these signals, each GD
computes the probability that a network-wide anomaly is occurring.
In the rest of this section, we describe the LDs that we use and describe how information from different LDs is combined into a single
measure. Subsequently, we describe simulation results that compare the performance of different models.
Local detectors
Simply put, an LD (end-host-based detector) is any entity that generates an output signal, taking as input the state of the end-host.
For our specific purpose, the LD simply generates a Boolean signal that is true if an "anomaly" is detected, and false otherwise. We
also assume that the LDs are weak in the sense that they may have a high FP rate, and are non-specific, so are likely to fire for a
broad range of anomalous behavior. The LD implementation that we use in our proof of concept is quite simple: it counts the number of
new "network connections" that are initiated by the host in a certain time window. If this count exceeds a pre-defined threshold, it
assumes an anomaly exists. In Figure 3, we show the distribution of the number of new connections initiated by a host in a 50s interval.
The plot corresponds to network traces collected at 37 hosts in the Intel corporate network over a five-week period. Also shown in
Figure 3 are the average rates for a number of known worms (MS Blaster, Slapper, Code Red II, etc.). These worms propagate at a rate
that is quite high and clearly stand out in the distribution. Thus, using a very high threshold for our LD (about 200 connections per
50s) would be sufficient to detect the said worms accurately and without many FPs. However, the trend that has been recently observed is
that worms are getting slower and slower, moving to the left side of the distribution. To demonstrate that our system can detect
slow worms at a low FP rate, we push the LD threshold down by more than an order of magnitude and set it to 4 connections per
50s interval. Note that in Figure 3 this number is well within the bulk of the normal traffic distribution. Had an individual LD been
used by itself at this threshold, it would have generated thousands of false alarms over the five-week period; however, it would have
successfully detected worms operating at a much slower rate. Thus, a simple way to create a weak, non-specific LD is to drastically
reduce the threshold of some standard heuristic, although other standard anomaly detection techniques can be used as well.
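
A minimal sketch of the thresholded connection-count LD described above, assuming fixed, non-overlapping 50s intervals and a list of timestamps (in seconds) at which the host initiated new connections. The function name `ld_signals` and its parameters are hypothetical and are not taken from the proof-of-concept implementation.

```python
from collections import Counter


def ld_signals(conn_times, threshold=4, interval_s=50.0, n_intervals=None):
    """Return one Boolean per interval: True if the LD fires in that interval.

    conn_times  -- timestamps (seconds) of connections initiated by this host
    threshold   -- the LD fires when the count strictly exceeds this value
    interval_s  -- interval length in seconds (50s in the text)
    n_intervals -- number of intervals to report (assumed: up to the last
                   interval that contains a connection when not given)
    """
    # Count new connections per fixed interval.
    counts = Counter(int(t // interval_s) for t in conn_times)
    if n_intervals is None:
        n_intervals = max(counts, default=-1) + 1
    return [counts.get(i, 0) > threshold for i in range(n_intervals)]


# Illustrative input: 4 connections in the first interval, 5 in the second,
# and 1 in the third. Only the second interval exceeds the threshold of 4.
signals = ld_signals([1, 2, 3, 4, 55, 56, 57, 58, 59, 120])
```

With a threshold this low, ordinary bursts of activity trip the detector, which is exactly the weak, non-specific behavior the framework assumes of each LD.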
The alarms raised by individual LDs in our system are "gossiped" to other peer hosts. Thus, LDs periodically share their "belief" with
other nodes in the network. Conceptually, each end-host can be thought of as containing an LD (which operates as above) and also a GD,
which aggregates the information received from different LDs and applies some model to compute a network-wide belief. In the following
section, we briefly discuss a few models that do this.
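
The gossip step can be sketched as follows. The text does not specify how peers are chosen or how many are contacted per round, so the uniform random peer selection and the `fanout` parameter here are assumptions made purely for illustration, and `gossip_round` is a hypothetical name.

```python
import random


def gossip_round(ld_state, fanout, rng):
    """Simulate one gossip round.

    ld_state -- dict mapping host name to its current Boolean LD signal
    fanout   -- number of distinct peers each host contacts (assumed)
    rng      -- a random.Random instance, so runs are reproducible

    Returns a dict mapping each host to the list of (sender, signal)
    messages that the GD on that host received in this round.
    """
    hosts = sorted(ld_state)
    inbox = {h: [] for h in hosts}
    for sender in hosts:
        peers = [h for h in hosts if h != sender]
        # Assumed policy: contact `fanout` peers chosen uniformly at random.
        for peer in rng.sample(peers, min(fanout, len(peers))):
            inbox[peer].append((sender, ld_state[sender]))
    return inbox


state = {"a": True, "b": False, "c": False, "d": True}
inbox = gossip_round(state, fanout=2, rng=random.Random(0))
```

Each entry of `inbox` is the raw input that a GD on that host would then feed into one of the aggregation models.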
A valid concern is how such a system might protect the computing environment against malicious corruption of the LDs or the GDs. One
approach would be to adopt integrity services on the subset of nodes running the distributed detection algorithms; the number of nodes
required is substantially smaller than the number in the entire enterprise. An alternative approach is to
place the LD and GD functionality in protected hardware, rather than in more vulnerable software. Work is underway to study other
vulnerabilities, e.g., how to make the system robust to some number of rogue LDs sending misinformation to GDs and thereby skewing the
analysis completely.

Figure 3: The distribution of the number of initiated connections per 50s interval. Propagation rates of previous worms are indicated as
W1=MS Blaster, W2=Slapper, W3=Code Red II, W4=Slammer, W5=Witty. The horizontal line denotes the threshold used in the local detector (4
connections per interval, CPI).
Global detectors
Several models can combine the local beliefs received from individual LDs into a "global" belief.
The simplest is to count the number of positive firings and threshold this value (the PosCount model).
Another potential model is the CuSum detector, well known in the area of statistical process control, which is used to detect deviations
from some mean (over a statistic of interest). However, a drawback with these simpler models is that they do not really support
heterogeneous LDs, i.e., detectors of varying "quality." In contrast to these baseline techniques, models based on Dynamic Bayesian
Networks (DBN) [13] overcome this shortcoming by taking into account the FP and True Positive (TP) rates of individual detectors in a
systematic manner. Essentially, DBNs are a principled formalism for expressing independence relations while modeling temporal
(stochastic) processes. In our work, we explore two DBN instances, namely the Change-Point DBN and the Epidemic DBN. The former assumes
that up to some change-point time, t_cp, the network as a whole is not in an anomalous state, whereas after t_cp it is. In contrast, the
latter models the spread of exponentially growing signals (anomalies, in our specific context) in a system. Clearly, each of these
models is well suited to specific applications. For instance, if a system were to use high quality LDs (very low FP rate and very high
TP rate), then presumably, the PosCount model would perform quite well (and has the advantage of low computational overhead). In the
next section, we compare the performance of our system assuming very general (but weak) LDs.
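
The two baseline aggregators can be sketched in a few lines. The CuSum shown is the standard one-sided form with a slack term, and its parameter values are illustrative assumptions. The third function, `llr_score`, is not one of the DBN models; it is a much simpler naive-Bayes stand-in, assuming independent detectors, included only to show how per-detector TP and FP rates let a strong detector count for more than a weak one. All function names are hypothetical.

```python
import math


def poscount(signals, threshold):
    """PosCount: alarm when the number of positive firings exceeds a threshold."""
    return sum(signals) > threshold


def cusum(counts, mean, slack, threshold):
    """One-sided CuSum over per-interval counts of positive firings.

    Accumulates deviations above (mean + slack), never dropping below zero.
    Returns the index of the first interval at which the statistic exceeds
    the threshold, or None if it never does.
    """
    s = 0.0
    for i, x in enumerate(counts):
        s = max(0.0, s + (x - mean - slack))
        if s > threshold:
            return i
    return None


def llr_score(signals, tp_rates, fp_rates):
    """Sum of per-detector log-likelihood ratios (independence assumed).

    A firing contributes log(TP/FP); a non-firing contributes
    log((1-TP)/(1-FP)). Positive totals favor the "anomaly" hypothesis.
    """
    score = 0.0
    for fired, tp, fp in zip(signals, tp_rates, fp_rates):
        if fired:
            score += math.log(tp / fp)
        else:
            score += math.log((1.0 - tp) / (1.0 - fp))
    return score


# Illustrative values only: firings per interval rise from a baseline of 2.
first_alarm = cusum([2, 2, 2, 6, 7, 8], mean=2, slack=1, threshold=5)
```

PosCount and CuSum treat every firing identically, which is why they cannot distinguish detectors of varying quality; a scheme that weights each signal by the detector's TP and FP rates can.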

Figure 4: Performance comparison of various Global Detector models with a stand-alone Local Detector (No Corroboration). The x-axis
sweeps through the false positive rate (number of false alarms per week) and the y-axis plots the fraction of the network infected at
the time of a global detection. Notice that the DBN models have the lowest FP rates and also facilitate early detection, whereas the
stand-alone Local Detector either achieves low FP rates or a low fraction infected, but never both simultaneously.
|