Human Activity Recognition

 

Overview
The goal of the Human Activity Recognition project is to build a system that can automatically infer a wide range of everyday human activities (such as cooking pasta, taking a pill, or washing dishes) and provide proactive assistance, if needed, to complete an activity. One broad objective of the project is to enable the elderly to continue living in their homes for as long as possible. The Human Activity Recognition project is a collaboration between Intel Research Seattle and the University of Washington.

Machine learning is a key aspect of the Human Activity Recognition project. For a system to automatically infer what activity is being performed, it must have a detailed model of the activity. Specifically, the system must have the following information: a text label for the activity; the number of steps involved; the objects used in each step, and the probability of using them; and the estimated time it takes to complete each step.
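The four pieces of information listed above map naturally onto a small data structure. The following is a minimal sketch, not the project's actual representation: the class names `Step` and `ActivityModel`, the field names, and the "make tea" values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One step of an activity (hypothetical structure)."""
    # Maps an object name to the probability that it is used in this step.
    object_use_prob: dict[str, float]
    # Estimated time to complete the step, in seconds.
    expected_duration_s: float


@dataclass
class ActivityModel:
    """A text label plus an ordered list of steps."""
    label: str
    steps: list[Step] = field(default_factory=list)

    @property
    def num_steps(self) -> int:
        return len(self.steps)

    @property
    def expected_duration_s(self) -> float:
        # Total duration is the sum of the per-step estimates.
        return sum(step.expected_duration_s for step in self.steps)


# Illustrative values only; none of these numbers come from the project.
make_tea = ActivityModel(
    label="make tea",
    steps=[
        Step({"kettle": 0.9, "faucet": 0.7}, expected_duration_s=180.0),
        Step({"teabag": 0.8, "mug": 0.9}, expected_duration_s=30.0),
    ],
)
```

Keeping the steps in an ordered list preserves the expected order of the activity, and the step count falls out of the list length rather than being stored separately.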

While the activity is being carried out, data is gathered from sensors affixed to every object used. The data is fed into a reasoning engine, a machine learning algorithm that analyzes the data, compares it to a large set of activity models, and infers which model is the best match.

The Three Components of Machine Learning Systems

Machine learning systems vary, but all have three components:
Sensors that gather data about the physical world: In the case of Human Activity Recognition, RFID tags gather data about which objects are being used to perform an activity, and additional sensors capture other data, such as motion, temperature, or visible light.
Models: Beliefs or prior knowledge about real-world processes (human activities, in the case of Human Activity Recognition).
Reasoning engine: The machine learning algorithm that analyzes sensor data, compares it to a large set of models, infers which model is the closest match for the data, improves the models based on observed data, and recommends appropriate actions.


Two members of the Human Activity Recognition research team are focusing on the machine learning aspects of the project. Matthai Philipose is leading an effort to mine the Web to find models of human activities. Tanzeem Choudhury is focusing on how machine learning systems can handle the uncertainty and wide range of variations in how human activities are performed.


Mining the Web for Models of Human Activity
There are tens of thousands of activities that a person might perform in the course of everyday living. The traditional way to build an activity model is to gather sensor data for an activity and, from that data, learn a "model" that maps the activity to the data it may generate. To learn what it means to "make pasta," for example, a researcher could instrument a kitchen with sensors, have a person wearing an RFID reader perform the task 50 or 100 times, and build a model from the data gathered in the process. However, applying this technique to build tens of thousands of activity models is untenable because, in the absence of a reasonably good prior model to start with, learning techniques typically require very large quantities of training data.
Intel researchers, led by Matthai Philipose, are exploring a more efficient approach. Reasoning that written instructions about how to perform common daily activities already exist in cyberspace, they are mining models of human activity from the Web. Sites such as www.ehow.com provide step-by-step instructions on how to perform thousands of everyday activities. These "how to" descriptions can serve as a starting point for building statistical models by providing an example "structure" of models for each activity, including the name of the activity, the number of steps in the activity, and the expected order of those steps.

However, step-by-step descriptions do not supply two key additional pieces of information, or model parameters, necessary to complete the activity model. First, they do not provide the probability that a given object will be used for a particular activity. Second, they do not directly provide the expected duration of the activity as a whole, or of its individual steps. Researchers leverage the statistical properties of the Web to acquire this information.

To compute the object-use probabilities, researchers use a simple technique commonly used by Internet search engines to find similarity between phrases. Suppose they wish to estimate the probability of cleaning a bathtub using rubber gloves. The first step would be to perform an online search for the terms "clean a bathtub" and "rubber gloves." The number of search results that include both terms, compared to the number returned for the term "clean a bathtub" alone, gives a rough indication of the probability of using gloves when cleaning bathtubs.

To compute the expected duration for the entire activity, researchers exploit the fact that "how to" instructions often, but not always, include expected durations. To acquire the duration for a particular activity, they automatically identify multiple "how to" instructions on the Web for the same activity and aggregate the maximum durations found in each to create a robust estimate of the duration.

The goal of this exercise is not to extract precise activity models from the Web but to provide initial input for the reasoning engine. Using the rough structure and parameters mined above as an educated guess, machine learning techniques can use sensed data to customize the activity model, producing a very accurate end result with much less data than would be needed without that starting point.
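The two Web-derived estimates described above can be sketched in a few lines. This is an illustration under stated assumptions, not the researchers' code: the function names are hypothetical, the hit counts and durations are made-up inputs rather than real search results, and the median is only one plausible choice of aggregate, since the text does not say which one is used.

```python
from statistics import median


def object_use_probability(hits_activity: int, hits_activity_and_object: int) -> float:
    """Estimate P(object | activity) as a ratio of search-result counts."""
    if hits_activity <= 0:
        return 0.0
    # Clamp to 1.0 because reported hit counts are only approximate.
    return min(1.0, hits_activity_and_object / hits_activity)


def aggregate_duration(max_durations_min: list[float]) -> float:
    """Combine the maximum duration found in each set of instructions.

    Using the median is an assumption; the text does not name the aggregate.
    """
    return median(max_durations_min)


# Hypothetical counts for "clean a bathtub" alone and together with "rubber gloves".
p_gloves = object_use_probability(hits_activity=20000, hits_activity_and_object=3000)

# Hypothetical maximum durations (in minutes) from four "how to" pages.
duration = aggregate_duration([15.0, 20.0, 30.0, 90.0])
```

A median is used here because a single page with an unusually long duration (90 minutes in the made-up data) shifts it far less than it would shift a mean.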


Gathering Sensor Data for the Reasoning Engine
To gather physical data, researchers place RFID (radio frequency identification) tags, battery-free wireless stickers that transmit an object's identification, on objects associated with a particular activity. The person engaged in an activity wears an RFID reader, in the form of a glove or bracelet, so the system can determine when the person is touching a given object. As the person performs an activity, sensor data from each object used is streamed to the reasoning engine, which analyzes the data and matches the objects against a database of activity models to infer what activity is likely being performed.
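As a rough illustration of matching touched objects against a database of models, the sketch below scores each model by the log-likelihood of the observed object set and picks the highest. It is a deliberate simplification and not the project's reasoning engine: it ignores step order and timing, treats objects as independent, and the models, probabilities, and `floor` smoothing value are hypothetical.

```python
import math


def log_likelihood(observed: set[str], object_use_prob: dict[str, float],
                   floor: float = 0.01) -> float:
    """Score one model: each modelled object is either observed or not."""
    total = 0.0
    for obj, p in object_use_prob.items():
        p = min(max(p, floor), 1.0 - floor)  # keep probabilities away from 0 and 1
        total += math.log(p if obj in observed else 1.0 - p)
    # Observed objects the model does not mention are penalised with the floor.
    total += sum(math.log(floor) for obj in observed if obj not in object_use_prob)
    return total


def best_match(observed: set[str], models: dict[str, dict[str, float]]) -> str:
    """Return the label of the model with the highest score."""
    return max(models, key=lambda label: log_likelihood(observed, models[label]))


# Hypothetical models: activity label -> object-use probabilities.
MODELS = {
    "make tea": {"kettle": 0.9, "mug": 0.9, "teabag": 0.8},
    "brush teeth": {"toothbrush": 0.95, "toothpaste": 0.9, "faucet": 0.7},
}

inferred = best_match({"kettle", "mug"}, MODELS)
```

Working in log space avoids numerical underflow when a model lists many objects.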

The addition of an accelerometer, or motion detector, can improve the ability of the reasoning engine to differentiate among activities. For example, if you know through RFID sensors that a person is in the bathroom, the set of potential activities underway is narrowed. If the person is wearing an accelerometer that indicates rapid back and forth hand movement, he could be cleaning the shower or brushing his teeth. If the RFID tagged object he is holding is a toothbrush, the set of potential activities is narrowed further. Combining object and motion data thus makes it easier for the reasoning engine to home in on the correct activity.
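The bathroom example above can be read as successive Bayesian updates: each new piece of evidence reweights the candidate activities. The sketch below assumes the two kinds of evidence are conditionally independent given the activity (a naive Bayes assumption that the text does not state), and every probability in it is invented for illustration.

```python
def posterior(prior: dict[str, float],
              likelihoods: list[dict[str, float]]) -> dict[str, float]:
    """Multiply a prior by independent evidence likelihoods and normalise."""
    scores = dict(prior)
    for likelihood in likelihoods:
        for activity in scores:
            scores[activity] *= likelihood.get(activity, 0.0)
    total = sum(scores.values())
    if total == 0.0:
        raise ValueError("evidence rules out every activity")
    return {activity: s / total for activity, s in scores.items()}


# All numbers below are illustrative assumptions.
# Candidate bathroom activities, equally likely before any further evidence.
prior = {"clean shower": 1 / 3, "brush teeth": 1 / 3, "take a pill": 1 / 3}
# P(rapid back-and-forth hand motion | activity), from the accelerometer.
motion = {"clean shower": 0.8, "brush teeth": 0.8, "take a pill": 0.05}
# P(toothbrush touched | activity), from the RFID reader.
toothbrush = {"clean shower": 0.02, "brush teeth": 0.95, "take a pill": 0.01}

after_motion = posterior(prior, [motion])
after_both = posterior(prior, [motion, toothbrush])
```

With motion alone, cleaning the shower and brushing teeth remain tied; adding the toothbrush reading is what separates them.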

Researchers are exploring the use of wearable accelerometers, perhaps in the form of a wristwatch. Alternatively, long-range sensing devices called wisps (RFID tags augmented with sensors such as MEMS-based accelerometers) can be placed in the home to track when an RFID-tagged object is moved. Wisps are being developed by the Wireless Identification and Sensing Project, led by Intel researcher Joshua Smith.

Tanzeem Choudhury is exploring other potential sensors to increase the accuracy of activity reasoning further. She is experimenting with a sensor board equipped with seven different sensors: an accelerometer, a digital compass, and sensors for audio, high-frequency light, visible light, barometric pressure, and temperature. The board was developed primarily by Gaetano Borriello and one of his students, Jonathan Lester. Choudhury is using signals emitted by the board to detect various activities with a high degree of accuracy. The goal is to determine the combination of sensors that generates the highest level of accuracy in discriminating among activities.


Customizing Activity Models
Handling Variations and Interruptions
There is no single way of performing an activity. Different people might carry out the same activity in a variety of ways, and the same person might vary the steps in the process, or the objects used, from one day to the next. In addition, activities may be interrupted, and a person might engage in multiple activities simultaneously. Machine learning systems must be flexible enough to deal with these variations and still be capable of correctly inferring activities.

Intel researchers, led by Tanzeem Choudhury, are tackling the challenge. They are attempting to cluster activities according to location or time of day, learn how a person transitions back and forth between activities, and incorporate this data into activity models. This information can be gleaned from observation or personal interviews and incorporated into the model. Sensor data gathered while the activity is performed over time can be used to improve the model further, customizing it to fit the individual.
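One simple way to picture how sensor data can customize a model is to treat a Web-mined object-use probability as a prior and update it with an individual's observed behavior. The sketch below uses a Beta prior with a Bernoulli likelihood; this is an assumed technique chosen for illustration, not one the text attributes to the project, and the `prior_strength` parameter and all numbers are hypothetical.

```python
def updated_probability(prior_prob: float, prior_strength: float,
                        times_used: int, times_performed: int) -> float:
    """Posterior mean of a Beta prior after observing object use.

    prior_strength is the number of pseudo-observations the prior is worth.
    """
    alpha = prior_prob * prior_strength + times_used
    beta = (1.0 - prior_prob) * prior_strength + (times_performed - times_used)
    return alpha / (alpha + beta)


# Hypothetical: the mined prior says gloves are used 15% of the time,
# but this person wore them in 9 of 10 observed cleanings.
personal = updated_probability(0.15, prior_strength=10.0,
                               times_used=9, times_performed=10)
```

With no observations the function returns the prior unchanged, and as observations accumulate the estimate moves toward the individual's own frequency.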

Understanding exactly how an individual performs a task (including routine variations in performance) is especially important when dealing with persons experiencing cognitive decline. By customizing activity models, the system can track significant changes in the performance of activities that could signal further decline.

An additional requirement of the models is the ability to reason about situations in which objects used in performing an activity are not the same as those that appear in the activity model. For instance, a person might use a scouring pad rather than a scrub brush to clean the bathtub, even though the model calls for a scrub brush. By incorporating information about object relationships and probabilistically weighting these relationships, machine learning algorithms can reason about the novel object that is used. This approach captures the commonsense notion that a scrub brush is functionally equivalent to a scouring pad for cleaning bathtubs.
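The scouring pad example can be sketched as a lookup that falls back on weighted object relationships when the observed object is not in the model. This is only one possible reading of "probabilistically weighting these relationships": the similarity table, its weights, and the function name are hypothetical.

```python
def substituted_probability(observed_obj: str,
                            object_use_prob: dict[str, float],
                            similarity: dict[tuple[str, str], float]) -> float:
    """Use probability for an object, allowing weighted stand-ins."""
    if observed_obj in object_use_prob:
        return object_use_prob[observed_obj]
    # Otherwise borrow probability from the most similar modelled object,
    # discounted by how closely the two objects are related.
    return max(
        (similarity.get((observed_obj, modelled), 0.0) * p
         for modelled, p in object_use_prob.items()),
        default=0.0,
    )


# Hypothetical model and similarity weights.
clean_bathtub = {"scrub brush": 0.8, "cleanser": 0.9}
SIMILARITY = {("scouring pad", "scrub brush"): 0.9}

p_pad = substituted_probability("scouring pad", clean_bathtub, SIMILARITY)
```

An object with no known relationship to anything in the model, such as a mug here, gets a probability of zero and so counts as evidence against the activity.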

Mining activity models can significantly reduce the need for human supervision in the learning phase. The system would need to prompt the user to provide a label only when it does not find a match among the models it already knows about (that is, when none of the models has a high enough likelihood). If the user provides a new label, the system could begin to build a probabilistic description for the new activity.
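The rule for when to ask the user for a label can be sketched as a threshold test on the best model's score. The function name, the score values, and the threshold of 0.6 are assumptions for illustration; the text does not say how "high enough" is defined.

```python
from typing import Optional


def label_or_ask(scores: dict[str, float], threshold: float) -> Optional[str]:
    """Return the best label, or None when the user should be asked."""
    if not scores:
        return None
    best = max(scores, key=scores.get)
    # Only accept the match when its score clears the threshold.
    return best if scores[best] >= threshold else None


# Hypothetical posterior probabilities for two situations.
confident = label_or_ask({"make tea": 0.85, "brush teeth": 0.15}, threshold=0.6)
uncertain = label_or_ask({"make tea": 0.40, "brush teeth": 0.35}, threshold=0.6)
```

A `None` result is the signal to prompt the user; any label the user supplies would then seed a new model.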


The Reasoning Engine: Tracking, Rating and Prompting
The role of the reasoning engine goes beyond matching sensor data to the appropriate activity model, or customizing the model for a particular individual. The reasoning engine can perform at least three functions, in order of increasing difficulty. First, it can simply track a person's activities. For example, it could monitor an elderly person in her home and convey the results to remote relatives, friends, or anyone else involved in coordinating care. Intel researchers have already developed a prototype application called the CareNet Display. This interactive, digital picture frame augments a photograph of an elder with information about her daily life, such as what pills she has taken, what food she has eaten, and whether or not she has exercised.

A more difficult function of the reasoning engine is rating how well a person performed an activity. A display that informs caregivers of instances where an elder performs poorly may be more useful than one that simply lists the person's activities. Matthai Philipose is exploring an approach to rating performances of activities by calculating how well the performance matches the (customized) activity model. An important challenge is explaining why a person received a poor rating. A possible solution being explored is to point out the smallest set of changes to the performance that would have resulted in a good rating; for example, "Grandma had a bad day because she forgot to eat breakfast."

The most difficult function of the reasoning engine is prompting or actuation, that is, taking action based on a rating. The action could be as simple as prompting the person to turn on the stove to boil water for pasta or as complicated as turning off the stove automatically. Actuation could also involve alerting a caregiver that the elder is deviating significantly from her daily routine.

Balancing the Costs and Benefits of Prompting or Actuation
The benefits of prompting or actuation must be carefully weighed against the potential costs to the person using the system. The impact of an incorrect action (say, turning on a stove) could be significant, so the decision must be carefully considered. Furthermore, if prompts occur too frequently or do not make sense, users will soon learn to ignore them, which could prove costly as well.

The Human Activity Recognition team has enlisted the help of Intel's Statistical Computing project team to develop an actuation system based on a well-known machine learning formalism that incorporates costs and benefits to determine the optimal time to prompt.

An important input to the cost/benefit equation is the context in which prompting occurs, especially the people who are present at the time. An elderly parent may feel comfortable being prompted when he is alone, but may be embarrassed if prompted in the presence of his son or daughter.
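The trade-off described above can be illustrated with a one-step expected-utility comparison. The text does not name the formalism the team is using, so this is a stand-in rather than their method; the function name, parameters, and cost values are all hypothetical.

```python
def should_prompt(p_needs_help: float, benefit: float,
                  cost_false_prompt: float, embarrassment_cost: float = 0.0,
                  others_present: bool = False) -> bool:
    """Prompt only when the expected benefit exceeds the expected cost."""
    # Benefit is realised only if the person actually needs help.
    expected_benefit = p_needs_help * benefit
    # An unnecessary prompt has a cost (annoyance, learning to ignore prompts).
    expected_cost = (1.0 - p_needs_help) * cost_false_prompt
    # Context: being prompted in front of others carries an extra cost.
    if others_present:
        expected_cost += embarrassment_cost
    return expected_benefit > expected_cost


# Hypothetical values: the same situation, alone and with family present.
alone = should_prompt(0.7, benefit=10.0, cost_false_prompt=5.0)
with_family = should_prompt(0.7, benefit=10.0, cost_false_prompt=5.0,
                            embarrassment_cost=8.0, others_present=True)
```

The same evidence leads to a prompt when the person is alone but not when family is present, which is the kind of context sensitivity the paragraph above describes.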

Tanzeem Choudhury is exploring this challenge in a related research project that is attempting to understand an individual's network of relationships. One approach being investigated is the collection of audio data from conversations in which a person is engaged. The goal is not to record conversations (which would intrude on privacy) but to capture distinctive audio characteristics of conversations, such as how long the conversation lasts, the energy levels in the speech of each party involved, and whether one party tends to dominate the conversation.

By learning about a person's social network, relationships, and roles, it would be possible to determine, for instance, whether her social network is declining and she needs more support. Such data could also be used to ensure that the system only prompts in certain situations and not in others that the user may find embarrassing or intrusive.


Other Potential Applications
The main focus of Human Activity Recognition research is on helping the aging and those with cognitive impairment to perform daily activities, but there are many other potential applications of the research. For example, Intel is collaborating with the University of Washington medical school to explore the potential for using Human Activity Recognition for training in anesthesiology. The goal is to have the system monitor students and provide assistance in performing tasks properly, without the need for constant supervision.

Another potential use for Human Activity Recognition is in capturing "best known methods" (BKMs) for performing a variety of activities in a range of business settings, from manufacturing facilities to fast food restaurants, and applying that learning to train new hires and other employees.

