|
Setting measurable UXQ goals is particularly relevant in companies undergoing transitions toward
establishing user-centered processes. It increases emphasis and visibility of the key usage model being
developed and sets targets for high-level UX outcomes to facilitate user-centered processes and
accountability. UXQ goal-setting complements and may come well before detailed usability requirements and
use-case development. The purpose of UXQ goal-setting is to set the level of UXQ that the final product
should deliver with respect to perceptions, emotions, and thoughts, as well as attitudes that the product
should elicit from the target market segments. These convey to stakeholders in management and the
development process clear targets regarding how good the product must be.
We describe three broad steps in setting UXQ goals. The first step is to identify and prioritize the
relevant UX dimensions. For the purposes of this
paper, it is assumed that market research and needs-finding processes (such as market segmentation,
ethnographic research, and usage model definition) have already defined the nature of the product
opportunity [14]. With that groundwork in place, the first step amounts to rank-ordering the high-level key features
and usages that have already been defined. Particular attention should be given to the features and usages
that are end-user noticeable, will be included in the marketing messaging, and differentiate the system
from others that will be on the market. In addition, any usages involving perceptual quality (such as
acoustics, video, or audio quality) can be called out as relevant according to the end-user value
propositions being targeted.
The second step is targeting specific, measurable UX dimensions for each of the key usages and features.
This involves assessing what emotions, attitudes, perceptions, and thoughts are being targeted for the
planned end-user value propositions. Selecting the proper dimensions to target and how to best measure
them is where a background in psychology and psychometrics is essential. The measures selected should be
based on branding/marketing strategies as well as practical and experimental design considerations.
The third step is working with the UX owners to assign specific cutoffs for each of the key features with
respect to the variables being measured. To do this, competitive analysis benchmarking data or prior
baseline data can be used. If no prior UX data are available, then judgment based on experience with
similar types of systems can be used to start with. The main objective is to set explicit goals for UXQ
well in advance of product development so that these goals can serve as clear targets and bring
appropriate attention to the UX throughout the development cycle. By highlighting what should be the UX
outcomes to development teams and the accountable stakeholders, strategies and resources can be channeled
to ensure user-centered design processes are prioritized appropriately with other business demands.
After goals have been set, measurements to assess the state of the UX can be planned for explicit
milestones in the program. At these milestones, decision makers can now better weigh tradeoffs that may
affect both the UX and other business outcomes.
Common questions that UXQ assessment can help answer include: How good is the UX for the target market?
What levels of perceptual qualities will consumers notice and value? How does the end-user value
proposition change when ecosystem partnerships or key functionality changes? How will the product be
perceived if key features are implemented differently, delayed, or eliminated altogether? How do we know
if a system is good enough to be released? These types of questions can be answered with UXQ studies.
UXQ Assessment Within Industry
Leading companies that consider UX to be a core part of their business have been using UXQ measures as
checkpoints and quality gates in their development processes. A series of informal benchmarking
interviews indicated that companies including IBM, Microsoft, British Telecom, Google, and Yahoo use some
form of UXQ assessment data as part of ongoing assessments and go/no-go decisions regarding product
releases. These and other companies, including Philips, Ford Motor, and Procter & Gamble, have indicated that they
routinely assess the UX of products during the development process. The methods used for assessing UX in
these companies tend to be part of a larger user-centered innovation effort.
Integration into Organizational Processes
UXQ assessment can be geared to provide critical data about whether key aspects of the planned end-user
value are being achieved. Since the complete UX is not typically under the exclusive control of any one
project team (or even a single company), UXQ assessment provides a means to see, from the end-user
perspective, how the value propositions are manifesting themselves in realistic usage scenarios.
When aggregated across targeted user segments, UXQ data indicate the extent to which the quality goals are
being met. Results can feed into multiple points in the product lifecycle. They can be quantitative,
such as in a classic summary dashboard format, or focus on richer description in story- and photo-based
reports. As such, study results can be tailored for executive reviews and become part of existing or new
feedback processes that help UX stakeholders make good decisions affecting the UX. This type of
information is particularly useful when tuning product requirements, refining marketing messages,
negotiating with the ecosystem co-travelers, addressing existing issues, and helping to drive innovation
for future systems.
The following sections describe three examples of UXQ studies illustrating some of the many distinct
methodologies used in these assessments. Each was applied in assessing aspects of a digital home platform.
The studies are real, but to protect the proprietary interests of the parties involved, the data shown
are illustrative examples only and are not the actual data from the studies.
Example 1: UXQ Benchmark Dashboard Study
A competitive benchmarking study was run to assess how the UXQ of a platform, based on Intel® technology,
compared to similar usages on competitive platforms. This study was conducted during platform development
to assess the current state of UX and set goals for the next version of the platform. Since a number of
formative usability studies were conducted at earlier stages of planning and development, micro-level
usability issues were known and not the focus of this study. This UX study was designed to answer the
following research questions:
- What delighted and frustrated consumers about the UX of the Intel platform?
- How did the Intel platform compare to a prior version of the platform?
- How did the Intel platform compare against non-Intel platform solutions?
- To what extent did the Intel platform meet predefined UXQ goals?
- How could product messaging be refined to reflect the actual UX?
The study had 32 participants, carefully selected on the basis of target market segments and additional
demographic criteria. Each participant received $200 in exchange for about four hours of their
time.
The study setting was a furnished apartment located in downtown Portland, Oregon. The apartment was
divided into two similar sections with each section containing different platform solutions. The platform
solutions each involved components that would be found in the den (PC, modem, wireless router) and living
room (digital media adapter and a TV). On one side, a platform based on Intel technology was placed in
boxes ready to be set up in the den and living room. On the other side was a similar configuration but
with competitive solutions.
Before arrival, participants were randomly assigned an order in which they would be exposed to the
platforms (Intel vs. competitor). Each condition started with participants learning about the technology
according to the platform messaging provided by marketing. Participants were not aware that Intel was
involved in the study until the debriefing at the end. As they learned about the platform value
propositions through the marketing messaging, positive and negative comments showing participants'
reactions to the messaging were collected. Participants were encouraged to discuss their initial reactions
to the purpose and value of the technology.
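The random assignment of presentation order can be illustrated with a minimal sketch. The text states only that order was randomly assigned; balancing the two orders evenly across the 32 participants is an assumption made here, and the function name, participant IDs, and seed are hypothetical.

```python
import random

def assign_orders(participant_ids, seed=None):
    """Randomly assign each participant a platform order.

    Assumption (not stated in the study description): the two orders
    are balanced, so half the participants see each platform first.
    """
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    orders = (("Intel", "competitor"), ("competitor", "Intel"))
    # Alternate the two orders down the shuffled list to keep them balanced.
    return {pid: orders[i % 2] for i, pid in enumerate(shuffled)}

# Hypothetical participant IDs 1..32; the seed only makes the example repeatable.
assignment = assign_orders(range(1, 33), seed=2024)
intel_first = sum(1 for order in assignment.values() if order[0] == "Intel")
```

Balancing the orders in this way helps keep any effect of seeing one platform first from being confounded with the platform itself.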
Next, participants were asked to set up and use the technology as they would normally do in their home if
they had purchased the technology. High-level task guides were given to the participants that provided a
minimal level of structure. The activities contained in the task guides were based on prior research so
that they would reflect how people tend to go about the activities in actual home settings. The main goal
from the participants' perspective was to get media content (pictures, music, and videos) stored
on the PC to be viewed on the TV using new technology solutions. This involved unpacking the equipment,
setting up a wireless network, connecting a digital media adapter to the TV, and finally building a media
library on their PC. Exactly how participants did this and the order in which they did it was up to them.
Data collection involved structured interviews at natural breaking points in the set-up process. Rather
than focusing on documenting micro-level usability issues (most of these were known from prior usability
studies), the point of the UXQ assessment was to understand how the complete platform made people feel
during set-up and initial use. To this end, three main types of data were collected. First, rating scales
were developed to assess 1) level of satisfaction, 2) perceptions of being in control of the technology,
3) comparison with other similar experiences, and 4) value of the feature. Ratings consisted of Likert-type
scales and semantic differentials embedded within a semi-structured interview conducted by a trained
psychologist. Each of the rating scales was given at breaking points in the set-up process. The interviews
were used to bring out more details of how participants were responding during their experience and to set
an appropriate tone for participant introspection.
The second aspect of data collection was termed "user experience success." This was based on a
professional assessment during participant observation. A pre-defined set of criteria included task times,
number of user errors, and required assists. These criteria, together with professional judgment, were used to
classify whether the participant had a positive (successful UX) or negative (failed UX) interaction with the
platform. The UX success indicator differs from traditional usability-style success/failure rates in that, although
participants may be able to complete the task, the main variable of interest in a UXQ study is maintaining
a pleasurable experience during interaction with the platform. As a follow-up, exploratory semi-structured
interviews were conducted by the psychologist to get an understanding of additional attitudes and emotions
that were being experienced and which features participants were responding to.
The quantitative results were aggregated into a dashboard style UX summary. This summary highlighted
differences between goals that were set early in development and the measured UX quality of the platform
(see Figure 2). As this example illustrates, each of the key features and usages was assigned a color
(green, yellow, or red) based on the degree to which they met criteria for the targeted UX dimensions. The
particular UXQ dimensions assessed will vary depending upon the UXQ goals targeted (e.g., may include
specific targeted emotions or perceptions). In this example the focus was on UX success, attitudes
(composite of several attitudinal variables), comparisons to past experience, and the value the user
indicated for the key feature.

Figure 2: Sample UXQ dashboard against goals
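The color coding behind such a dashboard can be sketched as follows. This is an illustration only: the feature names, dimensions, target values, and the "yellow margin" rule are all hypothetical, since the criteria used to assign colors in the study are not specified here.

```python
from dataclasses import dataclass

@dataclass
class UxqGoal:
    feature: str
    dimension: str
    target: float         # goal set early in development
    yellow_margin: float  # hypothetical: shortfall still rated "yellow"

def status(goal: UxqGoal, measured: float) -> str:
    """Return the dashboard color for one feature/dimension cell."""
    if measured >= goal.target:
        return "green"
    if measured >= goal.target - goal.yellow_margin:
        return "yellow"
    return "red"

# Made-up goals and measurements, for illustration only.
goals = [
    UxqGoal("network setup", "UX success rate", target=0.80, yellow_margin=0.10),
    UxqGoal("media library", "value rating", target=4.0, yellow_margin=0.5),
]
measured = {
    ("network setup", "UX success rate"): 0.72,
    ("media library", "value rating"): 4.2,
}

dashboard = {
    (g.feature, g.dimension): status(g, measured[(g.feature, g.dimension)])
    for g in goals
}
```

With these made-up numbers, network setup falls short of its goal but within the margin and is rated yellow, while the media library exceeds its goal and is rated green.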
Additional comparisons were made between the Intel platform and competitive usages (see Figure 3 for an
example). Figure 3 shows how key features were compared with two competitive usages, C1 and C2. In order
to facilitate future goal setting, an overall bar or individual UXQ goals can be set to explicitly
consider comparative usages. In this example, the primary uses of these data were immediate feedback to
product developers, help in shaping marketing messaging, and the setting of clear goals (including
targeting the right UX dimensions) for the next version of the platform.

Figure 3: Sample comparison benchmarking data
Example 2: Streaming Video Quality Study
As previously mentioned, end-user perceptual experience is another vector of UX assessment. Perceptual
quality is a key aspect of UX that lends itself to setting targets and measuring against these targets.
Perceptual targets can be dynamic: the expectations of end-users change over time as technology delivers
different levels of quality, and comparison points alter in response to those changes. For example, as the
media community shifts from standard-definition to high-definition resolutions, consumer expectations
of picture quality rise, and this must be reflected in perceptual requirements. There are many
elements of human perception in the technology field, including areas of vision, hearing, and touch, that
can influence an end-user's experience of a given product. For the purposes of this case study the area of
vision with respect to streaming video quality is discussed.
Streaming video is the process of sending video files from a server computer to a client computer so that
they can be viewed in real time. Traditionally, streaming video has not been designed for use by the
average consumer. However, recent advances in computer networking, combined with powerful computers and
video technologies, have introduced the use of streaming video into the digital home infrastructure.
Unfortunately, there are still limitations due to bandwidth constraints, which typically mean that during
a streaming session, video files must be transcoded (format conversion) or transrated (bit rate
conversion) to a more manageable size. The compression of the video leads to visual artifacts that
commonly appear as blocking or loss of detail. These transformations tend to result in degraded video
quality and overall degraded UX. Delivering excellent quality is a necessity in the highly competitive
consumer electronics and PC market.
The following perceptual quality case study was conducted to find the lowest operational transrating level
for streaming media associated with an acceptable UX. In addition, this work was intended to provide
validation engineers with a calibrated objective tool to estimate the UX in a testing environment quickly,
repeatedly, and reliably. In order to achieve these goals, two questions were addressed:
- What is the failure characteristic associated with reducing the bit rate used for MPEG-2 and the
  corresponding subjective assessments?
- At what point on the failure characteristic curve do the results show no further return on investment,
  or the lowest operational point (what are the platform's video quality targets)?
The experimental design employed three measurement methods to determine the relationship between video
quality and UX scores: 1) an expert assessment, 2) a non-expert assessment, and 3) an objective tool
output. Video experts have prior knowledge of image compression, familiarity with video, and possess
extensive training, practice, and experience in evaluating different technologies. Unlike experts,
non-expert participants are not directly concerned with video quality as part of their typical vocation
[15]. The final method used was the Video Quality Metric (VQM), a tool that produces a numerical result
that correlates to a set of non-experts' average perception known as a Mean Opinion Score (MOS) [16].
Since the VQM is a non-adaptive "black box" algorithm, the results need to be calibrated with every new
format or platform under test.
For these three methods the independent variable was bit rate, defined by the amount of data transferred
per second. In general, the higher the bit rate, the better the quality; DVD quality for a standard-definition
video has an average bit rate of 7 Mb/s with a peak of around 10 Mb/s. The dependent variable was
the MOS value. The subjective assessments used a double stimulus impairment method in which participants
were instructed to score their perception of a processed video clip (i.e., reduced bit rate) relative to a
reference video clip (10 Mb/s).
The study was designed using a standardized methodology verified by the International Telecommunication
Union (ITU). A critical part of the experimental design was to properly choose the visual stimuli to be
used for testing. For this experiment a standardized sequence of six clips was chosen that encompassed a
wide range of video content created to stress the transrating stack for different video conditions. Source
video was encoded using the transrating stack and written to the hard drive for playback. A controlled
playback system called Video Clarity was attached to a calibrated (color temperature, brightness, and
contrast) display [17]. Video Clarity was required for consistent evaluations, because it captures and
outputs exactly what it records [18].
All evaluations took place in a semi-anechoic chamber with 50% grey walls and lighting measured at 10 lux.
Participants were seated at a predetermined viewing distance to control viewing conditions.
All participants went through a visual acuity (Snellen Eye Chart) and color deficiency (Ishihara Testing
Plates) screening [19, 20]. Participants were instructed on the double stimulus impairment method and the
five-point ordinal impairment rating scale that is typically used to determine failure characteristics
[21]. A practice session was also given to familiarize the participants with the testing set-up. Once the
evaluation began, the first five trials were discarded to address any learning issues and to stabilize the
viewer's opinion [22]. Each session had a unique randomized order of presentation, so that opinions were
balanced out.
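How such ratings might be turned into a failure characteristic can be sketched as follows. This is a minimal illustration under stated assumptions: the ratings are made up, the acceptability threshold of 4.0 is hypothetical, and the five-point impairment scale is assumed to run from 5 (imperceptible) down to 1 (very annoying). Only the discarding of the first five trials is taken from the procedure described above.

```python
from collections import defaultdict
from statistics import mean

WARMUP_TRIALS = 5  # from the procedure: the first five trials are discarded

def mean_opinion_scores(sessions):
    """Compute a MOS per bit rate.

    sessions: list of sessions, each a list of (bit_rate_mbps, score)
    tuples in the order the clips were presented.
    """
    by_rate = defaultdict(list)
    for trials in sessions:
        for bit_rate, score in trials[WARMUP_TRIALS:]:
            by_rate[bit_rate].append(score)
    return {rate: mean(scores) for rate, scores in by_rate.items()}

def lowest_acceptable_rate(mos_by_rate, threshold):
    """Walk down from the highest bit rate; stop at the first MOS below threshold."""
    lowest = None
    for rate in sorted(mos_by_rate, reverse=True):
        if mos_by_rate[rate] < threshold:
            break
        lowest = rate
    return lowest

# Made-up ratings for two participants (not study data).
session_a = [(8, 5), (6, 5), (4, 4), (2, 2), (8, 5),   # warm-up, discarded
             (8, 5), (6, 4), (4, 4), (2, 2)]
session_b = [(2, 1), (4, 3), (6, 4), (8, 5), (2, 2),   # warm-up, discarded
             (4, 3), (2, 1), (8, 5), (6, 5)]

mos = mean_opinion_scores([session_a, session_b])
operating_point = lowest_acceptable_rate(mos, threshold=4.0)  # hypothetical threshold
```

Walking down from the highest bit rate, rather than taking the smallest rate that passes, avoids accepting a low rate that happens to score well when an intermediate rate has already failed.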
The significant result is that we were able to differentiate between expert and non-expert ratings where
relevant, enabling validation and comparison of subjective results to an objective measure. Thus, we were better
able to focus development efforts where they were needed, rather than over- or under-designing our
platforms and missing the return on investment point.
There are many touch points in a product lifecycle at which performing video quality evaluations can
benefit the design teams. One such place is during validation. At this phase, it is most efficient for the
engineers involved to have an objective tool to accurately evaluate the video performance during the
platform testing phase. This is crucial, since performing subjective assessments would be time consuming,
expensive, and probably too late to impact product design prior to release. Using the VQM, validation
engineers can get an estimated MOS value that is mapped back to end-users' perceptions.
However, as the subjective assessments have shown, this tool needs to be calibrated for every new video
test because it only provides measurements for typical artifacts. If the engineers were to use the tool
without calibration, the inflated results would indicate that transrating could go as low as 2 Mb/s and still be
above the true target threshold. This would create an unacceptable experience for the end consumer.
As we've shown here, employing a holistic approach to assessments and drawing on the results of the expert
and non-expert assessment, we were able to calibrate the video quality metric tool so that it would work
for setting the appropriate performance threshold of the transrating stack. If we hadn't performed the two
assessments, we might have passed (with the tool) video that was, in fact, unacceptable. Psychological
assessment techniques thus play an important role throughout the development cycle.
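One simple way to picture such a calibration is a least-squares linear correction that maps the tool's estimated MOS onto the subjectively measured MOS. This is a sketch under assumptions, not the method used in the study: the form of the correction is not described here, the linear model is an illustrative choice, and all numbers below are made up.

```python
def fit_linear_correction(tool_scores, subjective_mos):
    """Least-squares fit of subjective_mos ~ slope * tool_score + intercept."""
    n = len(tool_scores)
    mean_x = sum(tool_scores) / n
    mean_y = sum(subjective_mos) / n
    sxx = sum((x - mean_x) ** 2 for x in tool_scores)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(tool_scores, subjective_mos))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def calibrated_mos(tool_score, slope, intercept):
    """Apply the fitted correction to a raw tool output."""
    return slope * tool_score + intercept

# Made-up paired data: the uncalibrated tool reads higher than viewers did.
tool_scores = [4.8, 4.5, 4.2, 3.9]
subjective_mos = [4.6, 4.0, 3.4, 2.8]

slope, intercept = fit_linear_correction(tool_scores, subjective_mos)
corrected = calibrated_mos(4.2, slope, intercept)
```

In this made-up example a raw tool output of 4.2 is corrected to about 3.4, so a clip that would pass a 4.0 threshold on the uncalibrated reading fails once the subjective data are taken into account.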
This study not only determined the lowest operational level for a specific digital home platform, it also
set target levels and changed the validation process by adding correction factors to an objective tool.
Implications are clear for setting UXQ perceptual goals as part of larger UXQ assessment to drive end-user
noticeable value propositions of platforms. The eventual goal is to roll up relevant human perceptual data
including areas of vision, hearing, and touch that can influence an end-user's experience of a product.

Figure 4: Non-expert versus expert ratings of visibility of impairment introduced with decreasing bit rate
Example 3: In-home Contextual Study
The final case study, a series of contextual in-home data-gathering exercises, provides an example of an
exploratory and qualitative approach to UXQ assessment. Contextual studies provide useful data about how
the platform integrates into real home settings. In this example, the main goal was to understand how the
digital home platform and connected devices would be integrated into three home settings given the
richness and complexity of target market households. We set out to gather anecdotes, photos, and rich
information providing insight into the attitudes and emotional responses people have to the technology.
The main research questions were these:
- What are initial expectations and questions people have when they plan to set up the technology in
  their homes?
- What are their attitudes and behaviors during set-up?
- What are their reactions after the first several weeks of use, and how do they change from the initial
  and set-up reactions?
Three households were selected for participation in the study. Selection criteria were used so that each
family would match at least one of the target market segments. Given that only three families took part in this
study, the results were not meant to be predictive or representative of the target consumer population but
rather to help identify integration issues with existing technology and other family members. Some
characteristics of the three families are described in Figure 5.
Each family was asked to set up and use Intel platforms complete with all parts of the technology that
made up the platform value propositions. Families were compensated approximately $300 for integrating the
technology into their homes and participating in the study activities. The platform solution provided to
participants allowed them to aggregate digital media (pictures, music, and videos) stored on PCs
throughout their home for viewing on their TV, using a remote control. The equipment included a PC,
monitor, input devices, a surround-sound speaker system, a digital media adapter, and a wireless router.
The procedure involved four main parts. The first was a home technology tour. Photos and notes were taken
regarding the type of digital entertainment technology the families had, aesthetic themes, and how media
were stored (pictures, music, videos) before introducing the new technology.
Second, participants were observed in person while they set up the equipment in their homes.
Key areas of interest included the locations they thought made sense and evidence of the mental
models participants had regarding how the technology could be integrated into their other PC and media-related
technology. Observations of interaction with the technology in context started with the full
"out-of-box-experience" assessment for each component of the equipment. The initial observations lasted about
four hours for each household. Data were collected on any trouble the participants experienced, especially
focusing on how good/bad the experience was for the participants. Semi-structured interviews, photographs,
voice recordings and follow-up probes were used to collect data to understand attitudes and emotions
associated with initial expectations and how these were resolved during this early phase of the usage
lifecycle.

Figure 5: Participant profiles
Third, UX data were collected over time as participants became more familiar with the platforms. This part of
the in-home data collection involved email questions and follow-up telephone calls. Participants were left
a series of activities to carry out at their own discretion, and they reported their success (or failure)
and reactions via e-mail and telephone calls. The activities prompted participants to use certain aspects
of the system and provide feedback. Participants were encouraged to explore all the key features of the
platform and provide feedback as if they owned the system.
The final part of the data collection involved follow-up home visits. These were conducted after families
had a chance to use the technology for several weeks. During these sessions, participants were asked to
demonstrate how they used the systems and asked questions about the impact of the technology on their
entertainment choices, daily routines, and social interactions. They discussed issues and what they saw as
opportunities that came up along the way. The follow-up interview lasted approximately two hours for each
household. As in the first home visits, semi-structured interviews, photographs, voice recordings and
follow-up probes were used to collect data to understand attitudes and emotions associated with use of the
platform.
Given these were case studies, the results of the home visits were presented to UX owners in the form of
anecdotes of experience, photos, and selected clips from the voice recordings. Based on these results, two
compelling potential drivers of attitudes to the brand were identified in addition to three potential
liabilities. These converged with evidence from prior studies. Several examples of known areas where the
user mental model did not match the conceptual design model were identified, and specific examples were
described. Finally, evidence of clear gaps in features was used to help prioritize feature requests.
The findings were triangulated with the results of the other user studies to better understand and help
"make real" to stakeholders some of the key UX issues with the technologies. Examples regarding key assets
and liabilities to the brand messaging were presented, and as a result, changes were made to the platform
messaging. Contradictions or gaps related to the users' mental models of the system were also addressed
through modification of the messaging. Combined with quantitative dashboard data, the in-depth
illustrations of problem areas turned what would otherwise have been low-priority recommendations into new
directions for future basic product functionality.
|