Intel Technology Journal - Featuring Intel's Recent Research and Development
Converged Communications
Volume 10    Issue 01    Published February 15, 2006
ISSN 1535-864X    DOI: 10.1535/itj.1001.06

Using Intel® Technologies to Build Next-Generation Media Servers
INTEL DEVELOPMENT ENVIRONMENT

Intel VTune Performance Analyzer

Intel VTune Performance Analyzer is a powerful tool for analyzing the performance of an application. VTune analyzer has two main modes: sampling mode and call graph mode [20].

Sampling mode is a non-intrusive way to profile the entire system. Statistics are first rolled up to the application level. The user can then drill down to the function level and, by double-clicking a function, view instruction-level statistics annotated on the source code [20].
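Sampling profilers typically work by interrupting the processor at regular intervals, or when a hardware event counter overflows, and recording the instruction pointer at that moment. Each recorded address is then attributed to the function that contains it. The Python sketch below shows only that attribution step. It is not VTune analyzer's implementation. The symbol table, function names, and sample addresses are all hypothetical.

```python
import bisect
from collections import Counter

# Hypothetical symbol table: (start_address, function_name), sorted by address.
# A real profiler reads this from the binary's symbol or debug information.
SYMBOLS = [
    (0x1000, "decode_frame"),
    (0x1400, "idct_8x8"),
    (0x1800, "motion_compensate"),
    (0x2000, "write_output"),
]
END_OF_TEXT = 0x2400  # first address past the last function

_STARTS = [addr for addr, _ in SYMBOLS]


def function_for(ip):
    """Map a sampled instruction pointer to the function that contains it."""
    if ip < _STARTS[0] or ip >= END_OF_TEXT:
        return "[unknown]"
    # Index of the last function whose start address is <= ip.
    i = bisect.bisect_right(_STARTS, ip) - 1
    return SYMBOLS[i][1]


def hotspots(samples):
    """Count samples per function; return (function, count, percent), hottest first."""
    counts = Counter(function_for(ip) for ip in samples)
    total = sum(counts.values())
    return [(fn, n, 100.0 * n / total) for fn, n in counts.most_common()]


if __name__ == "__main__":
    # Made-up instruction pointers captured at successive sampling interrupts.
    samples = [0x1410, 0x1420, 0x1500, 0x1810, 0x1050, 0x1430, 0x2010, 0x1440]
    for fn, n, pct in hotspots(samples):
        print(f"{fn:20s} {n:4d} {pct:5.1f}%")
```

Because the program runs unmodified and only pays for an occasional interrupt, this style of profiling adds little overhead. The cost is that results are statistical, so short-lived functions may receive few or no samples.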



Figure 15: VTune Performance Analyzer sampling mode output [20]
 

Call graph mode is more invasive and slows the program under test. It does, however, provide a graphical view of the calling sequences among the procedures in the program, and it identifies critical paths through the program. Statistics are provided for each procedure, including "wait time," the amount of time the function spends waiting for an event to occur.



Figure 16: VTune Performance Analyzer call graph mode output [20]
 

Beyond CPU utilization and wait time, VTune Performance Analyzer collects important statistics such as cache misses, clockticks per instruction (CPI), and thread utilization.
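Derived statistics such as CPI are ratios of raw event counts. CPI is clockticks divided by instructions retired. A common companion metric is cache misses per thousand instructions (MPKI). The Python sketch below computes both for each function from event counts. The event names, the counts, and the CPI threshold of 1.0 used to flag functions are all hypothetical choices for illustration.

```python
# Arbitrary cut-off for flagging a function in this sketch.
CPI_THRESHOLD = 1.0

# Hypothetical per-function event counts, as a profiling run might report them.
EVENTS = {
    "idct_8x8": {"clockticks": 1_200_000, "instructions": 1_500_000, "l2_misses": 300},
    "motion_compensate": {"clockticks": 2_400_000, "instructions": 800_000, "l2_misses": 9_600},
}


def cpi(clockticks, instructions_retired):
    """Clockticks per instruction: average cycles spent per retired instruction."""
    if instructions_retired == 0:
        raise ValueError("no instructions retired")
    return clockticks / instructions_retired


def mpki(misses, instructions_retired):
    """Cache misses per thousand retired instructions."""
    if instructions_retired == 0:
        raise ValueError("no instructions retired")
    return 1000.0 * misses / instructions_retired


def summarize(events, threshold=CPI_THRESHOLD):
    """Return (function, cpi, mpki, flagged) rows, highest CPI first."""
    rows = []
    for fn, e in events.items():
        c = cpi(e["clockticks"], e["instructions"])
        rows.append((fn, c, mpki(e["l2_misses"], e["instructions"]), c > threshold))
    return sorted(rows, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    for fn, c, m, flagged in summarize(EVENTS):
        print(f"{fn:20s} CPI={c:4.2f} MPKI={m:5.1f}{'  <-- investigate' if flagged else ''}")
```

A high CPI combined with a high miss rate, as in the hypothetical `motion_compensate` entry, suggests that the function is stalled on memory rather than limited by the amount of work it does.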

Under Windows, a companion tool, the VTune Performance Analyzer Tuning Assistant, recommends ways to improve the performance of specific functions.

Using VTune Performance Analyzer can be thought of as a three-step process: set up the analysis, find an area to improve, and make the improvement. Given VTune analyzer's intuitive user interface, finding an area to improve usually takes no more than an hour.

VTune analyzer's output is function-centric. In some cases it is useful to roll function-level results up into higher-level module results. A simple parsing utility can aggregate function-level statistics into module-level statistics.
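A parsing utility of the kind described above can be quite small. The Python sketch below assumes that function-level results have been exported to CSV with the columns `Module`, `Function`, and `Samples`. That export layout, along with the module and function names, is a hypothetical assumption and not a documented VTune analyzer format. The utility sums samples per module and reports each module's share of the total.

```python
import csv
import io
from collections import defaultdict

# Hypothetical function-level export; the column names are an assumption.
EXPORT = """Module,Function,Samples
libcodec.so,idct_8x8,5200
libcodec.so,motion_compensate,3100
librtp.so,rtp_parse,900
librtp.so,jitter_buffer_put,600
mediasrv,main_loop,200
"""


def aggregate_by_module(csv_text):
    """Sum function-level sample counts into module-level totals.
    Returns (module, samples, percent_of_total) rows, largest first."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["Module"]] += int(row["Samples"])
    grand_total = sum(totals.values())
    rows = [(mod, n, 100.0 * n / grand_total) for mod, n in totals.items()]
    return sorted(rows, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    for mod, n, pct in aggregate_by_module(EXPORT):
        print(f"{mod:15s} {n:6d} {pct:5.1f}%")
```

The same idea extends to other statistics. For example, clockticks and instructions can be summed per module before computing a module-level CPI, instead of averaging per-function CPI values.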

VTune Performance Analyzer should not be used as the primary tool for measuring performance; its role is to help improve performance. It can, however, be useful for understanding how the performance characteristics of one release differ from those of another. For instance, suppose we upgrade a software release from A to B. If we routinely profile both A and B with VTune analyzer, we can compare their performance characteristics. Are we seeing the same relative performance of our code? Has one module's performance decreased significantly? Does this make sense? While this A-B sanity check is not strictly necessary, it can help rapidly identify probable design or coding inefficiencies.
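The A-B check can be partly automated. The Python sketch below converts each release's module-level sample totals into percentage shares and lists the modules whose share moved by more than a threshold. Comparing shares rather than raw counts keeps the check meaningful when the two profiling runs differ in length. The release data and the 5-percentage-point threshold are hypothetical.

```python
# Hypothetical module-level sample totals for two releases.
RELEASE_A = {"libcodec.so": 8000, "librtp.so": 1500, "mediasrv": 500}
RELEASE_B = {"libcodec.so": 7000, "librtp.so": 2700, "mediasrv": 300}


def module_shares(totals):
    """Convert {module: samples} into {module: percent of all samples}."""
    grand_total = sum(totals.values())
    return {mod: 100.0 * n / grand_total for mod, n in totals.items()}


def compare_releases(a, b, threshold_pct=5.0):
    """Return (module, share_a, share_b, delta) for modules whose share of
    samples changed by more than threshold_pct percentage points."""
    share_a, share_b = module_shares(a), module_shares(b)
    changes = []
    for mod in sorted(set(share_a) | set(share_b)):
        sa, sb = share_a.get(mod, 0.0), share_b.get(mod, 0.0)
        if abs(sb - sa) > threshold_pct:
            changes.append((mod, sa, sb, sb - sa))
    return changes


if __name__ == "__main__":
    for mod, sa, sb, delta in compare_releases(RELEASE_A, RELEASE_B):
        print(f"{mod:15s} A={sa:5.1f}% B={sb:5.1f}% change={delta:+5.1f} pts")
```

A flagged module is not necessarily a regression. It may have gained functionality between releases. The output is a prompt for the "Does this make sense?" question, not an answer to it.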

VTune Performance Analyzer is an important tool for getting maximum performance out of the Intel Architecture. Intel NetStructure® Host Media Processing Software, in particular, benefits from regular inspection with VTune analyzer.

Intel C++ Compiler

The Intel C++ Compiler is one of the software development tools available for accelerating software performance on Intel platforms. It is available for several operating systems, including Windows and Linux. The Intel C++ Compiler 9.0 for Linux provides outstanding application performance for software running on Intel processors [25]. Its advanced optimization features include support for multi-core processors through auto-parallelization, optimized floating-point instruction throughput, Interprocedural Optimization (IPO), Profile-Guided Optimization (PGO), and data prefetching. It also includes a Code-Coverage tool and a Test-Prioritization tool. Optimizations specific to the IA-32 architecture include full support for Streaming SIMD Extensions 3 (SSE3), an automatic vectorizer, and processor dispatch, and the compiler also supports Intel Extended Memory 64 Technology. More details are available in "Intel C++ Compiler for Linux" [25]. The compiler further includes an enhanced debugger that can debug optimized code, as well as stack-frame run-time error checking that helps reduce buffer-overrun security exploits.

"Intel Software Tools Case Studies" [27] presents various case studies that highlight performance improvements due to Intel software tools. Table 9 shows the performance improvement provided by the Intel Compiler and Intel Integrated Performance Primitives (IPP) in one of these case studies: H.263 video encoding [26].

Table 9: H.263 Encode Performance Improvement due to Intel Compiler and IPP [26]
ImageCom PC encoder on an Intel Pentium 4 processor-based system

Intel IPP   Intel Compiler   Encode time (80-sec clip)   Percentage improvement
---------   --------------   -------------------------   ----------------------
No          No               123 sec                     0%
Yes         No               84 sec                      32%
No          Yes              69 sec                      44%
Yes         Yes              57 sec                      54%
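The percentages in Table 9 correspond to the reduction in encode time relative to the 123-second baseline, which uses neither IPP nor the Intel Compiler. The short Python check below recomputes them and also expresses each result as a speedup factor. The function names are illustrative only, and the speedup column is a derived view that does not appear in the source table.

```python
BASELINE_SEC = 123  # encode time with neither IPP nor the Intel Compiler (Table 9)

# (configuration, encode time in seconds) for the optimized rows of Table 9.
CONFIGS = [("IPP only", 84), ("Compiler only", 69), ("IPP + Compiler", 57)]


def improvement_pct(encode_sec, baseline_sec=BASELINE_SEC):
    """Percentage reduction in encode time relative to the baseline."""
    return 100.0 * (baseline_sec - encode_sec) / baseline_sec


def speedup(encode_sec, baseline_sec=BASELINE_SEC):
    """How many times faster than the baseline the encode runs."""
    return baseline_sec / encode_sec


if __name__ == "__main__":
    for label, seconds in CONFIGS:
        print(f"{label:15s} {improvement_pct(seconds):5.1f}%  {speedup(seconds):.2f}x")
```

The unrounded values are about 31.7%, 43.9%, and 53.7%, which round to the 32%, 44%, and 54% shown in the table. The combined 54% reduction corresponds to roughly a 2.16x speedup.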

