The Intel® C++ Compiler for QNX Neutrino* RTOS makes it easy to get outstanding performance from embedded 32-bit Intel® architecture (IA-32) processors; offers source, binary and build-environment compatibility; and comes with first-class customer support.
Performance: Advanced Optimization Features
- Optimized floating-point instruction throughput: On IA-32, the C++ Compiler for QNX Neutrino RTOS uses the stack to execute floating-point (FP) instructions efficiently. Application performance on embedded IA-32 processors improves because overlapping instructions can put their calculation results in any stack register.
- Interprocedural optimization (IPO): IPO can dramatically improve performance in programs that make frequent use of many small or medium-sized functions, especially programs that contain calls within loops.
- Profile-guided optimization (PGO): The PGO compilation process enables the Intel C++ Compiler for QNX Neutrino RTOS to take better advantage of the processor microarchitecture, use instruction paging and cache memory more effectively, and make better branch predictions. It improves application performance by reorganizing code layout to reduce instruction-cache thrashing, shrinking code size, and reducing branch mispredictions.
- Data prefetching: Data prefetching is an effective technique to hide memory access latency. Data prefetching inserts prefetch instructions for selected data references at specific points in the program, so referenced data items are moved as close to the processor as possible (put in cache memory) before the data items are actually used. For applications that are compute-intensive, this can yield significant performance improvements. In addition:
- Data prefetching is automatic
- Data prefetching coordinates with other optimizations (for example, software pipelining)
- Compiler-generated prefetching keeps the code portable: the developer does not need to manage this aspect of application performance in source code or write processor-specific instructions.
- Full support for Streaming SIMD Extensions 3 (SSE3): Obtain support for architectural features of Intel processors. Supported IA-32 features include the Streaming SIMD Extensions 3 that distinguish the Intel NetBurst® microarchitecture introduced with the Intel® Pentium® 4 processor. Streaming SIMD Extensions 3 go beyond the initial, performance-oriented support for multimedia or graphical components of applications and improve performance for floating-point and double-precision computation. The new instructions are supported in a number of ways, including inline assembly (ASM), compiler intrinsics, class libraries, the vectorizer, and the Intel® Performance Libraries.
- Automatic vectorizer: Automatically parallelize code to make full use of the underlying processor capabilities. Vectorizer examples demonstrate how to increase the speed of application execution. New features include support for advanced, dynamic data alignment strategies, including loop peeling to generate aligned loads and loop unrolling to match the prefetch of a full cache line.
- Runtime support for Intel processor generations – Processor Dispatch: Optionally build applications for a specific generation of Intel processor using processor dispatch. Dispatch now allows for multiple specific targets. Develop applications for the latest Intel processor, the Pentium 4 processor, while keeping the executable able to run on previous IA-32 processors.
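To make the data prefetching item above more concrete, here is a minimal hand-written sketch of the kind of transformation the compiler applies automatically. It is an illustration only, not the compiler's actual output: the `PREFETCH_READ` macro, the `kPrefetchDistance` constant, and the function name are hypothetical, and the sketch assumes the GCC/Clang `__builtin_prefetch` builtin where it is available.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical wrapper: uses the GCC/Clang prefetch builtin where available
// and expands to nothing on other compilers, so the code stays portable.
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_READ(addr) __builtin_prefetch((addr), 0, 3)
#else
#define PREFETCH_READ(addr) ((void)0)
#endif

// Hypothetical tuning constant: how many elements ahead to prefetch.
constexpr std::size_t kPrefetchDistance = 16;

// Sums the vector while requesting data a fixed distance ahead of its use,
// so the element is likely to be in cache by the time it is read.
double sum_with_prefetch(const std::vector<double>& data) {
    double total = 0.0;
    const std::size_t n = data.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (i + kPrefetchDistance < n) {
            PREFETCH_READ(&data[i + kPrefetchDistance]);
        }
        total += data[i];
    }
    return total;
}
```

With automatic prefetching, the source would contain only the plain summation loop; the prefetch distance and placement are chosen by the compiler rather than hard-coded as they are here.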
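The loop peeling mentioned under the automatic vectorizer can be pictured with a hand-written equivalent. The sketch below is an assumption about the general shape of the technique, not the code the compiler generates: `scale`, `kVectorBytes`, and `kLanes` are hypothetical names, and 16 bytes is assumed as the alignment target because that is the width of one SSE register.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed alignment target: 16 bytes, the width of one SSE register.
constexpr std::size_t kVectorBytes = 16;
constexpr std::size_t kLanes = kVectorBytes / sizeof(float);

// Multiplies n floats in place by factor, split into peel, main, and
// remainder loops.
void scale(float* data, std::size_t n, float factor) {
    std::size_t i = 0;

    // Peel loop: scalar iterations until data + i sits on a 16-byte boundary.
    while (i < n &&
           reinterpret_cast<std::uintptr_t>(data + i) % kVectorBytes != 0) {
        data[i] *= factor;
        ++i;
    }

    // Main loop: starts on an aligned address and handles kLanes floats per
    // iteration, which is the part a vectorizer can map to aligned loads.
    for (; i + kLanes <= n; i += kLanes) {
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            data[i + lane] *= factor;
        }
    }

    // Remainder loop: leftover elements that do not fill a whole vector.
    for (; i < n; ++i) {
        data[i] *= factor;
    }
}
```

The result is the same as a single scalar loop; only the iteration order is restructured so that the bulk of the work begins at an aligned address.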
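Processor dispatch, as described in the last item above, amounts to choosing a code path at run time based on the processor the executable finds itself on. The compiler arranges this automatically; the sketch below only shows the idea by hand. All function names are hypothetical, the "SSE3 path" is an ordinary C++ stand-in rather than real SSE3 code, and the feature check assumes the GCC/Clang builtin `__builtin_cpu_supports`, falling back to the generic path elsewhere.

```cpp
#include <cstddef>

// Generic path: safe on any IA-32 processor.
float sum_generic(const float* v, std::size_t n) {
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) total += v[i];
    return total;
}

// Stand-in for a processor-specific path. A real build would compile this
// routine for SSE3; here it only mimics pairwise (horizontal) addition.
float sum_sse3_path(const float* v, std::size_t n) {
    float total = 0.0f;
    std::size_t i = 0;
    for (; i + 1 < n; i += 2) total += v[i] + v[i + 1];
    if (i < n) total += v[i];
    return total;
}

// Assumption: GCC/Clang builtins on x86; other toolchains report false.
bool cpu_has_sse3() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse3") != 0;
#else
    return false;
#endif
}

using SumFn = float (*)(const float*, std::size_t);

// Picks the implementation once, based on the processor detected at run time.
SumFn select_sum() {
    return cpu_has_sse3() ? sum_sse3_path : sum_generic;
}
```

A caller would store the result of `select_sum()` once at startup and call through the pointer afterwards, so the same executable runs on both newer and older processors.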
For a detailed description, please refer to the following paper:
Optimizing Applications with the Intel® C++ and Fortran Compilers for Windows* and Linux*.