New Intel processors introduce instruction set extensions that improve the performance or strengthen the security of an application. Extensions such as Intel AVX [1] and AVX2 [1] are used to improve performance, and the Intel SHA [2] instructions accelerate SHA hashing to increase the security of an application.
What if developers want to create applications that use these new instructions, but their current hardware does not support them? And how can a company justify buying new systems that support the new instructions without first confirming that its applications can take advantage of them to improve performance?
The Intel® Software Development Emulator is used to execute applications containing new instructions on systems that don’t support them.
This article will discuss the benefit of using SDE to test code using new instructions.
What is the Intel Software Development Emulator (SDE)?
As its name implies, SDE is an emulator that allows code with new instruction sets to run on systems that don’t support these instructions. More information about SDE can be found at . It should be noted that the SDE is useful to assess functionality, but not performance, as it runs programs many times more slowly than native hardware. SDE can be downloaded here.
To test an application with new instruction sets using SDE, the application first needs to be compiled using appropriate compilers that support the new instruction sets. For example, to compile applications having AVX2 instructions, use the Intel compiler 14.0, gcc 4.7 or Microsoft* Visual Studio* or later versions of these compilers. SDE lists all instructions in assembly language. It not only lists instructions of user applications but also those in the libraries and kernel.
To display the full list of SDE options, use the following command at the command prompt:
Figure 1. List all SDE options
To display the longer list of SDE options type:
sde -help -long
How to Use SDE
This article shows how to use only the two most common SDE options, mix and ast, rather than all of them. A video showing where to download and how to install and run SDE can be found at .
Figure 2. mix option
Figure 2 shows the mix option with the default output file or a user-defined output file. The “-mix” and “-omix” options list all dynamic instructions that are executed, along with instruction length, instruction category, and ISA extension grouping. Run SDE with this option by typing the following command at the command prompt:
sde.exe -mix -- <application name>
The result will be written to a file called mix.out. To specify a different name for the output file, use the following command:
sde.exe -omix <user-defined output file name> -- <application name>
Figure 3. ast option
Figure 3 shows the ast option with the default output file or a user-defined output file. Use the “-ast” or “-oast” option to detect transitions between SSE and AVX/AVX2 instructions. This is useful because such transitions cost many execution cycles, so reducing them improves the performance of an application. Run SDE with this option by typing the following command at the command prompt:
sde.exe -ast -- <application name>
The result will be written to a file called avx-sse-transition.out. To specify a different name for the output file, use the following command:
sde.exe -oast <user-defined output file name> -- <application name>
Note: The options can also be combined into one single command as follows:
sde.exe -mix -ast -- <application name>
sde.exe -omix <output mix file name> -oast <output ast file name> -- <application name>
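When SDE is driven from a test script rather than typed by hand, the command lines above can be assembled programmatically. The following is a minimal Python sketch, not part of SDE itself. It assumes the executable is named sde.exe as in the commands above and uses only the four options described in this article. The function name build_sde_command and the application and file names in the usage lines are hypothetical.

```python
def build_sde_command(app, app_args=(), mix=False, ast=False,
                      mix_out=None, ast_out=None, sde="sde.exe"):
    """Return an argument list that runs `app` under SDE.

    Only options described in this article are used:
      -mix / -omix <file>   dynamic instruction mix
      -ast / -oast <file>   SSE-AVX transition report
    """
    cmd = [sde]
    if mix_out is not None:
        cmd += ["-omix", mix_out]   # user-defined mix output file
    elif mix:
        cmd.append("-mix")          # default output file: mix.out
    if ast_out is not None:
        cmd += ["-oast", ast_out]   # user-defined transition output file
    elif ast:
        cmd.append("-ast")          # default: avx-sse-transition.out
    # Everything after "--" is the application and its own arguments.
    cmd += ["--", app, *app_args]
    return cmd


if __name__ == "__main__":
    # "myapp.exe" and the output file names are placeholders.
    print(" ".join(build_sde_command("myapp.exe", mix=True, ast=True)))
    print(" ".join(build_sde_command("myapp.exe",
                                     mix_out="my-mix.out",
                                     ast_out="my-ast.out")))
```

The returned list can be passed directly to Python's subprocess.run, which avoids quoting problems when file names contain spaces.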
A Closer Look at the Output of the ‘mix’ Option
The output file produced by the mix option contains a great deal of information, not all of which is explained in this article. Let’s focus on certain portions of the file.
Figure 4. This portion of the output file shows that the application (linpack_xeon64) has 28 threads currently running.
Figure 5. This portion of the output for thread 0 (TID 0) shows which instructions were executed, along with their instruction types (AVX, FMA, and SSE).
Figure 6. At the end of each thread's section there is a summary of the instructions used in that thread, along with how frequently they were executed.
Figure 7. At the end of the output file, there is a summary of all instructions in the applications and libraries that are used by the application.
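Summaries like these are easier to compare across runs once they are reduced to a table of counts. The Python sketch below shows one way to do that under a loudly stated assumption: it treats a summary block as lines of the form "name count", with comment lines starting with '#' and an optional leading '*' on the name. The sample text is made up for illustration and is not real SDE output. The entry names and the helpers parse_counts and share are hypothetical, so check the layout of your own mix.out before relying on it.

```python
import re

# Made-up excerpt for illustration only; it is NOT real SDE output.
# Assumed layout: "<name> <count>" per line, '#' starts a comment.
SAMPLE = """\
# comment lines start with '#'
*isa-ext-AVX        1200
*isa-ext-FMA         800
*isa-ext-SSE         150
*total              2150
"""

# Optional leading '*', a name without spaces, then an integer count.
LINE = re.compile(r"^\*?(\S+)\s+(\d+)\s*$")


def parse_counts(text):
    """Parse one summary block into a {name: count} dictionary."""
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comments
        match = LINE.match(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def share(counts, name, total_key="total"):
    """Fraction of all executed instructions attributed to `name`."""
    return counts[name] / counts[total_key]


if __name__ == "__main__":
    counts = parse_counts(SAMPLE)
    for name in sorted(counts):
        if name != "total":
            print(f"{name:15s} {share(counts, name):6.1%}")
```

Keeping the parser tolerant (it ignores any line that does not match) is a deliberate choice, since the real file contains much more than the summary.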
A Closer Look at the Output of the ‘ast’ Option
The output from running SDE using the ast option helps identify transitions from SSE to AVX and vice versa.
Figure 8. SSE-AVX and AVX-SSE transitions do not exist
Figure 8 shows the case in which there are no SSE-AVX or AVX-SSE transitions. Transitions in either direction consume a lot of valuable execution cycles. Sometimes these transitions can reach up to 20,000 cycles. It is important to reduce these transitions. If there are transitions [7], the transition output file will look like Figure 9.
Figure 9. SSE-AVX and AVX-SSE transitions exist
There is a good article about how to avoid these transition penalties at .
Benefits of Using SDE
SDE counts only the instructions that are actually executed while the application is running (dynamic), not the instructions that exist in the code (static). This makes it a good debugging aid when certain instructions are expected to execute within certain portions of the code. If the expected instructions are not detected in the block of addresses corresponding to that code, then something unexpected happened, such as an unanticipated branch being taken. SDE provides options such as start-address and stop-address to handle this situation. Note that these options are not documented [5]. Hopefully, they will be documented in future SDE releases.
Note that running an application with different options or inputs triggers different behavior and therefore different dynamic execution, so a single run of an application does not tell the whole story. It is always a good idea to run an application under different workloads to observe its behavior.
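The difference between static and dynamic counts, and the way the dynamic counts depend on the input, can be illustrated with a toy model. The Python sketch below does not use SDE. The PROGRAM table is a hypothetical listing of a loop with a vector path and a scalar path, and the function names are invented for this example.

```python
from collections import Counter

# Hypothetical program listing: each block is a list of mnemonics.
PROGRAM = {
    "setup":       ["MOV"],
    "vector_body": ["VMULPD", "VADDPD"],   # AVX path
    "scalar_body": ["MULSD", "ADDSD"],     # SSE scalar fallback path
    "exit":        ["RET"],
}


def static_counts():
    """Count every instruction that exists in the code, executed or not."""
    return Counter(op for block in PROGRAM.values() for op in block)


def dynamic_counts(iterations, use_vector):
    """Count only the instructions executed for one particular input."""
    executed = Counter(PROGRAM["setup"])
    body = PROGRAM["vector_body" if use_vector else "scalar_body"]
    for _ in range(iterations):
        executed.update(body)              # loop body runs once per iteration
    executed.update(PROGRAM["exit"])
    return executed


if __name__ == "__main__":
    print("static :", dict(static_counts()))
    print("vector :", dict(dynamic_counts(1000, use_vector=True)))
    print("scalar :", dict(dynamic_counts(1000, use_vector=False)))
```

In the scalar run the AVX mnemonics never appear in the dynamic counts even though they exist in the listing. That absence is the kind of signal that points to an unexpected branch, and it only shows up when the application is run with that particular input.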
Potential Performance Improvement
SDE can detect SSE-AVX and AVX-SSE transitions. By reducing these transitions, performance can be improved.
Checking for Bad Pointers and Data Misalignment
SDE can check for bad pointers and data misalignment. Below is a snapshot of the options for these two features from the SDE documentation [8].
Figure 10. Options for debugging
Conclusion
SDE is used to test applications that use new Intel instruction sets in the absence of hardware supporting those instructions. It therefore helps assess whether an application might benefit from a new or future platform released by Intel. Note that SDE runs considerably more slowly than a native platform and is not intended to provide insight into future performance. SDE dynamically counts the instructions that are executed, rather than the instructions present in the code, and it also counts SSE-AVX and AVX-SSE transitions. These features can be used for debugging and optimization. SDE can also aid debugging by detecting bad pointers and data misalignment. These are only a few of the features offered by this emulation environment. We invite you to explore more by looking at the documentation included in the Reference section of this article.
Intel, the Intel logo, Intel Core, and Intel Xeon are trademarks of Intel Corporation in the U.S. and/or other countries. Copyright © 2015 Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.