Introduction
By Jason A. Fletcher, Application Engineer, Intel Corporation
Compiler technology has advanced to the point where compilers can create assembly and machine language almost as well as humans can. Almost: compilers still cannot assume away certain conditions that a human can, and they often emit instructions that degrade performance. In the rest of this article I describe some steps that can help in optimizing assembly code, particularly for systems built around the Itanium® architecture.
Compilers are the first line of optimization, and they can be quite efficient at it, depending on the development environment. For example, a common default development environment may generate more than 200 lines of assembly for the following bubble sort code:
void bubblesort(int count, int array[]) {
    for(int pass = 1; pass < count; pass++) {
        for(int i = 0; i < count-1; i++) {
            if(array[i] > array[i+1]) {
                int hold = array[i];
                array[i] = array[i+1];
                array[i+1] = hold;
            }
        }
    }
}
Much of that assembly consists of indirect stores and loads, which are no-nos for optimal performance. By contrast, the Intel® C++ Compiler generates about 60 lines of assembly for the same code. It also manages to avoid every indirect load and store, and it schedules bundles fairly well.
Despite the success of the Intel C++ Compiler at minimizing the number of instructions generated, it could do better. Take the following bundles as an example:
{ .mmi
    nop.m 0
    nop.m 0
    mov ar.pfs=r33 //0: 11 58
}
{ .mmi
    setf.sig f7=r8 //0: 11 54
    setf.sig f8=r32 //0: 11 53
    nop.i 0 ;;
}
Notice that the six slots in these two bundles hold two memory instructions, one integer instruction, and three nops. Also note that there are no dependencies among the instructions in the two bundles. It is therefore reasonable to assume that the two bundles can be combined, speeding up the program by one to four clock ticks. While one to four clock ticks is not significant in itself, on a machine running at 1 billion clocks per second it is an amount that can add up, especially if a function is called thousands or millions of times.
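To make the slot counting and the arithmetic concrete, here is a minimal C++ sketch. It is a toy model, not a description of how the Intel compiler or the hardware works: the names `Bundle`, `is_nop`, `merge_mmi`, and `seconds_saved` are hypothetical, stop bits (`;;`) are not modeled, and the sketch takes on trust the observation above that the instructions have no dependencies.

```cpp
#include <array>
#include <cstdint>
#include <string>
#include <vector>

// Toy model of one .mmi bundle: slots 0 and 1 go to memory (M) units,
// slot 2 goes to an integer (I) unit. Stop bits (;;) are not modeled.
using Bundle = std::array<std::string, 3>;

// True for filler instructions such as "nop.m 0" or "nop.i 0".
bool is_nop(const std::string& insn) {
    return insn.compare(0, 4, "nop.") == 0;
}

// Try to pack the useful instructions of two .mmi bundles into one.
// Assumption: the caller has already checked that no instruction depends
// on another. Returns false if the work does not fit in a single bundle.
bool merge_mmi(const Bundle& a, const Bundle& b, Bundle& out) {
    std::vector<std::string> mem_ops, int_ops;
    const Bundle* inputs[] = {&a, &b};
    for (const Bundle* in : inputs) {
        for (int s = 0; s < 3; ++s) {
            const std::string& insn = (*in)[s];
            if (is_nop(insn)) continue;  // nops carry no work
            (s < 2 ? mem_ops : int_ops).push_back(insn);
        }
    }
    if (mem_ops.size() > 2 || int_ops.size() > 1) return false;
    out[0] = mem_ops.size() > 0 ? mem_ops[0] : "nop.m 0";
    out[1] = mem_ops.size() > 1 ? mem_ops[1] : "nop.m 0";
    out[2] = int_ops.empty() ? "nop.i 0" : int_ops[0];
    return true;
}

// Total time saved, in seconds, when every call becomes shorter by
// ticks_saved_per_call clock ticks. All inputs are illustrative.
double seconds_saved(std::uint64_t ticks_saved_per_call,
                     std::uint64_t calls,
                     double clock_hz) {
    return static_cast<double>(ticks_saved_per_call) *
           static_cast<double>(calls) / clock_hz;
}
```

Applied to the two bundles above, `merge_mmi` fills a single bundle with `setf.sig f7=r8`, `setf.sig f8=r32`, and `mov ar.pfs=r33`, leaving no nops. In real assembly the stop would have to follow the last instruction of the merged bundle. For the timing, `seconds_saved(4, 1000000, 1e9)` comes to about 0.004 seconds, that is, four clock ticks saved over a million calls at 1 billion clocks per second.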