|
The VTune™ Performance Analyzer provides several call graph ratios in the call graph display, specifically:
- % in function
- Average Self time per call
- Average Total time per call
These are additional metrics that may be useful in some situations and are discussed below.
% in function is a quick way to get the information normally displayed on the call list tab without switching to that tab. The call list tab shows which functions contribute to the total time of the selected function; % in function tells you whether the called functions contributed significantly to that total time. If the number is low, the called functions were the major contributors to this function's total time. A high % in function indicates that the function's own code contributed more to the total time than the code of the called functions, in which case you might next look at the ratio Average Self time per call.
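As a rough illustration of how the three ratios relate, the sketch below computes them from a function's call count, self time, and total time. The formulas are assumptions inferred from the metric names and the description above (self time divided by total time, and each time divided by the number of calls); the `FunctionTiming` record, its field names, and the sample values are hypothetical and do not reflect any VTune export format.

```python
from dataclasses import dataclass


@dataclass
class FunctionTiming:
    """Hypothetical per-function record (not a VTune data structure)."""
    name: str
    calls: int
    self_time: float   # time in the function's own code, in microseconds
    total_time: float  # self time plus time spent in called functions


def percent_in_function(f: FunctionTiming) -> float:
    # Assumed definition: share of total time spent in the function's own code.
    return 100.0 * f.self_time / f.total_time if f.total_time else 0.0


def avg_self_time_per_call(f: FunctionTiming) -> float:
    # Assumed definition: self time divided by the number of calls.
    return f.self_time / f.calls if f.calls else 0.0


def avg_total_time_per_call(f: FunctionTiming) -> float:
    # Assumed definition: total time divided by the number of calls.
    return f.total_time / f.calls if f.calls else 0.0


# Made-up sample values for two functions with the same total time.
parse = FunctionTiming("parse", calls=200, self_time=9000.0, total_time=10000.0)
dispatch = FunctionTiming("dispatch", calls=200, self_time=500.0, total_time=10000.0)

for f in (parse, dispatch):
    print(f.name, percent_in_function(f), avg_self_time_per_call(f),
          avg_total_time_per_call(f))
```

With these sample values, `parse` spends most of its total time in its own code, while `dispatch` spends most of it in the functions it calls, even though both have the same Average Total time per call.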
A high Average Self time per call (along with a significant number of Calls) could indicate a good candidate for optimization, since any improvement in performance would be magnified by the number of times the function is called.
Also, a high Average Self time per call indicates that optimizing the body of the selected function would yield a better result than optimizing any of the called functions. This assumes that the selected function is a major contributor to the application's overall execution time. If a called function is also called by another function that contributes more than the selected function, it may make more sense to optimize the called function. You have to examine the algorithm to make that determination.
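The selection described above can be sketched as a simple filter. This is a minimal example under stated assumptions: the row layout `(name, calls, self_time)`, the thresholds `min_calls` and `min_share`, and all sample values are hypothetical, and the application's execution time is approximated here as the sum of the listed self times.

```python
# Hypothetical call graph rows: (name, calls, self_time in microseconds).
rows = [
    ("init", 1, 4000.0),
    ("transform", 5000, 60000.0),
    ("log", 20000, 2000.0),
    ("checksum", 1000, 30000.0),
]


def optimization_candidates(rows, min_calls=100, min_share=0.10):
    """Return (name, avg_self_time_per_call) for functions that are called
    often and whose own code is a significant share of the run, sorted by
    Average Self time per call, highest first."""
    app_time = sum(self_time for _, _, self_time in rows)
    picked = []
    for name, calls, self_time in rows:
        share = self_time / app_time if app_time else 0.0
        if calls >= min_calls and share >= min_share:
            picked.append((name, self_time / calls))
    return sorted(picked, key=lambda item: item[1], reverse=True)


candidates = optimization_candidates(rows)
for name, avg_self in candidates:
    print(f"{name}: {avg_self:.1f} us self time per call")
```

Here `init` has the highest Average Self time per call but is called only once, and `log` is called often but contributes little, so neither is reported. The thresholds are arbitrary and would need tuning for a real profile.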
Sorting by Average Total time per call and finding functions that have both a high number of calls and a low Average Self time per call (assuming the Total time for those functions is still significant relative to the total time of the application) identifies functions that might benefit from being run in multiple threads. Look at the context of the callees to see whether introducing threads makes sense. Consider the following:
- Little or no use of global variables within the call tree – if global variables are used in parallel functions, safeguards would need to be introduced.
- Parameters that are passed by value, not reference – this means there is no immediate dependency between parallel calls to the function(s).
- Parameters that are pointers to structures or classes – ensure that each call to the function uses a distinct pointer so that there is no contention for the same contents, unless the contents are accessed as read-only.
- Note: Introducing thread-safe libraries may cause a slowdown.
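The considerations in the list above can be illustrated with a small sketch in which no global state is touched, scalar arguments are passed as immutable values, and each parallel call receives its own object. The `Block` type and the `scale` and `accumulate` functions are hypothetical. The example only demonstrates the data-independence pattern; it is not a performance measurement, and in standard CPython the global interpreter lock limits speedups for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical work item; each call receives its own instance."""
    values: list
    total: int = 0


def scale(x: int, factor: int) -> int:
    # Immutable arguments only, so parallel calls cannot affect each other.
    return x * factor


def accumulate(block: Block) -> Block:
    # Mutates only the object it was handed. This is safe as long as no
    # other concurrent call receives the same object.
    block.total = sum(scale(v, 2) for v in block.values)
    return block


# Three distinct objects, one per call, and no global variables.
blocks = [Block([1, 2, 3]), Block([4, 5]), Block([6])]

with ThreadPoolExecutor(max_workers=3) as pool:
    # map() returns results in the same order as the inputs.
    results = list(pool.map(accumulate, blocks))

totals = [b.total for b in results]
print(totals)  # [12, 18, 12]
```

If two calls had to share one `Block`, access to it would need a safeguard such as a lock, which is the kind of added cost the note above warns about.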
Another possible optimization within these functions is to find the significant contributors to the function (the highest percentage functions in the call list Called table) and, if they are called many times, determine if these functions might be small enough to inline, thereby reducing the overhead of calling them.
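A first pass at shortlisting such callees might look like the sketch below. The row layout `(callee, calls, percent_of_caller_total)`, the thresholds, and the sample values are all hypothetical. Profile data says nothing about how large a function's body is, so whether a shortlisted callee is small enough to inline still has to be judged from the source.

```python
# Hypothetical callee rows: (callee, calls, percent of the caller's total time).
callees = [
    ("clamp", 120000, 35.0),
    ("load_config", 1, 40.0),
    ("lerp", 90000, 15.0),
    ("trace", 50, 2.0),
]


def inline_shortlist(callees, min_calls=1000, min_percent=10.0):
    """Return callees that are both significant contributors and called
    many times, largest contributor first."""
    hits = [c for c in callees if c[1] >= min_calls and c[2] >= min_percent]
    return sorted(hits, key=lambda c: c[2], reverse=True)


shortlist = [name for name, _, _ in inline_shortlist(callees)]
print(shortlist)  # ['clamp', 'lerp']
```

`load_config` is the largest contributor but is called only once, so call overhead is not a concern there and it is left out.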
This applies to:
|