Level of Detail
Tracing all available events over time can generate billions of events even for a
moderate program runtime of a few minutes and a handful of CPUs. The sheer amount of
data is a challenge for any analysis tool that has to cope with it. The problem is
aggravated because in most cases the analysis tool cannot use the same system
resources as the parallel computer on which the trace was generated.
One aspect of this problem arises when generating graphical diagrams of the event
data. Displaying all the data graphically is next to impossible. First, drawing it
would take far too long. Second, which events actually remained visible on the
screen without being overdrawn by others would depend on round-off errors in the
scaling and on the order of the data traversal. Therefore only representatives of
the actual events are shown.
A valid choice would be to paint only every 100th or 1000th event and to hope that
the resulting diagram gives a valid impression of the data. But this approach has
its problems, because the pattern that selects the representatives can interfere with the
patterns in the underlying data.
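The interference can be illustrated with a small, purely synthetic example. The
event list and the helper every_nth below are made up for illustration; they are
not Intel® Trace Analyzer data structures or functions.

```python
# Synthetic trace: the program alternates strictly between two functions.
events = ["compute" if i % 2 == 0 else "MPI_Send" for i in range(1000)]


def every_nth(seq, n):
    """Naive sampling: keep every n-th element, starting with the first."""
    return seq[::n]


sample = every_nth(events, 100)

# The stride (100) is a multiple of the period of the data (2), so the
# sample only ever hits "compute" and MPI_Send vanishes from the picture.
print(sorted(set(events)))
print(sorted(set(sample)))
```

Although half of all events are MPI_Send, none of them appears in the sample,
so a diagram painted from it would suggest a program that never communicates.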
Intel® Trace Analyzer uses a Level of Detail concept to solve this problem. The
Event Timeline Chart (like the other timelines) calculates a hint for the analysis
that describes a time span that can reasonably be painted and selected with the
mouse. This hint is called Resolution. The resolution requested by the timeline
takes into account the currently available screen space and the length of the
current time interval. Hence a higher screen resolution or a wider timeline results
in more data being displayed for the same time interval.
Intel® Trace Analyzer then tries to find a near match for the requested resolution.
The exact resolution depends on internals, which will not be discussed here.
Intel® Trace Analyzer divides the requested time interval into slots whose length
equals the resolution. After that, representatives for the function events, the
messages and the collectives in these slots are chosen in a deterministic way. If a
function spans more than the given resolution, it results in a larger slot.
The representatives for function events are chosen as follows: for each slot and
each process (or thread group respectively) there is only a single function event
representing the function where the thread or group spent most of its time.
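The selection rule for function events can be sketched as follows. This is a
minimal illustration, not the actual implementation: the tuple layout, the name
function_representatives and the tie-breaking by function name are assumptions,
and the enlargement of slots for long-running functions is not modeled.

```python
from collections import defaultdict


def function_representatives(intervals, t_start, resolution):
    """Pick one function per (process, slot): the one with the most time.

    intervals: iterable of (process, function, enter, leave) tuples
               (hypothetical layout chosen for this sketch).
    Returns {(process, slot_index): function_name}.
    """
    time_in = defaultdict(lambda: defaultdict(float))
    for process, function, enter, leave in intervals:
        if leave <= t_start:
            continue  # lies entirely before the displayed interval
        enter = max(enter, t_start)
        slot = int((enter - t_start) // resolution)
        last = int((leave - t_start) // resolution)
        while slot <= last:
            lo = t_start + slot * resolution
            hi = lo + resolution
            # Time this function instance spends inside the current slot.
            overlap = min(leave, hi) - max(enter, lo)
            if overlap > 0:
                time_in[(process, slot)][function] += overlap
            slot += 1
    # Ties are broken by function name so the result is deterministic.
    return {
        key: min(times, key=lambda f: (-times[f], f))
        for key, times in time_in.items()
    }


intervals = [
    (0, "compute", 0, 6),
    (0, "MPI_Send", 6, 10),
    (0, "MPI_Recv", 10, 17),
    (0, "compute", 17, 20),
]
reps = function_representatives(intervals, t_start=0, resolution=10)
print(reps)
```

With a resolution of 10 time units, process 0 is represented by compute in the
first slot and by MPI_Recv in the second; the shorter MPI_Send and the second
compute phase are not painted.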
The representatives for messages are chosen as follows: for each tuple (sender,
receiver, sender slot, receiver slot) only one message is generated that carries
averaged attributes. These attributes are averaged over all messages matching the
tuple.
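The grouping of messages can be sketched in the same spirit. Which attributes are
averaged is not specified here; the sketch assumes send time, receive time and
message size, and the name message_representatives and the tuple layout are
hypothetical.

```python
from collections import defaultdict


def message_representatives(messages, t_start, resolution):
    """Merge messages sharing (sender, receiver, sender slot, receiver slot).

    messages: iterable of (sender, receiver, t_send, t_recv, size) tuples
              (hypothetical layout chosen for this sketch).
    Returns {key: attributes averaged over all messages matching the key}.
    """
    groups = defaultdict(list)
    for sender, receiver, t_send, t_recv, size in messages:
        key = (
            sender,
            receiver,
            int((t_send - t_start) // resolution),  # sender slot
            int((t_recv - t_start) // resolution),  # receiver slot
        )
        groups[key].append((t_send, t_recv, size))

    merged = {}
    for key, group in groups.items():
        n = len(group)
        merged[key] = {
            "count": n,
            "t_send": sum(m[0] for m in group) / n,
            "t_recv": sum(m[1] for m in group) / n,
            "size": sum(m[2] for m in group) / n,
        }
    return merged


messages = [
    (0, 1, 1, 3, 100),
    (0, 1, 2, 4, 300),
    (0, 1, 8, 12, 50),  # received in the next slot, so it is kept separate
]
merged = message_representatives(messages, t_start=0, resolution=10)
for key, rep in sorted(merged.items()):
    print(key, rep)
```

The first two messages share all four tuple components and collapse into one
representative; the third is received in a different slot and stays on its own.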
The representatives for collective operations are chosen as follows: for each tuple
(communicator, first slot) one collective operation is generated. So it can happen
that an operation of type MPI_Gather is merged with an operation of type MPI_Bcast,
resulting in a merged operation with no particular type at all (mixed).
To prevent misconceptions, note that the merging of events only applies to the
timelines and not to the profiles. The profiles always show sums, minima, maxima or
averages over the complete set of events. The calculation of these results obviously
does not depend on the screen resolution.
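The merging of collective operations and its contrast with the profiles can be
sketched as follows. The sketch assumes that "first slot" means the slot in which
the operation begins; the names collective_representatives and profile_total_time
and the tuple layout are hypothetical.

```python
def collective_representatives(collectives, t_start, resolution):
    """Merge collectives that share (communicator, first slot).

    collectives: iterable of (communicator, op_type, t_begin, t_end) tuples
                 (hypothetical layout chosen for this sketch).
    """
    merged = {}
    for comm, op_type, t_begin, t_end in collectives:
        key = (comm, int((t_begin - t_start) // resolution))
        rep = merged.get(key)
        if rep is None:
            merged[key] = {"type": op_type, "begin": t_begin,
                           "end": t_end, "count": 1}
            continue
        if rep["type"] != op_type:
            rep["type"] = "mixed"  # different operation types were merged
        rep["begin"] = min(rep["begin"], t_begin)
        rep["end"] = max(rep["end"], t_end)
        rep["count"] += 1
    return merged


def profile_total_time(collectives):
    """Profile-style sum over all events; takes no resolution at all."""
    return sum(t_end - t_begin for _, _, t_begin, t_end in collectives)


ops = [
    ("WORLD", "MPI_Gather", 1, 3),
    ("WORLD", "MPI_Bcast", 4, 6),
    ("WORLD", "MPI_Bcast", 12, 15),
]
coarse = collective_representatives(ops, t_start=0, resolution=10)
fine = collective_representatives(ops, t_start=0, resolution=1)
print(coarse)
print(fine)
print(profile_total_time(ops))
```

At the coarse resolution the MPI_Gather and the first MPI_Bcast fall into the same
slot and become one mixed operation; at the fine resolution all three operations
keep their types. The profile sum is computed from the complete set of events and
is the same in both cases.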