Intel® Trace Analyzer and Collector User and Reference Guide

ID 767272
Date 3/31/2023

Clock Synchronization

By default, Intel® Trace Collector synchronizes the different clocks at the start and at the end of a program run by exchanging messages in a fashion similar to the Network Time Protocol (NTP): one process is treated as the master, and its clock becomes the global clock of the whole application run. During clock synchronization, the master process receives a message from a child process and replies by sending its current time stamp. The child process then stores that time stamp together with its own local send and receive time stamps. One message is exchanged with each child, then the cycle starts again with the first child until SYNC-MAX-MESSAGES messages have been exchanged between the master and each child or the total duration of the synchronization exceeds SYNC-MAX-DURATION.

Intel® Trace Collector can handle timers that are already synchronized among all processes on a node (SYNCED-HOST); in that case, it exchanges messages only between nodes. If the clock is synchronized across the whole cluster (SYNCED-CLUSTER), Intel® Trace Collector does no synchronization at all.

Each child process uses the data gathered in one message exchange session to calculate the offset between its clock and the master clock. It is assumed that messages of equal size take equally long in both directions, so that the average of the local send and receive times coincides with the master time stamp in the middle of the message exchange. To reduce noise, the 10% of message pairs with the highest local round-trip time are ignored, because those are the ones most likely to have suffered from external delays, such as one of the two processes not being scheduled in time to react promptly.
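
The offset estimate described above can be sketched in C as follows. This is only an illustration, not the Intel® Trace Collector implementation: the sync_sample type and the estimate_offset() function are hypothetical names, and averaging the per-exchange offsets is an assumption made to keep the sketch short.

```c
#include <assert.h>
#include <stdlib.h>

/* One message exchange as seen by a child process (hypothetical layout). */
typedef struct {
    double local_send;   /* child clock when the request was sent */
    double master_time;  /* time stamp returned by the master */
    double local_recv;   /* child clock when the reply arrived */
} sync_sample;

/* Order samples by local round-trip time, shortest first. */
static int by_round_trip(const void *a, const void *b)
{
    const sync_sample *x = a;
    const sync_sample *y = b;
    double rx = x->local_recv - x->local_send;
    double ry = y->local_recv - y->local_send;
    return (rx > ry) - (rx < ry);
}

/* Estimate (master time - local time) from n samples.
 * The array is sorted in place and the slowest 10% of the exchanges
 * (rounded down) are dropped. Averaging the remaining offsets is an
 * assumption of this sketch. */
double estimate_offset(sync_sample *s, size_t n)
{
    assert(n > 0);
    qsort(s, n, sizeof *s, by_round_trip);

    size_t keep = n - n / 10;
    double sum = 0.0;
    for (size_t i = 0; i < keep; i++) {
        /* Symmetric delays: the master stamp belongs to the local midpoint. */
        double local_mid = 0.5 * (s[i].local_send + s[i].local_recv);
        sum += s[i].master_time - local_mid;
    }
    return sum / (double)keep;
}
```

With ten samples, for example, the single exchange with the longest round trip is discarded and the remaining nine contribute to the estimate.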

With clock synchronization at the start and the end, Intel® Trace Collector clock correction uses a linear transformation; that is, local clock ticks are scaled and shifted, with the scale and shift calculated by linear regression over all available sample data. If the application also calls VT_timesync() during the run, clock correction is done with piece-wise interpolation: the data of each message exchange session is condensed into one pair of local and master time by averaging all data points, then a constrained spline is constructed that passes through all of the condensed points and has a continuous first derivative at each of these joints.
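
The linear case can be illustrated with an ordinary least-squares fit. The sketch below is not the Intel® Trace Collector implementation; clock_fit, fit_clock() and to_master_time() are hypothetical names, and the inputs are assumed to be pairs of local time and matching master time taken from the synchronization samples.

```c
#include <assert.h>
#include <stddef.h>

/* Linear clock correction: master = scale * local + shift (hypothetical type). */
typedef struct {
    double scale;
    double shift;
} clock_fit;

/* Least-squares fit of master time against local time.
 * Needs at least two samples with distinct local times. */
clock_fit fit_clock(const double *local, const double *master, size_t n)
{
    assert(n >= 2);

    double mean_l = 0.0, mean_m = 0.0;
    for (size_t i = 0; i < n; i++) {
        mean_l += local[i];
        mean_m += master[i];
    }
    mean_l /= (double)n;
    mean_m /= (double)n;

    /* Centered sums keep the arithmetic stable for large time stamps. */
    double sxx = 0.0, sxy = 0.0;
    for (size_t i = 0; i < n; i++) {
        double dl = local[i] - mean_l;
        sxx += dl * dl;
        sxy += dl * (master[i] - mean_m);
    }
    assert(sxx > 0.0);

    clock_fit f;
    f.scale = sxy / sxx;
    f.shift = mean_m - f.scale * mean_l;
    return f;
}

/* Map a local time stamp onto the master clock. */
double to_master_time(clock_fit f, double local)
{
    return f.scale * local + f.shift;
}
```

Samples taken only at the start and the end of the run form two clusters far apart on the time axis, which is what lets the regression determine the scale (clock drift) as well as the shift.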


int VT_timesync(void)


Gathers data needed for clock synchronization.

This is a collective call, so all processes which were started together must call this function or it will block.

This function does not work if processes were spawned dynamically.