# Analyzing Intel® SDE's TSX-related log data for capacity aborts

Published: 01/19/2015

Last Updated: 01/19/2015

Starting with version 7.12.0, Intel® SDE has Intel® TSX-related instruction and memory access logging features which can be useful for debugging Intel® TSX's capacity aborts. With the log data from the Intel SDE you can diagnose cache set population to determine if there is non-uniform cache set usage causing capacity overflows. A refined log data may be used to further diagnose the source of the aborts. Since the log file may be huge to navigate and diagnose without refining, here, a simple Python script is presented to help analyze Intel TSX-related log files to help root-cause sources of capacity aborts.

TxSDELogAnalyzer.py is a simple program which focuses on capacity-aborted transactions. It uses the SDE's log data which can be collected using the set of parameters as below.

>$sde -tsx -hle_enabled 1 -rtm_mode full -tsx_debug_log 3 -tsx_log_inst 1 -tsx_log_file 1 We will describe a few features this simple script has and why it might be important to use them to aid in debugging transaction aborts due to capacity overflows. In a typical scenario, you may do the following to find the source of those aborts. (i) Look at the TSX read/write set sizes and cache set usage distribution (ii) Take a random sample of aborted transactions and closely examine them (ii) In conjunction with case (ii) you may need to look at various log data related to capacity aborted transactions, as well as committed transactions to compare how they differ, hoping that you may spot a difference that could lead to the source of the aborts. ### Cache set usage distribution You might need to see how the cache sets are populated in an aborted transaction to see if there are outliers or a non-uniformity which could be causing premature overflows. Then you can locate the data structure in this transaction and enhance it such that it uniformly uses the available cache sets -- thus avoiding capacity overflows (see CIA allocator as a possible mitigation). Under normal circumstances you may need to compare cache access distributions between committed and aborted transactions to see how they differ by plotting a histogram or any suitable graph with both cases. TxSDELogAnalyzer.py calculates the average cache population among all transactions of the same type from the input log file. It outputs the final result in CSV format, making it easy to plot a graph in an Excel sheet for example. For average distribution of cache lines access the following command structure is used. >$ TxSDELogAnalyzer.py -D <type_num> [-o <output_file>] <input_log_file>

Where type_num specifies the type of transactions data to generate the distribution from. type_num can either be 2 for capacity-aborted transactions or 8 for committed transactions. This command calculates unique cache set accesses for each transaction data of type type_num and finally calculates the average of these cache access distributions.

A sample output from this command from capacity aborted transactions log data is given below.

Average cache set population for capacity aborts

Cache set #, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63
Avg. population, 5.00, 1.00, 0.00, 0.00, 1.00, 0.00, 1.00, 2.00, 3.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 1.00, 2.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 2.00, 2.00, 0.00, 2.00, 1.00, 0.00, 0.00, 2.00, 1.00, 0.00, 1.00, 2.00, 0.00, 1.00, 3.00, 0.00, 2.00, 2.00, 2.00, 2.00, 3.00, 2.00, 3.00, 2.00, 4.00, 1.00, 1.00, 0.00, 1.00, 1.00, 3.00

Fig.1: A graphical representation of the sample output. It shows how many cache lines are used in each of 64 cache sets of a data cache.

To easily interpret the graph in Fig.1 we need to understand the Intel® SDE's "-tsx_cache_set_size " parameter whose default value is 8 but 4 was used to generate the sample output above.

Intel's L1 data cache is 8-way associative that means every set of the data cache has 8 ways. The two logical cores share the L1 cache dynamically (Intel(r) Hyper-Threading technology is enabled). To approximate this behavior we can statically limit the number of ways per thread to 4 using Intel SDE (but this assumes that the usage of ways is equal).

If  "-tsx_cache_set_size " is set to the default value which is 8, it means hyper-threading is not "emulated" as all the ways in a cache set are used by the (only) single hardware thread in a core. To emulate Hyper-Threading, at least at data cache level, we set  "-tsx_cache_set_size " to 4. Thus only a half of the ways in each data cache set are used up. With the Hyper-Threading emulation a maximum of 4 ways per cache set should be uniquely modified in TSX-transaction execution. Otherwise a capacity overflow occurs aborting the transaction. From Figure 1, we see that at cache set #0 there is an overflow (5 unique way accesses instead of max 4) and this could lead to a capacity abort.

### Transactions data filtering

Intel® SDE already has knobs to start/stop logging thus limiting the log output size. This is normally done by adding the following command options to the SDE command you use to run your test

-control start:interactive:bcast,stop:interactive:bcast -interactive_file tmp

However, you may sometimes need log data with longer execution coverage; hence a huge log file will be produced. Limiting the log output size may therefore not be enough and you many need filtering options.

Moreover, your log data may contain data for various abort reasons and the commits, too. Even the log information of thread operations outside a transaction executions may be included in the SDE log. Therefore, it may be hard to analyze a huge log file with a mixture of information. Moreover, you may be interested in only logs of certain types -- e.g; capacity aborted transactions -- only. TxSDELogAnalyzer.py helps you filter out the log data and presents you only the data for a specific abort reason, or a commit. The command format to do exactly this is shown below.

>$TxSDELogAnalyzer.py -t <type_num> [-o <output_file>] <input_log_file> type_num can be 1 (all log data of both committed and aborted transactions), 2 (for capacity aborted transactions only) or 8 (for committed transactions only), # The command below outputs only capacity-abort related transaction logs >$ TxSDELogAnalyzer.py -t 2 sde-tsx-out.txt

# The following command line writes only commit-related transaction logs to a file named log_data_for_commits_data.txt
>$TxSDELogAnalyzer.py -t 8 -o log_data_for_commits_data.txt sde-tsx-out.txt  ### Sampling transactions Since you cannot analyse millions of transactions to identify code path patterns in a limited time you may want to extract log data for a few typical transactions out of millions. TxSDELogAnalyzer.py does exactly that. It randomly selects num random transactions. It supplements these data with cache set accesses giving a full overview of the cache set population at each operation within a given transaction. This features works in conjunction with "Transactions data filtering" described above. An example of commands for this feature is as follows. >$ TxSDELogAnalyzer.py -t 8 -r <num> <input_log_file> # for commits, num random commits
>\$ TxSDELogAnalyzer.py -t 2 -r <num> <input_log_file> # for capacity aborts, num random capacity aborts

A sample output from these commands is as below:

Transaction #0
OPERATION                      ; CACHE LINE ADDRESS; CACHE SET #; NEW CACHE LINE; CACHE SET POPULATION;    FUNCTION              ;  MANGLED FUNCTION         ; LIBRARY                 ; SOURCE
[ 1 1] @0x400b66 write access to 0x7f7bc2a39e7c:4;     0x7f7bc2a39e40;          57;           TRUE;                    1; tm_begin()               ; _Z8tm_beginv              ; /ABank/aBank:0x000000bee; /ABank/hle_lock.h 45
[ 1 1] @0x400b69 read access to 0x7f7bc2a39e7c:4 ;     0x7f7bc2a39e40;          57;          FALSE;                    1; tm_begin()               ; _Z8tm_beginv              ; /ABank/aBank:0x000000bee; /ABank/hle_lock.h 45
[ 1 1] @0x400c29 read access to 0x7f7bc2a39eac:4 ;     0x7f7bc2a39e80;          58;          FALSE;                    1; paySalaries(void*)       ; _Z11paySalariesPv         ; /ABank/aBank:0x000000c29; /ABank/SalaryBatch.cpp 78
[ 1 1] @0x400a36 write access to 0x7f7bc2a39e80:8;     0x7f7bc2a39e80;          58;          FALSE;                    1; myBank::paySalary(int)   ; _ZN6myBank9paySalaryEi    ; /ABank/aBank:0x000000a36; /ABank/SalaryBatch.cpp 62
. . .
[ 1 1] @0x400a4b read access to 0x7f7bc2a39e7c:4 ;     0x7f7bc2a39e40;          57;          FALSE;                    4; myBank::paySalary(int)   ; _ZN6myBank9paySalaryEi    ; /ABank/aBank:0x000000a53; /ABank/SalaryBatch.cpp 65
[ 1 1] @0x400a57 read access to 0x7f7bc2a39e7c:4 ;     0x7f7bc2a39e40;          57;          FALSE;                    4; myBank::paySalary(int)   ; _ZN6myBank9paySalaryEi    ; /ABank/aBank:0x000000a53; /ABank/SalaryBatch.cpp 65
[ 1 1] @0x400a4e write access to 0x7f7bc2a39e60:8;     0x7f7bc2a39e40;          57;          FALSE;                    4; myBank::paySalary(int)   ; _ZN6myBank9paySalaryEi    ; /ABank/aBank:0x000000a53; /ABank/SalaryBatch.cpp 65
[ 1 1] @0x400a53 write access to 0x7f7bc2a39e7c:4;     0x7f7bc2a39e40;          57;          FALSE;                    4; myBank::paySalary(int)   ; _ZN6myBank9paySalaryEi    ; /ABank/aBank:0x000000a53; /ABank/SalaryBatch.cpp 65
[ 1 1] @0x400a57 read access to 0x7f7bc2a39e7c:4 ;     0x7f7bc2a39e40;          57;          FALSE;                    4; myBank::paySalary(int)   ; _ZN6myBank9paySalaryEi    ; /ABank/aBank:0x000000a53; /ABank/SalaryBatch.cpp 65
[ 1 1] @0x400a4b read access to 0x7f7bc2a39e7c:4 ;     0x7f7bc2a39e40;          57;          FALSE;                    4; myBank::paySalary(int)   ; _ZN6myBank9paySalaryEi    ; /ABank/aBank:0x000000a53; /ABank/SalaryBatch.cpp 65
[ 1 1] @0x400a53 write access to 0x7f7bc2a39e7c:4;     0x7f7bc2a39e40;          57;          FALSE;                    4; myBank::paySalary(int)   ; _ZN6myBank9paySalaryEi    ; /ABank/aBank:0x000000a53; /ABank/SalaryBatch.cpp 65
[ 1 1] @0x400a4e write access to 0x7f7bc2a39e60:8;     0x7f7bc2a39e40;          57;          FALSE;                    4; myBank::paySalary(int)   ; _ZN6myBank9paySalaryEi    ; /ABank/aBank:0x000000a53; /ABank/SalaryBatch.cpp 65
[ 1 1] self abort transaction(1) abort reason 9
;  TOTAL FOOTPRINT: 250 cache lines;
;  TOTAL WRITE SET: 249 cache lines;




Attachment Size
5 KB

#### Product and Performance Information

1

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.