Visible to Intel only — GUID: GUID-F26E8B23-9A99-46FE-B03C-7E785CFEF9C8
analyze.py Options
This script allows you to run an analysis on profiling data and generate report results.
Usage
advisor-python <APM>/analyze.py <project-dir> [--options]
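For example, an invocation might look like the following sketch. The paths are illustrative: in a oneAPI environment the APM environment variable points to the Advisor performance-modeling scripts directory, and ./advi_results stands in for your project directory.

```shell
# Hypothetical paths: $APM is set by the Advisor environment scripts,
# and ./advi_results is an example project directory.
# Model for the default device configuration, assuming a 2x faster host CPU.
advisor-python $APM/analyze.py ./advi_results --config xehpg_512xve --cpu-scale-factor 2
```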
Options
The following table describes options that you can use with the analyze.py script.
| Option | Description |
|---|---|
| <project-dir> | Required. Specify the path to the Intel® Advisor project directory. |
| -h, --help | Show all script options. |
| --version | Display Intel® Advisor version information. |
| -v, --verbose <verbose> | Specify the output verbosity level. NOTE: This option affects the console output but does not affect logs or report results. |
| --assume-dependencies (default) \| --no-assume-dependencies | Assume that a loop has a dependency if its dependency type is unknown. When disabled, assume that a loop does not have dependencies if its dependency type is unknown. |
| --assume-hide-taxes [<loop-id> \| <file-name>:<line-number>] | Use an optimistic approach to estimate invocation taxes: hide all invocation taxes except the first one. You can provide a comma-separated list of loop IDs and source locations to hide taxes for. If you do not provide a list, taxes are hidden for all loops. |
| --assume-never-hide-taxes (default) | Use a pessimistic approach to estimate invocation taxes: do not hide any invocation taxes. |
| --assume-ndim-dependency (default) \| --no-assume-ndim-dependency | When searching for an optimal N-dimensional offload, assume there are dependencies between inner and outer loops. |
| --assume-parallel \| --no-assume-parallel (default) | Assume that a loop is parallel if the loop type is unknown. |
| --assume-single-data-transfer (default) \| --no-assume-single-data-transfer | Assume data is transferred once for each offload and that all instances share the data. When disabled, assume each data object is transferred for every instance of an offload that uses it, with no data reuse between calls to the same kernel. IMPORTANT: This option requires certain options to be enabled during the Trip Counts collection. |
| --atomic-access-pattern <pattern> | Select an atomic access pattern. Possible values: sequential, partial_sums_16, same. Default: partial_sums_16. |
| --assume-atomic-optimization-ratio <ratio> | Model atomic accesses as a number of parallel sums. Specify 8, 16, 32, 64, or 128 to model that number of parallel sums, or 0 to search for the optimal number. Default: 16. |
| --check-profitability (default) \| --no-check-profitability | Check the profitability of offloading regions. Only regions that can benefit from offloading are added to the report. When disabled, add all evaluated regions to the report, regardless of the profitability of offloading them. |
| --config <config> | Specify a configuration file by absolute path or by name. If you specify a name, the model configuration directory is searched for the file first, then the current directory. The following device configurations are available: xehpg_512xve (default), xehpg_256xve, gen12_tgl, gen12_dg1. NOTE: You can specify several configurations by using this option more than once. |
| --count-logical-instructions (default) \| --no-count-logical-instructions | Project x86 logical instructions to GPU logical instructions. |
| --count-memory-instructions (default) \| --no-count-memory-instructions | Project x86 instructions with memory operands to GPU SEND/SENDS instructions. |
| --count-mov-instructions \| --no-count-mov-instructions (default) | Project x86 MOV instructions to GPU MOV instructions. |
| --count-send-latency {all, first, off} | Select how to model SEND instruction latency. |
| --cpu-scale-factor <integer> | Assume a host CPU that is faster than the original CPU by the specified factor. All original CPU times are divided by the scale factor. |
| --data-reuse-analysis \| --no-data-reuse-analysis (default) | Estimate data reuse between offloaded regions. Disabling this option can decrease analysis overhead. IMPORTANT: This option requires certain options to be enabled during the Trip Counts collection. |
| --data-transfer-histogram (default) \| --no-data-transfer-histogram | Estimate fine-grained data transfers and latencies for each transferred object and add a memory object histogram to the report. IMPORTANT: This option requires you to enable track-memory-objects or data-transfer=medium or higher (for the advisor CLI only) during the Trip Counts collection. |
| --disable-fp64-math-optimization | Do not account for optimized traffic for transcendental functions on the GPU. |
| --enable-batching \| --disable-batching (default) | Enable job batching for top-level offloads: emulate the execution of more than one instance simultaneously. |
| --enable-edram | Enable eDRAM modeling in the memory hierarchy model. NOTE: Use this option with both collect.py and analyze.py. |
| --enable-slm | Enable SLM modeling in the memory hierarchy model. NOTE: Use this option with both collect.py and analyze.py. |
| --enforce-baseline-decomposition \| --no-enforce-baseline-decomposition (default) | Use the same local size and SIMD width as measured on the baseline. When disabled, search for an optimal local size and SIMD width to optimize kernel execution time. Enable this option for GPU-to-GPU performance modeling. |
-e, --enforce-offloads |