advisor Command Option Reference

The advisor command currently supports the options shown below.
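Most of the options below modify one of the advisor actions (--collect, --report, or --snapshot). As an illustrative sketch of how they are typically combined, assuming the standard advisor action syntax (the project directory, application name, and report file are placeholder values):

```sh
# Collect Survey data, add Trip Counts & FLOP data, then save a Survey report as text.
advisor --collect=survey --project-dir=./advi_results -- ./myApplication
advisor --collect=tripcounts --flop --project-dir=./advi_results -- ./myApplication
advisor --report=survey --project-dir=./advi_results --format=text --report-output=./survey.txt
```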
| Option | Description |
|---|---|
| | Set an accuracy level for the Offload Modeling collection preset. |
| | Add loops (by file and line number) to the loops selected for deeper analysis. |
| | Specify the directory where the target application runs during analysis, if it is different from the current working directory. |
| | Assume that a loop has dependencies if the loop dependency type is unknown. |
| | Estimate invocation taxes assuming the invocation tax is paid only for the first kernel launch. |
| | When searching for an optimal N-dimensional offload, assume there are dependencies between inner and outer loops. |
| | Assume data is only transferred once for each offload, and all instances share that data. |
| | Finalize Survey and Trip Counts & FLOP analysis data after collection is complete. |
| | Emulate the execution of more than one instance simultaneously for a top-level offload. |
| | Run benchmarks on only one concurrently executing Intel Advisor instance to avoid concurrency issues with regard to platform limits. |
| | Generate a Survey report in bottom-up view. |
| | Enable binary visibility in a read-only snapshot you can view any time. |
| | Select what binary files will be added to a read-only snapshot. |
| | Set the cache hierarchy to collect modeling data for CPU cache behavior during Trip Counts & FLOP analysis. |
| | Simulate device cache behavior for your application. |
| | Enable source code visibility in a read-only snapshot you can view any time (with the --snapshot action). Enable keeping a source code cache within a project (with the --collect action). |
| | Enable cache simulation for Performance Modeling. |
| | Set the cache associativity for modeling CPU cache behavior during the Memory Access Patterns analysis. |
| | Set the cache line size (in bytes) for modeling CPU cache behavior during Memory Access Patterns analysis. |
| | Set the focus for modeling CPU cache behavior during Memory Access Patterns analysis. |
| | Specify what percentage of total memory accesses should be processed during cache simulation. |
| | Set the cache set size (in bytes) for modeling CPU cache behavior during Memory Access Patterns analysis. |
| | Check the profitability of offload regions and add only profitable regions to a report. |
| | Clear all loops previously selected for deeper analysis. |
| | Specify a device configuration to model your application performance for. |
| | Use the projection of x86 logical instructions to GPU logical instructions. |
| | Project x86 memory instructions to GPU SEND/SENDS instructions. |
| | Count the number of accesses to memory objects created by code regions. |
| | Project x86 MOV instructions to GPU MOV instructions. |
| | Select how to model SEND instruction latency. |
| | Specify a scale factor to approximate a host CPU that is faster than the baseline CPU by this factor. |
| | Set the delimiter for a report in CSV format. |
| | Specify the absolute path or name for a custom TOML configuration file with additional modeling parameters. |
| | Limit the maximum amount (in MB) of raw data collected during Survey analysis. |
| | Analyze potential data reuse between code regions. |
| | Set the level of detail for modeling data transfers during Characterization. |
| | Estimate data transfers in detail and latencies for each transferred object. |
| | Specify memory page size to set the traffic measurement granularity for the data transfer simulator. |
| | Show only floating-point data, only integer data, or data for the sum of both data types in a Roofline interactive HTML report. |
| | Remove previously collected trip counts data when re-running a Survey analysis with changed binaries. |
| | Do not account for optimized traffic for transcendentals on a GPU. |
| | Show a callstack for each loop/function call in a report. |
| | List all steps included in Offload Modeling batch collection at a specified accuracy level without running them. |
| | Specify the maximum amount of time (in seconds) an analysis runs. |
| | Show (in a Survey report) how many instructions of a given type actually executed during Trip Counts & FLOP analysis. |
| enable-batching | Deprecated. |
| | Model CPU cache behavior on your target application. |
| | Model data transfer between host memory and device memory. |
| | Enable a simulator to model GRF. |
| enable-slm | Deprecated. SLM is modeled by default if available. |
| | Examine specified annotated sites for opportunities to perform task-chunking modeling in a Suitability report. |
| | Use the same local size and SIMD width as measured on a baseline device. |
| | Emulate data distribution over stacks if stacks collection is disabled. |
| | Offload all selected code regions even if offloading their child loops/functions is more profitable. |
| | Estimate region speedup with relaxed constraints. |
| | Consider loops recommended for offloading only if they reach the minimum estimated speedup specified in a configuration file. |
| | Exclude the specified files or directories from annotation scanning during analysis. |
| | Specify an application for analysis that is not the starting application. |
| | Specify a path to an unpacked result snapshot or an MPI rank result to generate a report or model performance. |
| | Filter data by the specified column name and value in a Survey and Trip Counts & FLOP report. |
| | Enable filtering detected stack variables by scope (warning vs. error) in a Dependencies analysis. |
| | Mark all potential reductions by specific diagnostic during Dependencies analysis. |
| | Enable flexible cache simulation to change cache configuration without re-running collection. |
| | Collect data about floating-point and integer operations, memory traffic, and mask utilization metrics for AVX-512 platforms during Trip Counts & FLOP analysis. |
| | Consider all arithmetic operations as single-precision floating-point or int32 operations. |
| | Consider all arithmetic operations as double-precision floating-point or int64 operations. |
| | Set a report output format. |
| | With the Offload Modeling perspective, analyze OpenCL™ and oneAPI Level Zero programs running on Intel® Graphics. With the GPU Roofline Insights perspective, create a Roofline interactive HTML report for data collected on GPUs. |
| | Collect memory traffic generated by OpenCL™ and Intel® Media SDK programs executed on Intel® Processor Graphics. |
| gpu-kernels | Deprecated. Use --profile-gpu or --gpu instead. |
| | Specify the time interval, in milliseconds, between GPU samples during Survey analysis. |
| | Disable data transfer tax estimation. |
| | Specify runtimes or libraries to ignore time spent in these regions when calculating per-program speedup. |
| | Ignore mismatched target or application parameter errors before starting analysis. |
| | Ignore mismatched module checksums before starting analysis. |
| | Analyze the Nth child process during Memory Access Patterns and Dependencies analysis. |
| | Model traffic on all levels of the memory hierarchy for a Roofline report. |
| | Set the length of time (in milliseconds) to wait before collecting each sample during Survey analysis. |
| | Set the maximum number of top items to show in a report. |
| | Set the maximum number of instances to analyze for all marked loops. |
| | Specify a total time threshold, in milliseconds, to filter out loops that fall below this value. |
| | Select loops (by criteria instead of human input) for deeper analysis. |
| | Enable/disable user selection as a way to control loops/functions identified for deeper analysis. |
| | After running a Survey analysis and identifying loops of interest, select loops (by file and line number or ID) for deeper analysis. |
| | Model specific memory level(s) in a Roofline interactive HTML report, including L1, L2, L3, and DRAM. |
| | Model only load memory operations, store memory operations, or both, in a Roofline interactive HTML report. |
| | Show dynamic or static instruction mix data in a Survey report. |
| | Collect Intel® oneAPI Math Kernel Library (oneMKL) loops and functions data during the Survey analysis. |
| | Use the baseline GPU configuration as a target device for modeling. |
| | Analyze child loops of the region head to find whether any of them provide a more profitable offload. |
| | Model calls to math functions such as EXP, LOG, SIN, and COS as extended math instructions, if possible. |
| | Analyze code regions with system calls, treating them as separate from offload code and executed on a host device. |
| | Specify application (or child application) module(s) to include in or exclude from analysis. |
| | Limit, by inclusion or exclusion, application (or child application) module(s) for analysis. |
| | Specify MPI process data to import. |
| | Set the Microsoft* runtime environment mode for analysis. |
| | When searching for an optimal N-dimensional offload, limit the maximum loop depth that can be converted to one offload. |
| | Specify a text file containing command line arguments. |
| | Enable asynchronous execution to overlap offload overhead with execution time. |
| | Pack a snapshot into an archive. |
| | Analyze OpenCL™ and oneAPI Level Zero programs running on Intel® Processor Graphics. |
| | Show Intel® performance libraries loops and functions in Intel® Advisor reports. |
| | Collect metrics about Just-In-Time (JIT) generated code regions during Trip Counts & FLOP analysis. |
| | Collect Python* loop and function data during Survey analysis. |
| | Collect metrics for stripped binaries. |
| | Specify the top-level directory where a result is saved if you want to save the collection somewhere other than the current working directory. |
| | Minimize status messages during command execution. |
| | Recalculate total time after filtering a report. |
| | Enable heap allocation tracking to identify heap-allocated variables for which access strides are detected during Memory Access Patterns analysis. |
| | Capture stack frame pointers to identify stack variables for which access strides are detected during Memory Access Patterns analysis. |
| | Examine specified annotated sites for opportunities to reduce lock contention or find deadlocks in a Suitability report. |
| | Examine specified annotated sites for opportunities to reduce lock overhead in a Suitability report. |
| | Examine specified annotated sites for opportunities to reduce site overhead in a Suitability report. |
| | Examine specified annotated sites for opportunities to reduce task overhead in a Suitability report. |
| | Refinalize a survey result collected with a previous Intel® Advisor version or if you need to correct or update source and binary search paths. |
| | Remove loops (by file and line number) from the loops selected for deeper analysis. |
| | Redirect report output from stdout to another location. |
| | Specify the path/name of a custom report template file. |
| | Specify a directory to identify the running analysis. |
| | Resume collection after the specified number of milliseconds. |
| | Return the target exit code instead of the command line interface exit code. |
| | Specify the location(s) for finding target support files. |
| | Enable searching for an optimal N-dimensional offload. |
| | Select loops (by file and line number, ID, or criteria) for deeper analysis. |
| | Assume loops with specified IDs or source locations have a dependency. |
| | Assume loops with specified IDs or source locations are parallel. |
| | Specify a single-line parameter to modify in a target device configuration. |
| | Show data for all available columns in a Survey report. |
| | Show data for all available rows, including data for child loops, in a Survey report. |
| | Show only functions in a report. |
| | Show only loops in a report. |
| | Show not-executed child loops in a Survey report. |
| | Generate a Survey report for data collected for GPU kernels. |
| | Specify the total time threshold, in milliseconds, to filter out nodes that fall below this value from PDF and DOT Offload Modeling reports. |
| | Sort data in ascending order (by specified column name) in a report. |
| | Sort data in descending order (by specified column name) in a report. |
| | Enable register flow analysis to calculate the number of consecutive load/store operations in registers and related memory traffic in bytes during Survey analysis. |
| | Specify stack access size to set stack memory access measurement granularity for the data transfer simulation. |
| | Restructure the call flow during Survey analysis to attach stacks to a point introducing a parallel workload. |
| | Set the stack size limit for analyzing stacks after collection. |
| | Perform advanced collection of callstack data during Roofline and Trip Counts & FLOP analysis. |
| | Choose between online and offline modes to analyze stacks during Survey analysis. |
| | Start executing the target application for analysis purposes, but delay data collection. |
| | Statically calculate the number of specific instructions present in the binary during Survey analysis. |
| | Specify processes and/or children for instrumentation during Survey analysis. |
| | Collect a variety of data during Survey analysis for loops that reside in non-executed code paths. |
| | Specify a device configuration to model cache for during Trip Counts collection. |
| | Specify a target GPU to collect data for if you have multiple GPUs connected to your system. |
| | Attach Survey or Trip Counts & FLOP collection to a running process specified by the process ID. |
| | Attach Survey or Trip Counts & FLOP collection to a running process specified by the process name. |
| | Specify the hardware configuration to use for modeling purposes in a Suitability report. |
| | Specify the threading model to use for modeling purposes in a Suitability report. |
| | Specify the number of parallel threads to use for offload heads. |
| | Generate a Survey report in top-down view. |
| | Set how to trace loop iterations during Memory Access Patterns analysis. |
| | Configure collectors to trace MPI code and determine MPI rank IDs for non-Intel® MPI library implementations. |
| | Attribute memory objects to the analyzed loops that accessed the objects. |
| | Track accesses to stack memory. |
| | Enable parallel data sharing analysis for stack variables during Dependencies analysis. |
| | Collect loop trip counts data during Trip Counts & FLOP analysis. |
| use-collect-configs | Deprecated. |
| user-data-dir | Deprecated. |
| | Maximize status messages during command execution. |
| | Show call stack data in a Roofline interactive HTML report (if call stack data is collected). |
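For the snapshot-related options above (binary and source visibility, packing a snapshot into an archive), a minimal sketch using the --snapshot action mentioned in the table; the option spellings and the snapshot name are illustrative and assume the standard advisor snapshot syntax:

```sh
# Pack a read-only snapshot of the collected results into an archive,
# keeping binaries and sources visible when the snapshot is opened later.
advisor --snapshot --project-dir=./advi_results \
        --pack --cache-sources --cache-binaries -- ./my_result_snapshot
```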