advisor Command Option Reference
The advisor command currently supports the options shown below.
| Option | Description | 
|---|---|
| Set an accuracy level for the Offload Modeling collection preset. | |
| Add loops (by file and line number) to the loops selected for deeper analysis. | |
| Specify the directory where the target application runs during analysis, if it is different from the current working directory. | |
| Assume that a loop has dependencies if the loop dependency type is unknown. | |
| Estimate invocation taxes assuming the invocation tax is paid only for the first kernel launch. | |
| When searching for an optimal N-dimensional offload, assume there are dependencies between inner and outer loops. | |
| Assume data is only transferred once for each offload, and all instances share that data. | |
| Finalize Survey and Trip Counts & FLOP analysis data after collection is complete. | |
| Emulate the execution of more than one instance simultaneously for a top-level offload. | |
| Run benchmarks on only one concurrently executing Intel Advisor instance to avoid concurrency issues with regard to platform limits. | |
| Generate a Survey report in bottom-up view. | |
| Enable binary visibility in a read-only snapshot you can view any time. | |
| Select what binary files will be added to a read-only snapshot. | |
| Set the cache hierarchy to collect modeling data for CPU cache behavior during Trip Counts & FLOP analysis. | |
| Simulate device cache behavior for your application. | |
| Enable source code visibility in a read-only snapshot you can view any time (with the --snapshot action). Enable keeping source code cache within a project (with the --collect action). | |
| Enable cache simulation for Performance Modeling. | |
| Set the cache associativity for modeling CPU cache behavior during the Memory Access Patterns analysis. | |
| Set the cache line size (in bytes) for modeling CPU cache behavior during Memory Access Patterns analysis. | |
| Set the focus for modeling CPU cache behavior during Memory Access Patterns analysis. | |
| Specify what percentage of total memory accesses should be processed during cache simulation. | |
| Set the cache set size (in bytes) for modeling CPU cache behavior during Memory Access Patterns analysis. | |
| Check the profitability of offload regions and add only profitable regions to a report. | |
| Clear all loops previously selected for deeper analysis. | |
| Specify the device configuration for which to model your application performance. | |
| Use the projection of x86 logical instructions to GPU logical instructions. | |
| Project x86 memory instructions to GPU SEND/SENDS instructions. | |
| Count the number of accesses to memory objects created by code regions. | |
| Project x86 MOV instructions to GPU MOV instructions. | |
| Select how to model SEND instruction latency. | |
| Specify a scale factor to approximate a host CPU that is faster than the baseline CPU by this factor. | |
| Set the delimiter for a report in CSV format. | |
| Specify the absolute path or name for a custom TOML configuration file with additional modeling parameters. | |
| Limit the maximum amount (in MB) of raw data collected during Survey analysis. | |
| Analyze potential data reuse between code regions. | |
| Set the level of detail for modeling data transfers during Characterization. | |
| Estimate data transfers in detail, including latencies for each transferred object. | |
| Specify memory page size to set the traffic measurement granularity for the data transfer simulator. | |
| Show only floating-point data, only integer data, or data for the sum of both data types in a Roofline interactive HTML report. | |
| Remove previously collected trip counts data when re-running a Survey analysis with changed binaries. | |
| Do not account for optimized traffic for transcendentals on a GPU. | |
| Show a call stack for each loop/function call in a report. | |
| List all steps included in Offload Modeling batch collection at a specified accuracy level without running them. | |
| Specify the maximum amount of time (in seconds) an analysis runs. | |
| Show (in a Survey report) how many instructions of a given type actually executed during Trip Counts & FLOP analysis. | |
| enable-batching | Deprecated. | 
| Model CPU cache behavior on your target application. | |
| Model data transfer between host memory and device memory. | |
| Enable a simulator to model GRF. | |
| enable-slm | Deprecated. SLM is modeled by default if available. | 
| Examine specified annotated sites for opportunities to perform task-chunking modeling in a Suitability report. | |
| Use the same local size and SIMD width as measured on a baseline device. | |
| Emulate data distribution over stacks if stacks collection is disabled. | |
| Offload all selected code regions even if offloading their child loops/functions is more profitable. | |
| Estimate region speedup with relaxed constraints. | |
| Consider loops recommended for offloading only if they reach the minimum estimated speedup specified in a configuration file. | |
| Exclude the specified files or directories from annotation scanning during analysis. | |
| Specify an application for analysis that is not the starting application. | |
| Specify a path to an unpacked result snapshot or an MPI rank result to generate a report or model performance. | |
| Filter data by the specified column name and value in a Survey and Trip Counts & FLOP report. | |
| Enable filtering detected stack variables by scope (warning vs. error) in a Dependencies analysis. | |
| Mark all potential reductions by specific diagnostic during Dependencies analysis. | |
| Enable flexible cache simulation to change cache configuration without re-running collection. | |
| Collect data about floating-point and integer operations, memory traffic, and mask utilization metrics for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) platforms during Trip Counts & FLOP analysis. | |
| Consider all arithmetic operations as single-precision floating-point or int32 operations. | |
| Consider all arithmetic operations as double-precision floating-point or int64 operations. | |
| Set a report output format. | |
| With the Offload Modeling perspective, analyze OpenCL™ and oneAPI Level Zero programs running on Intel® Graphics. With the GPU Roofline Insights perspective, create a Roofline interactive HTML report for data collected on GPUs. | |
| Collect memory traffic generated by OpenCL™ and Intel® Media SDK programs executed on Intel® Processor Graphics. | |
| Minimize data collection overhead by profiling only the GPU kernels you specify. | |
| Specify time interval, in milliseconds, between GPU samples during Survey analysis. | |
| Disable data transfer tax estimation. | |
| Specify runtimes or libraries to ignore time spent in these regions when calculating per-program speedup. | |
| Ignore mismatched target or application parameter errors before starting analysis. | |
| Ignore mismatched module checksums before starting analysis. | |
| Analyze the Nth child process during Memory Access Patterns and Dependencies analysis. | |
| Model traffic on all levels of the memory hierarchy for a Roofline report. | |
| Set the length of time (in milliseconds) to wait before collecting each sample during Survey analysis. | |
| Set the maximum number of top items to show in a report. | |
| Set the maximum number of instances to analyze for all marked loops. | |
| Specify total time, in milliseconds, to filter out loops that fall below this value. | |
| Select loops (by criteria instead of human input) for deeper analysis. | |
| Enable/disable user selection as a way to control loops/functions identified for deeper analysis. | |
| After running a Survey analysis and identifying loops of interest, select loops (by file and line number or ID) for deeper analysis. | |
| Model specific memory level(s) in a Roofline interactive HTML report, including L1, L2, L3, and DRAM. | |
| Model only load memory operations, store memory operations, or both, in a Roofline interactive HTML report. | |
| Show dynamic or static instruction mix data in a Survey report. | |
| Collect Intel® oneAPI Math Kernel Library (oneMKL) loops and functions data during the Survey analysis. | |
| Use the baseline GPU configuration as a target device for modeling. | |
| Analyze child loops of the region head to determine whether any of them provides a more profitable offload. | |
| Model calls to math functions such as EXP, LOG, SIN, and COS as extended math instructions, if possible. | |
| Analyze code regions with system calls, assuming the system calls are separated from offload code and executed on a host device. | |
| Specify application (or child application) module(s) to include in or exclude from analysis. | |
| Limit, by inclusion or exclusion, application (or child application) module(s) for analysis. | |
| Specify MPI process data to import. | |
| Set the Microsoft* runtime environment mode for analysis. | |
| When searching for an optimal N-dimensional offload, limit the maximum loop depth that can be converted to one offload. | |
| Specify a text file containing command line arguments. | |
| Enable asynchronous execution to overlap offload overhead with execution time. | |
| Pack a snapshot into an archive. | |
| Analyze OpenCL™ and oneAPI Level Zero programs running on Intel® Processor Graphics. | |
| Show Intel® performance libraries loops and functions in Intel® Advisor reports. | |
| Collect metrics about Just-In-Time (JIT) generated code regions during the Trip Counts & FLOP analysis. | |
| Collect Python* loop and function data during Survey analysis. | |
| Collect metrics for stripped binaries. | |
| Specify the top-level directory where a result is saved if you want to save the collection somewhere other than the current working directory. | |
| Minimize status messages during command execution. | |
| Recalculate total time after filtering a report. | |
| Enable heap allocation tracking to identify heap-allocated variables for which access strides are detected during Memory Access Patterns analysis. | |
| Capture stack frame pointers to identify stack variables for which access strides are detected during Memory Access Patterns analysis. | |
| Examine specified annotated sites for opportunities to reduce lock contention or find deadlocks in a Suitability report. | |
| Examine specified annotated sites for opportunities to reduce lock overhead in a Suitability report. | |
| Examine specified annotated sites for opportunities to reduce site overhead in a Suitability report. | |
| Examine specified annotated sites for opportunities to reduce task overhead in a Suitability report. | |
| Refinalize a Survey result that was collected with a previous Intel® Advisor version, or when you need to correct or update source and binary search paths. | |
| Remove loops (by file and line number) from the loops selected for deeper analysis. | |
| Redirect report output from stdout to another location. | |
| Specify the path or name of a custom report template file. | |
| Specify a directory to identify the running analysis. | |
| Resume collection after the specified number of milliseconds. | |
| Return the target exit code instead of the command line interface exit code. | |
| Specify the location(s) for finding target support files. | |
| Enable searching for an optimal N-dimensional offload. | |
| Select loops (by file and line number, ID, or criteria) for deeper analysis. | |
| Assume loops with specified IDs or source locations have a dependency. | |
| Assume loops with specified IDs or source locations are parallel. | |
| Specify a single-line parameter to modify in a target device configuration. | |
| Show data for all available columns in a Survey report. | |
| Show data for all available rows, including data for child loops, in a Survey report. | |
| Show only functions in a report. | |
| Show only loops in a report. | |
| Show not-executed child loops in a Survey report. | |
| Generate a Survey report for data collected for GPU kernels. | |
| Specify the total time threshold, in milliseconds, to filter out nodes that fall below this value from PDF and DOT Offload Modeling reports. | |
| Sort data in ascending order (by specified column name) in a report. | |
| Sort data in descending order (by specified column name) in a report. | |
| Enable register flow analysis to calculate the number of consecutive load/store operations in registers and related memory traffic in bytes during Survey analysis. | |
| Specify stack access size to set stack memory access measurement granularity for the data transfer simulation. | |
| Restructure the call flow during Survey analysis to attach stacks to a point introducing a parallel workload. | |
| Set stack size limit for analyzing stacks after collection. | |
| Perform advanced collection of call stack data during Roofline and Trip Counts & FLOP analysis. | |
| Choose between online and offline modes to analyze stacks during Survey analysis. | |
| Start executing the target application for analysis purposes, but delay data collection. | |
| Statically calculate the number of specific instructions present in the binary during Survey analysis. | |
| Specify processes and/or children for instrumentation during Survey analysis. | |
| Collect a variety of data during Survey analysis for loops that reside in non-executed code paths. | |
| Specify the device configuration whose cache to model during Trip Counts collection. | |
| Specify a target GPU to collect data for if you have multiple GPUs connected to your system. | |
| Attach Survey or Trip Counts & FLOP collection to a running process specified by the process ID. | |
| Attach Survey or Trip Counts & FLOP collection to a running process specified by the process name. | |
| Specify the hardware configuration to use for modeling purposes in a Suitability report. | |
| Specify the threading model to use for modeling purposes in a Suitability report. | |
| Specify the number of parallel threads to use for offload heads. | |
| Generate a Survey report in top-down view. | |
| Set how to trace loop iterations during Memory Access Patterns analysis. | |
| Configure collectors to trace MPI code and determine MPI rank IDs for non-Intel® MPI library implementations. | |
| Attribute memory objects to the analyzed loops that accessed the objects. | |
| Track accesses to stack memory. | |
| Enable parallel data sharing analysis for stack variables during Dependencies analysis. | |
| Collect loop trip counts data during Trip Counts & FLOP analysis. | |
| use-collect-configs | Deprecated. | 
| user-data-dir | Deprecated. | 
| Maximize status messages during command execution. | |
| Show call stack data in a Roofline interactive HTML report (if call stack data is collected). | |
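
As a rough illustration of how options from this table combine on one command line, the sketch below assembles an `advisor` invocation and prints it without running it. The `--collect` action is named in the table above. The spellings `--project-dir` and `--quiet`, the `survey` value, and the directory and target names are assumptions made for illustration, so check them against `advisor --help` for your installed version.

```shell
# Sketch only: build an advisor command line and print it for review.
# Nothing is executed, so Intel Advisor does not need to be installed.

project_dir="./advi_results"   # hypothetical result directory
target="./myApplication"       # hypothetical target executable

# Assumed layout: action option, then modifier options, then "--" before the target.
# --project-dir and --quiet are assumed spellings for the "result directory"
# and "minimize status messages" options described in the table.
cmd="advisor --collect=survey --project-dir=$project_dir --quiet -- $target"

printf '%s\n' "$cmd"
```

Printing the command first makes it easy to review the option spellings before launching a collection that may take a long time.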