Get Started Guide

  • 2022.1
  • 04/11/2022
  • Public Content

Prototype Threading Designs

With the Threading perspective, you can identify the best candidates for parallelization, prototype threading, and check whether data dependencies prevent certain functions/loops from being parallelized.
This page explains how to profile the nqueens sample application and choose the best candidates for parallelization with threads. You can also follow the instructions below with your own application.
Figure: Threading Summary report for the nqueens application
Follow these steps:

Prerequisites

  1. Install the Intel Advisor as a standalone or as part of Intel® oneAPI Base Toolkit. For installation instructions, see Install Intel Advisor in the user guide.
  2. Install the Intel® C++ Compiler Classic as a standalone or as part of Intel® oneAPI HPC Toolkit. For installation instructions, see Intel® oneAPI Toolkits Installation Guide.
  3. Set up environment variables for the Intel Advisor and Intel® C++ Compiler Classic. For example, run the setvars script in the installation directory.
    This document assumes you installed the tools to a default location. If you installed the tools to a different location, make sure to replace the default path in the commands below.
    Do not close the terminal or command prompt after setting the environment variables. Otherwise, the environment resets.

Unpack and Build Your Application

On Linux* OS
From the terminal where you set the environment variables:
  1. Go to the /opt/intel/oneapi/advisor/latest/samples/en/C++ directory.
  2. Copy the nqueens_Advisor.tgz file to a writable directory or share on your system.
  3. Extract the sample from the .tgz file.
  4. Change directory to the nqueens_Advisor/ directory in its unzipped location.
  5. Build the sample application:
    make 1_nqueens_serial
  6. Run the application to verify the build:
    ./1_nqueens_serial
    The application output window displays a board size of 14 and the total time it took to run the target.
On Windows* OS (From Command Line)
  1. Find Visual Studio Tools for your Microsoft Visual Studio* and OS version, and select one of the command prompt shortcuts. For example, from the Microsoft Windows* 10 Start pane, select Visual Studio 2019 > x64 Native Tools Command Prompt for VS2019.
  2. Go to the C:\Program Files (x86)\Intel\oneAPI\advisor\latest\samples\en\C++ directory.
  3. Copy the nqueens_Advisor.zip file to a writable directory or share on your system.
  4. Extract the sample from the .zip file.
  5. Change directory to the nqueens_Advisor/ directory in its unzipped location.
  6. Build the target in release mode:
    devenv nqueens_Advisor.sln /build release /project 1_nqueens_serial
  7. Change directory to the Release directory.
  8. Run the application to verify the build:
    1_nqueens_serial.exe
    The application output window displays a board size of 14 and the total time it took to run the target.
On Windows* OS (From Microsoft Visual Studio)
  1. Go to the C:\Program Files (x86)\Intel\oneAPI\advisor\latest\samples\en\C++ directory.
  2. Copy the nqueens_Advisor.zip file to a writable directory or share on your system.
  3. Extract the sample from the .zip file.
  4. Launch the Microsoft Visual Studio IDE.
  5. Choose File > Open > Project/Solution....
  6. In the Open Project dialog box, navigate to the nqueens_Advisor/ directory in its unzipped location and open the nqueens_Advisor.sln file.
    If you get a dialog window suggesting that you retarget the application, click OK.
  7. If the Solution Configurations drop-down is set to Debug, change it to Release.
  8. Right-click the 1_nqueens_serial project in the Solution Explorer and choose Set as StartUp Project.
  9. If you want to use the Intel® C++ Compiler Classic, right-click the 1_nqueens_serial project and click Intel Compiler > Use Intel C++ Compiler Classic.
  10. Right-click the 1_nqueens_serial project, then choose Properties to verify the sample code uses the optimal release build settings.
    For details about recommended build settings, see Build Target Application.
  11. Click the OK button to close the Properties dialog box.
  12. Choose Build > Clean Solution.
  13. Choose Build > Build 1_nqueens_serial to build the target.
    The application output window displays a board size of 14 and the total time it took to run the target.
  14. If the Visual Studio* IDE responds that any projects are out of date, click No to not build them.

Collect Baseline Performance Data

Run Threading Perspective from Graphical User Interface (GUI)
  1. From the terminal or command prompt where you set the environment variables, launch the Intel Advisor GUI:
    advisor-gui
  2. Create a project for the just-built 1_nqueens_serial application. For details, see Before You Begin.
    When in the Project Properties dialog box, make sure the Inherit settings from Survey Hotspots Analysis Type checkbox is selected in the Trip Counts and FLOP Analysis, Dependencies Analysis, and Memory Access Patterns Analysis types.
    If you work in the Microsoft Visual Studio IDE, you do not need to create a project because the Intel Advisor creates it automatically when you first open the Intel Advisor GUI.
  3. From the Perspective Selector pane, choose the Threading perspective.
  4. In the Analysis Workflow pane, set data collection accuracy level to Low, and click the button to run the perspective.
    At this accuracy level, Intel Advisor runs Survey analysis to profile the application.
Run Threading from Command Line Interface (CLI) on Linux OS
Run Survey analysis to collect performance metrics and identify loops/functions with the longest total time:
advisor --collect=survey --project-dir=./1_nqueens_serial --search-dir src:r=./1_nqueens_serial -- 1_nqueens_serial
In the Threading perspective, you should specify the source search directory using the --search-dir option.
When the analysis execution completes, the 1_nqueens_serial project is created automatically, which includes the Threading results. You can view them from Intel Advisor GUI.
Run Threading from CLI on Windows OS
Run Survey analysis to collect performance metrics and identify loops/functions with the longest total time:
advisor --collect=survey --project-dir=./1_nqueens_serial --search-dir src:r=./1_nqueens_serial -- 1_nqueens_serial.exe
In the Threading perspective, you should specify the source search directory using the --search-dir option.
When the analysis execution completes, the 1_nqueens_serial project is created automatically, which includes the Threading results. You can view them from Intel Advisor GUI.

Examine Results to Find Opportunities for Parallelization

If you collect data using GUI, Intel Advisor automatically opens the results when the collection completes.
If you collect data using CLI, open the results in GUI using the following command:
advisor-gui ./1_nqueens_serial
If the result does not open automatically, click Show Result.
When you open the Threading result in GUI, Intel Advisor shows the Summary tab first. This window is a dashboard containing the main information about application execution, performance hints, and indication of vectorization problems in your application.
Switch to the Survey & Roofline tab to examine performance metrics for each loop/function and find the candidates for parallelization.
Figure: Survey & Roofline report tab in the Threading report, used to find parallel opportunities.
In the bottom pane of the Survey & Roofline report, click Top Down on the navigation toolbar to investigate functions/loops in hierarchy.
  1. The Total Time column shows the time spent in a function or loop and all functions called from it. A row with a large Total Time % and multiple children with smaller total times is a possible candidate for parallelism.
  2. The Self Time column shows the time spent in a function or loop itself, excluding the time spent in functions called from it. Loops or functions with significant self time values are possible candidates for distributing work.
  3. The application spends the most time in the setQueen() function, which calls itself recursively. This function is the parallelization candidate.
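The relationship between the Total Time and Self Time columns can be sketched in a few lines of C++. This is only an illustration of the arithmetic: the Node type, the function names, and the timing numbers are hypothetical and are not Intel Advisor data structures or measured results.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical call-tree node used only for this illustration.
struct Node {
    std::string name;
    double selfTime;             // seconds spent in the function/loop body itself
    std::vector<Node> children;  // functions/loops called from it
};

// Total time = own self time + total time of everything called from the node.
double totalTime(const Node& node) {
    assert(node.selfTime >= 0.0);
    double total = node.selfTime;
    for (const Node& child : node.children) {
        total += totalTime(child);
    }
    return total;
}

// Made-up tree: a driver loop with little self time and two heavier children.
Node exampleTree() {
    Node taskA{"taskA", 2.0, {}};
    Node taskB{"taskB", 3.0, {}};
    return Node{"driverLoop", 0.5, {taskA, taskB}};
}
```

In this made-up tree, driverLoop has a small self time but a large total time spread over several children, which is the pattern the Top Down view helps you spot.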

Mark Best Parallel Opportunities with Annotations

Annotations are subroutine calls or macro uses that you can use to mark places in serial parts of your program where Intel Advisor should assume your program's parallel execution and synchronization will occur. The annotations do not change the computations of your program, so your application runs normally.
  1. Open the application source code nqueens_serial.cpp in your preferred editor.
  2. Search for ADVISOR SUITABILITY EDIT and follow the directions in the sample code. Make four total edits to annotate the code:
    • Uncomment #include <advisor-annotate.h>. This is the include file that defines the annotations.
    • Uncomment ANNOTATE_SITE_BEGIN(solve);. This annotation marks the start of a parallel site that contains a single task in a loop.
    • Uncomment ANNOTATE_ITERATION_TASK(setQueen);. This annotation marks an iterative parallel task in a loop.
    • Uncomment ANNOTATE_SITE_END();. This annotation marks the end of a parallel site.
  3. Save your edits and close the editor.
  4. Rebuild the target.
    If the build fails because the include file is not found or identifiers are undefined:
    1. Go to Project > 1_nqueens_serial Properties.
    2. In C/C++ > Additional Include Directories, change the Intel Advisor year version to the version installed on your machine. For example, ADVISOR_2022_DIR.
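To show where the site and task annotations typically sit relative to a loop, here is a minimal, self-contained sketch. It is not the sample's nqueens_serial.cpp: the solver body and the queens vector are assumptions, and the three macros are stubbed as no-ops so that the sketch compiles without Intel Advisor installed. In the real sample, the macros come from advisor-annotate.h.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Assumption: no-op stand-ins for the macros that <advisor-annotate.h> defines.
#ifndef ANNOTATE_SITE_BEGIN
#define ANNOTATE_SITE_BEGIN(name)
#define ANNOTATE_ITERATION_TASK(name)
#define ANNOTATE_SITE_END()
#endif

static int nrOfSolutions = 0;  // shared counter updated by every task

// Try to place a queen at (row, col); queens[r] is the column used in row r.
static void setQueen(std::vector<int>& queens, int row, int col) {
    const int size = static_cast<int>(queens.size());
    for (int r = 0; r < row; ++r) {
        // Reject the square if an earlier queen shares the column or a diagonal.
        if (queens[r] == col || std::abs(queens[r] - col) == row - r) return;
    }
    queens[row] = col;
    if (row == size - 1) {
        ++nrOfSolutions;  // a full board was placed
        return;
    }
    for (int c = 0; c < size; ++c) setQueen(queens, row + 1, c);
}

int solve(int size) {
    assert(size > 0);
    nrOfSolutions = 0;
    ANNOTATE_SITE_BEGIN(solve);            // start of the proposed parallel site
    for (int col = 0; col < size; ++col) {
        ANNOTATE_ITERATION_TASK(setQueen); // each iteration is one proposed task
        std::vector<int> queens(size, -1);
        setQueen(queens, 0, col);
    }
    ANNOTATE_SITE_END();                   // end of the proposed parallel site
    return nrOfSolutions;
}
```

Because the stubs expand to nothing, the program still runs serially and produces the same result, which mirrors the point that annotations do not change the computation.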

Model Threading Parallelism

Re-run the Threading perspective with additional analyses. Do one of the following:
Run Threading from GUI
  1. In the Analysis Workflow pane, select the Medium accuracy level to configure the perspective automatically.
  2. Click the button to run the perspective.
    At this accuracy level, Intel Advisor runs Survey, Characterization with trip counts, Suitability, and Dependencies analyses.
    If you get the Your configuration might be incomplete message, click Continue. This warning message reminds you to make sure you have added annotations to your source code because Suitability and Dependencies analyses cannot run without them.
Run Threading from CLI on Linux OS
  1. Run the Survey analysis to analyze performance.
    advisor --collect=survey --project-dir=./1_nqueens_serial --search-dir src:r=./1_nqueens_serial -- 1_nqueens_serial
  2. Collect trip counts data.
    advisor --collect=tripcounts --project-dir=./1_nqueens_serial --search-dir src:r=./1_nqueens_serial -- 1_nqueens_serial
  3. Model threading designs for the annotated functions/loops with the Suitability analysis.
    advisor --collect=suitability --project-dir=./1_nqueens_serial --search-dir src:r=./1_nqueens_serial -- 1_nqueens_serial
  4. Identify data sharing problems that might prevent annotated functions/loops from parallelizing with the Dependencies analysis:
    advisor --collect=dependencies --project-dir=./1_nqueens_serial --search-dir src:r=./1_nqueens_serial -- 1_nqueens_serial
Run Threading from CLI on Windows OS
  1. Run the Survey analysis to analyze performance.
    advisor --collect=survey --project-dir=./1_nqueens_serial --search-dir src:r=./1_nqueens_serial -- 1_nqueens_serial.exe
  2. Collect trip counts data.
    advisor --collect=tripcounts --project-dir=./1_nqueens_serial --search-dir src:r=./1_nqueens_serial -- 1_nqueens_serial.exe
  3. Model threading designs for the annotated functions/loops with the Suitability analysis.
    advisor --collect=suitability --project-dir=./1_nqueens_serial --search-dir src:r=./1_nqueens_serial -- 1_nqueens_serial.exe
  4. Identify data sharing problems that might prevent annotated functions/loops from parallelizing with the Dependencies analysis:
    advisor --collect=dependencies --project-dir=./1_nqueens_serial --search-dir src:r=./1_nqueens_serial -- 1_nqueens_serial.exe
Examine the Results
If you collect data using GUI, Intel Advisor automatically opens the results when the collection completes.
If you collect data using CLI, open the results in GUI using the following command:
advisor-gui ./1_nqueens_serial
If the result does not open automatically, click Show Result.
When the Threading report opens, examine the application performance modeled with parallelism.
  1. Go to the Suitability report tab to examine how parallelization can improve the performance:
    • For the annotated loop at nqueens_serial.cpp:154, the Intel Advisor predicts a performance speedup of around 1.80x for the default configuration parameters.
    • As the Scalability of Maximum Site Gain diagram shows, the performance speedup increases for CPU counts from 2 to 16. For CPU counts higher than 16, the performance speedup stays the same, because the corresponding bull's-eye dots are on the same line. Most of the dots on the diagram are located in the green zone, but starting from 16 CPUs, the higher the CPU count, the closer the dot is to the yellow zone. This means that the predicted speedup is worth the effort if you parallelize the loop for up to 16 CPUs. Parallelizing the loop to run on more than 16 CPUs might require more time and/or effort, but will result in the same speedup and might cause performance issues.
      Figure: Suitability report, used to experiment with various modeling parameters and estimate the speedup.
  2. Examine the three percentage metrics below the diagram. Notice that for the default CPU count of 8, the metrics are all green, which means that there are no performance issues. Consider parallelizing the loop for up to 8 CPUs to achieve optimal performance.
  3. Change the CPU Count to 16 to see the details about the predicted performance for this case. Notice that the corresponding dot is located closer to the yellow zone than the dots to the left of it. The Load Imbalance metric is yellow and is around 44%. With load imbalance this high, the predicted maximum speedup might not justify the effort needed to refactor your application. Consider investigating how to reduce it.
  4. Experiment with the CPU count, threading model, and other parameters to see how they might affect the performance.
  5. Go to the Refinement report tab to see if the annotated loops have dependencies that prevent parallelism.
    Figure: Refinement report, used to check for loop-carried dependencies that might prevent threading.
    1. In the top pane of the Refinement Report, notice RAW (read after write), WAR (write after read), and WAW (write after write) dependencies in the loop in solve at nqueens_serial.cpp:154.
    2. From the top pane, select the loop in solve at nqueens_serial.cpp:154.
    3. In the Problems and Messages pane, examine the dependency problems found in the loop in more detail. Select one of the problems to see more information. For example, select the Read after write dependency.
    4. In the Code Locations pane, examine the source of the Read after write dependency: the instructions reference the nrOfSolutions variable, as the Variable Reference column shows. This means that a race condition happens because multiple tasks may try to increment the same variable at the same time.
    You should fix the dependencies before applying threading to the application.
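One common way to remove a race on a shared counter is to make the increment atomic. The sketch below illustrates that idea with std::atomic; it is an assumption for illustration and not necessarily the fix that the sample's own edit comments prescribe. The names solutionCount, placeQueen, and countSolutionsAtomic are hypothetical, and the driver loop is left serial so that the sketch stays small.

```cpp
#include <atomic>
#include <cassert>
#include <cstdlib>
#include <vector>

// Shared counter made atomic so that concurrent increments cannot be lost.
static std::atomic<int> solutionCount{0};

// Try to place a queen at (row, col); queens[r] is the column used in row r.
static void placeQueen(std::vector<int>& queens, int row, int col) {
    const int size = static_cast<int>(queens.size());
    for (int r = 0; r < row; ++r) {
        if (queens[r] == col || std::abs(queens[r] - col) == row - r) return;
    }
    queens[row] = col;
    if (row == size - 1) {
        // A single indivisible read-modify-write replaces the plain increment.
        solutionCount.fetch_add(1, std::memory_order_relaxed);
        return;
    }
    for (int c = 0; c < size; ++c) placeQueen(queens, row + 1, c);
}

int countSolutionsAtomic(int size) {
    assert(size > 0);
    solutionCount.store(0);
    for (int col = 0; col < size; ++col) {
        std::vector<int> queens(size, -1);  // each would-be task owns its board
        placeQueen(queens, 0, col);
    }
    return solutionCount.load();
}
```

Each iteration also works on its own board, so the counter is the only state the would-be tasks share.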

Next Steps

  1. Fix the dependencies found in the annotated loops. In the sample application source code, search for ADVISOR CORRECTNESS EDIT and follow the directions in the sample code to fix the problems (make six total edits).
  2. Rebuild the application and rerun the Threading perspective with the Medium accuracy (run the Survey, Trip Counts, Suitability, and Dependencies analyses).
  3. Make sure there are no dependencies found and your fixes did not negatively impact the predicted maximum speedup. Notice that the predicted speedup is higher, and that the load imbalance is green and no longer impacts the estimated performance for CPU counts up to 8.
  4. When you decide the predicted maximum speedup benefit is worth the effort to add parallelism to your target, replace annotations with parallel framework code.
    The sample already includes versions in which the annotations are replaced with parallel framework code. Examine the following files:
    Parallel Framework                              File
    Intel® Cilk™ Plus                               3_nqueens_cilk.cpp
    OpenMP*                                         3_nqueens_omp.cpp
    Intel® Threading Building Blocks (Intel® TBB)   3_nqueens_tbb.cpp
  5. Build the parallel version of the sample.
  6. Test the resulting parallel application for correctness and verify its actual parallel performance using other Intel Advisor perspectives, the Intel® Inspector, and Intel® VTune™ Profiler.
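As a rough idea of what replacing annotations with framework code can look like, the sketch below parallelizes a top-level column loop with an OpenMP reduction. It is not the content of 3_nqueens_omp.cpp: the solver and the names countSolutionsFrom and solveOmp are assumptions. The pragma is guarded by _OPENMP, so the code also builds and runs serially when OpenMP is not enabled.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Count the solutions reachable after placing a queen at (row, col).
// The count is returned, so no shared variable is written.
static int countSolutionsFrom(std::vector<int>& queens, int row, int col) {
    const int size = static_cast<int>(queens.size());
    for (int r = 0; r < row; ++r) {
        if (queens[r] == col || std::abs(queens[r] - col) == row - r) return 0;
    }
    queens[row] = col;
    if (row == size - 1) return 1;
    int count = 0;
    for (int c = 0; c < size; ++c) count += countSolutionsFrom(queens, row + 1, c);
    return count;
}

int solveOmp(int size) {
    assert(size > 0);
    int total = 0;
    // Each thread accumulates into a private copy of total; OpenMP sums the
    // copies when the loop ends, so no lock is needed.
#ifdef _OPENMP
#pragma omp parallel for reduction(+ : total)
#endif
    for (int col = 0; col < size; ++col) {
        std::vector<int> queens(size, -1);  // private board per iteration
        total += countSolutionsFrom(queens, 0, col);
    }
    return total;
}
```

The reduction clause plays the role that the fix for the shared counter plays in the annotated version: it keeps the per-task counts separate until the end of the loop.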

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.