Get Started Guide

  • 2022.0
  • 12/06/2021
  • Public Content

Prototype Threading Designs

The Threading perspective enables you to identify the best candidates for parallelization, prototype threading, and check whether data dependencies prevent certain functions/loops from being parallelized.
Figure: Intel Advisor Typical Workflow: Prototype Threading Designs
There are two ways to run the Threading perspective: from the Intel® Advisor graphical user interface (GUI) and from the command line interface (CLI). You can open results collected with either method in the GUI.
Follow these steps:

Prerequisites

This guide assumes that you have installed Intel Advisor and set up the necessary environment variables. For details, see Before You Begin.
It is also recommended that you download and install the Intel® oneAPI C++ Compiler Classic, either standalone or as part of the Intel® oneAPI HPC Toolkit.

Unpack and Build Your Application

You can use your own application and apply the build instructions below.
To unpack the sample:
  1. Go to the <install-dir>/samples/<locale>/C++/ directory.
  2. Copy the nqueens_Advisor.tgz (on Linux* OS) or nqueens_Advisor.zip (on Windows* OS) file to a writable directory or share on your system.
  3. Extract the sample from the .tgz or .zip file.
To build the sample, do one of the following:
On Linux* OS
  1. Open a new terminal session.
  2. Change directory to the nqueens_Advisor/ directory in its unzipped location.
  3. Build the sample application in release mode:
    make 1_nqueens_serial
  4. Run the application to verify the build:
    ./1_nqueens_serial
    The application output window displays a board size of 14 and the total time it took to run the target.
On Windows* OS (From Command Line)
  1. Find Visual Studio Tools for your Microsoft Visual Studio* and OS version, and select one of the command prompt shortcuts. For example, from the Microsoft Windows* 10 Start pane, select Visual Studio 2019 > x64 Native Tools Command Prompt for VS2019.
  2. Change directory to the nqueens_Advisor/ directory in its unzipped location.
  3. Build the target in release mode:
    devenv nqueens_Advisor.sln /build release /project 1_nqueens_serial
  4. Change directory to the Release directory.
  5. Run the application to verify the build:
    1_nqueens_serial.exe
    The application output window displays a board size of 14 and the total time it took to run the target.
On Windows* OS (From Microsoft Visual Studio)
  1. Launch the Microsoft Visual Studio IDE.
  2. Choose File > Open > Project/Solution...
  3. In the Open Project dialog box, navigate to the nqueens_Advisor/ directory in its unzipped location and open the nqueens_Advisor.sln file.
    If you get a dialog window suggesting that you retarget the application, click OK.
  4. If the Solution Configurations drop-down is set to Debug, change it to Release.
  5. Right-click the 1_nqueens_serial project in the Solution Explorer and choose Set as StartUp Project.
  6. If you want to use the Intel® C++ Compiler Classic, right-click the 1_nqueens_serial project and click Intel Compiler > Use Intel C++ Compiler Classic.
  7. Right-click the 1_nqueens_serial project, then choose Properties to verify that the sample code uses the optimal release build settings.
    For details about recommended build settings, see Build Target Application.
  8. Click the OK button to close the Properties dialog box.
  9. Choose Build > Clean Solution.
  10. Choose Build > Build 1_nqueens_serial to build the target.
    The application output window displays a board size of 14 and the total time it took to run the target.
  11. If the Visual Studio* IDE responds that any projects are out of date, click No to not build them.

Collect Baseline Performance Data

Do one of the following:
Run Threading Perspective Using GUI
  1. Create a project for the built application.
    If you work in the Microsoft Visual Studio IDE, you do not need to create a project because Intel Advisor creates it automatically when you first open the Intel Advisor GUI.
  2. From the Perspective Selector pane, select Threading and click Choose.
  3. In the Analysis Workflow pane, make sure the Low accuracy level is selected.
    This accuracy level selects only a Survey analysis, which profiles the application performance.
  4. Click the Run the perspective button.
Run Threading Perspective Using CLI
  1. Run Survey analysis to collect performance metrics and identify loops/functions with the longest total time:
    • On Linux OS:
      advisor --collect=survey --project-dir=./1_nqueens_serial --search-dir src:r=./1_nqueens_serial -- 1_nqueens_serial
    • On Windows OS:
      advisor --collect=survey --project-dir=./1_nqueens_serial --search-dir src:r=./1_nqueens_serial -- 1_nqueens_serial.exe
    In the Threading perspective, you should specify the source search directory using the --search-dir option.
  2. Open the result in the Intel Advisor GUI:
    advisor-gui ./1_nqueens_serial
  3. When the Intel Advisor GUI launches, click Show Result to open the report.
    If you do not see the Threading results, select the Threading perspective from the Analysis Workflow pane drop-down.

Examine Results to Find Opportunities for Parallelization

After the Intel Advisor collects the Survey data, it displays a Threading report. To find the candidates for parallelization:
  1. Go to the Survey & Roofline report tab to examine application performance data.
    Figure: Survey & Roofline report tab in the Threading report, used to find parallel opportunities.
  2. In the bottom pane of the Survey & Roofline report, click Top Down on the navigation toolbar.
  3. Investigate the Top Down pane, which shows functions/loops in a hierarchy.
    • The Total Time column shows the time spent in a function or loop and all functions called from it. A row with a large Total Time % and multiple children with smaller total times is a possible candidate for parallelism.
    • The Self Time column shows the time spent in a function or loop itself, excluding time spent in functions called from it. Loops or functions with significant self time values are possible candidates for distributing work.
    • The application spends most of its time in the setQueen() function, which calls itself recursively. This function is the parallelization candidate.
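To see why the Top Down pane attributes most of the time to setQueen(), it can help to look at the general shape of a recursive N-queens solver. The following is a minimal sketch, not the sample's actual source: the names setQueen, solve, and nrOfSolutions are borrowed from this guide, but the function bodies and signatures are assumptions, and the real nqueens_serial.cpp may be organized differently. The outer loop in solve() is the kind of loop that makes a natural parallel site, and the recursive helper is where the run time accumulates.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Illustrative sketch only: names follow the guide, bodies are assumptions.

static int nrOfSolutions = 0;  // global counter updated from every call chain

// Place a queen at (row, col) if it is safe, then recurse into the next row.
void setQueen(std::vector<int>& queens, int row, int col, int size) {
    for (int r = 0; r < row; ++r) {
        // Reject squares attacked along a column or a diagonal by an earlier queen.
        if (queens[r] == col || std::abs(queens[r] - col) == row - r) {
            return;
        }
    }
    queens[row] = col;
    if (row == size - 1) {
        ++nrOfSolutions;  // a complete placement was found
        return;
    }
    for (int c = 0; c < size; ++c) {
        setQueen(queens, row + 1, c, size);  // recursion into the next row
    }
}

// Try each column of the first row; every iteration explores its own subtree.
int solve(int size) {
    assert(size > 0);
    nrOfSolutions = 0;
    for (int col = 0; col < size; ++col) {
        std::vector<int> queens(size, -1);
        setQueen(queens, 0, col, size);
    }
    return nrOfSolutions;
}
```

Calling solve(8) on this sketch counts the placements for an 8x8 board. Note that every subtree increments the same global counter, which matters once the iterations are modeled as parallel tasks.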

Mark Best Parallel Opportunities with Annotations

Annotations are subroutine calls or macro uses that you can use to mark places in serial parts of your program where Intel Advisor should assume your program's parallel execution and synchronization will occur. The annotations do not change the computations of your program, so your application runs normally.
  1. Open the application source code nqueens_serial.cpp in your preferred editor.
  2. Search for ADVISOR SUITABILITY EDIT and follow the directions in the sample code. Make four total edits to annotate the code:
    • Uncomment #include <advisor-annotate.h>. This is the include file that defines the annotations.
    • Uncomment ANNOTATE_SITE_BEGIN(solve);. This annotation marks the start of a parallel site that contains a single task in a loop.
    • Uncomment ANNOTATE_ITERATION_TASK(setQueen);. This annotation marks an iterative parallel task in a loop.
    • Uncomment ANNOTATE_SITE_END();. This annotation marks the end of a parallel site.
  3. Save your edits and close the editor.
  4. Rebuild the target.
    If the build fails because the include file is not found or identifiers are undefined:
    1. Go to Project > 1_nqueens_serial Properties.
    2. In C/C++ > Additional Include Directories, change the Intel Advisor year version to the version installed on your machine. For example, ADVISOR_2021_DIR.
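The four edits above amount to wrapping the candidate loop in a site and marking each iteration as a task. The sketch below shows that placement on a stand-in loop. The functions processColumn and processAllColumns are hypothetical, and the fallback macro definitions are an assumption added only so that the sketch compiles when advisor-annotate.h is not on the include path. They are not part of the sample.

```cpp
#include <cassert>
#include <vector>

// Use the real annotation header when it is available; otherwise fall back to
// no-op macros (an assumption made only so this sketch builds anywhere).
#if defined(__has_include)
#  if __has_include(<advisor-annotate.h>)
#    include <advisor-annotate.h>
#    define SKETCH_HAS_ADVISOR_ANNOTATE 1
#  endif
#endif
#ifndef SKETCH_HAS_ADVISOR_ANNOTATE
#  define ANNOTATE_SITE_BEGIN(name)
#  define ANNOTATE_ITERATION_TASK(name)
#  define ANNOTATE_SITE_END(...)
#endif

// Hypothetical per-iteration work standing in for a call such as setQueen().
int processColumn(int col) {
    return col * col;
}

// Hypothetical driver showing where the three annotations go around a loop.
std::vector<int> processAllColumns(int size) {
    assert(size >= 0);
    std::vector<int> results(size);

    ANNOTATE_SITE_BEGIN(solve);             // start of the parallel site
    for (int col = 0; col < size; ++col) {
        ANNOTATE_ITERATION_TASK(setQueen);  // each iteration is modeled as one task
        results[col] = processColumn(col);  // iterations write to distinct elements
    }
    ANNOTATE_SITE_END();                    // end of the parallel site

    return results;
}
```

Because the annotations do not change the computation, processAllColumns(4) returns the same values whether or not the header is present. Only the information available to the Suitability and Dependencies analyses differs.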

Model Threading Parallelism

  1. Re-run the Threading perspective with additional analyses. Do one of the following:
    From the Intel Advisor GUI:
    1. In the Analysis Workflow pane, select the Medium accuracy level to configure the perspective automatically.
      This selects Survey, Suitability, and Dependencies analyses.
    2. Click the Run the perspective button.
      If you get the Your configuration might be incomplete message, click Continue. This warning message reminds you to make sure you have added annotations to your source code because Suitability and Dependencies analyses cannot run without them.
    From the Intel Advisor CLI:
    1. Run the Survey analysis to analyze performance.
      On Linux OS:
      advisor --collect=survey --project-dir=./1_nqueens_serial --search-dir src:r=./1_nqueens_serial -- 1_nqueens_serial
      On Windows OS:
      advisor --collect=survey --project-dir=./1_nqueens_serial --search-dir src:r=./1_nqueens_serial -- 1_nqueens_serial.exe
    2. Collect trip counts data.
      On Linux OS:
      advisor --collect=tripcounts --project-dir=./1_nqueens_serial --search-dir src:r=./1_nqueens_serial -- 1_nqueens_serial
      On Windows OS:
      advisor --collect=tripcounts --project-dir=./1_nqueens_serial --search-dir src:r=./1_nqueens_serial -- 1_nqueens_serial.exe
    3. Model threading designs for the annotated functions/loops with the Suitability analysis.
      On Linux OS:
      advisor --collect=suitability --project-dir=./1_nqueens_serial --search-dir src:r=./1_nqueens_serial -- 1_nqueens_serial
      On Windows OS:
      advisor --collect=suitability --project-dir=./1_nqueens_serial --search-dir src:r=./1_nqueens_serial -- 1_nqueens_serial.exe
    4. Identify data sharing problems that might prevent annotated functions/loops from parallelizing with the Dependencies analysis:
      On Linux OS:
      advisor --collect=dependencies --project-dir=./1_nqueens_serial --search-dir src:r=./1_nqueens_serial -- 1_nqueens_serial
      On Windows OS:
      advisor --collect=dependencies --project-dir=./1_nqueens_serial --search-dir src:r=./1_nqueens_serial -- 1_nqueens_serial.exe
    5. Open the result in the Intel Advisor GUI:
      advisor-gui ./1_nqueens_serial
    6. When the Intel Advisor GUI launches, click Show Result to open the report.
      If you do not see the Threading results, select the Threading perspective from the Analysis Workflow pane drop-down.
  2. In the Threading report, go to the Suitability report tab and examine how parallelization can improve the performance:
    • For the selected loop at nqueens_serial.cpp:154, Intel Advisor predicts a performance speedup of around 1.80x for the default configuration parameters.
    • As the Scalability of Maximum Site Gain diagram shows, the performance speedup increases for CPU counts from 2 to 16. For CPU counts higher than 16, the speedup stays the same, because the corresponding bull's-eye dots are on the same line. Most of the dots on the diagram are located in the green zone, but starting from 16 CPUs, the higher the CPU count, the closer the dot is to the yellow zone. This means that the predicted speedup is worth the effort if you parallelize the loop for up to 16 CPUs. Parallelizing the loop to run on more than 16 CPUs might require more time and/or effort, but will result in the same speedup and might cause performance issues.
      Figure: Suitability report, used to experiment with various modeling parameters and estimate the speedup.
  3. Examine the three percentage metrics below the diagram. Notice that for the default CPU count of 8, the metrics are all green, which means that there are no performance issues. It is recommended that you parallelize the loop for up to 8 CPUs to achieve optimal performance.
  4. Change the CPU Count to 16 to see the details about the predicted performance for this case. Notice that the corresponding dot is located closer to the yellow zone than the dots to its left. The Load Imbalance metric is yellow and is around 44%. With this much load imbalance, the predicted maximum speedup is not enough to justify the effort needed to refactor your application. Consider investigating how to reduce the imbalance.
  5. Experiment with the CPU count, threading model, and other parameters to see how they might affect the performance.
  6. Go to the Refinement Reports tab to see if the annotated loops have dependencies that prevent parallelism.
    Figure: Refinement report, used to check for loop-carried dependencies that might prevent threading.
    1. In the top pane of the Refinement Report, notice RAW (read after write), WAR (write after read), and WAW (write after write) dependencies in the loop in solve at nqueens_serial.cpp:154.
    2. From the top pane, select the loop in solve at nqueens_serial.cpp:154.
    3. In the Problems and Messages pane, examine the dependency problems found in the loop in more detail. Select one of the problems to see more information. For example, select the Read after write dependency.
    4. In the Code Locations pane, examine the source of the Read after write dependency: the instructions reference the nrOfSolutions variable, as the Variable Reference column shows. This means that a race condition happens because multiple tasks may try to increment the same variable at the same time.
    You should fix the dependencies before applying threading to the application.
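One way to model a fix for this kind of shared-counter dependency is to mark the update as a critical section with lock annotations, so that a later Dependencies run treats the increment as synchronized. The sketch below assumes the ANNOTATE_LOCK_ACQUIRE and ANNOTATE_LOCK_RELEASE macros from advisor-annotate.h and falls back to no-op definitions when the header is not available. The counter and function names are hypothetical, and the directions in the sample code remain the authoritative description of the intended fix.

```cpp
#include <cassert>

// Use the real annotation header when it is available; otherwise fall back to
// no-op macros (an assumption made only so this sketch builds anywhere).
#if defined(__has_include)
#  if __has_include(<advisor-annotate.h>)
#    include <advisor-annotate.h>
#    define SKETCH_HAS_ADVISOR_ANNOTATE 1
#  endif
#endif
#ifndef SKETCH_HAS_ADVISOR_ANNOTATE
#  define ANNOTATE_SITE_BEGIN(name)
#  define ANNOTATE_ITERATION_TASK(name)
#  define ANNOTATE_SITE_END(...)
#  define ANNOTATE_LOCK_ACQUIRE(addr)
#  define ANNOTATE_LOCK_RELEASE(addr)
#endif

static int solutionCount = 0;  // hypothetical shared counter, similar in role to nrOfSolutions

// Record one result. The lock annotations tell the analysis that this update
// is intended to be protected; they do not add real synchronization.
void recordSolution() {
    ANNOTATE_LOCK_ACQUIRE(0);
    ++solutionCount;
    ANNOTATE_LOCK_RELEASE(0);
}

// Hypothetical annotated loop: every even value counts as one "solution".
int countEvenValues(int limit) {
    solutionCount = 0;
    ANNOTATE_SITE_BEGIN(count_site);
    for (int i = 0; i < limit; ++i) {
        ANNOTATE_ITERATION_TASK(count_task);
        if (i % 2 == 0) {
            recordSolution();  // the only access to shared state in the task
        }
    }
    ANNOTATE_SITE_END();
    return solutionCount;
}
```

The program still runs serially and produces the same count. When the annotations are later replaced with parallel framework code, the modeled lock has to become a real lock, an atomic update, or a reduction.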

Next Steps

  1. Fix the dependencies found in the annotated loops. In the sample application source code, search for ADVISOR CORRECTNESS EDIT and follow the directions in the sample code to fix the problems (make six total edits).
  2. Rebuild the application and rerun the Threading perspective with the Medium accuracy (run the Survey, Trip Counts, Suitability, and Dependencies analyses).
  3. Make sure there are no dependencies found and your fixes did not negatively impact the predicted maximum speedup. Notice that the predicted speedup is higher and the load imbalance is green and does not impact the estimated performance anymore for the CPU count up to 8.
  4. When you decide the predicted maximum speedup benefit is worth the effort to add parallelism to your target, replace annotations with parallel framework code.
    This sample application already includes versions in which the annotations are replaced with parallel framework code. Examine the following files:
    Parallel Framework                              File
    Intel® Cilk™ Plus                               3_nqueens_cilk.cpp
    OpenMP*                                         3_nqueens_omp.cpp
    Intel® Threading Building Blocks (Intel® TBB)   3_nqueens_tbb.cpp
  5. Build the parallel version of the sample.
  6. Test the resulting parallel application for correctness and verify its actual parallel performance using other Intel Advisor perspectives, the Intel® Inspector, and Intel® VTune™ Profiler.
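As a rough illustration of step 4, the sketch below shows what replacing an annotated site with parallel framework code might look like with OpenMP*. It is not the contents of 3_nqueens_omp.cpp: the function names countFrom and solveParallel are hypothetical. It avoids the shared-counter dependency by having each iteration return its own count and combining the counts with an OpenMP reduction clause. If the compiler is not invoked with OpenMP enabled (for example, -fopenmp with GCC), the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Count the completions of a partial placement. The count is returned rather
// than added to a shared variable, so tasks share no writable state.
long countFrom(std::vector<int>& queens, int row, int col, int size) {
    for (int r = 0; r < row; ++r) {
        // Reject squares attacked along a column or a diagonal by an earlier queen.
        if (queens[r] == col || std::abs(queens[r] - col) == row - r) {
            return 0;
        }
    }
    queens[row] = col;
    if (row == size - 1) {
        return 1;
    }
    long found = 0;
    for (int c = 0; c < size; ++c) {
        found += countFrom(queens, row + 1, c, size);
    }
    return found;
}

// Hypothetical parallel driver: one task per first-row column.
long solveParallel(int size) {
    assert(size > 0);
    long total = 0;
    // Each thread accumulates a private copy of total; OpenMP sums the copies
    // at the end of the loop. Dynamic scheduling hands out iterations one at a
    // time, which can help when subtrees take different amounts of time.
    #pragma omp parallel for reduction(+ : total) schedule(dynamic)
    for (int col = 0; col < size; ++col) {
        std::vector<int> queens(size, -1);  // private board for this iteration
        total += countFrom(queens, 0, col, size);
    }
    return total;
}
```

A reduction is only one option. A real lock or an atomic counter would also remove the race, at the cost of contention on every update. Whichever approach is used, comparing the parallel result against the serial one is a quick correctness check before measuring performance.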
