Intel® Compiler Extension Routines to OpenMP*

The Intel® compiler implements the following group of routines as extensions to the OpenMP* run-time library:
  • Get and set the execution environment
  • Get and set the stack size for parallel threads
  • Memory allocation
  • Get and set the thread sleep time for the throughput execution mode
  • Target memory allocation
The Intel® extension routines described in this section can be used for low-level tuning to verify that the library code and application are functioning as intended. These routines are generally not recognized by other OpenMP-compliant compilers, so the link stage may fail when you build with another compiler. To execute these OpenMP routines, use the /Qopenmp-stubs (Windows*) or -qopenmp-stubs (Linux*) option.
In most cases, environment variables can be used in place of the extension library routines. For example, the stack size of the parallel threads may be set using the OMP_STACKSIZE environment variable rather than the kmp_set_stacksize_s() library routine.
A run-time call to an Intel extension routine takes precedence over the corresponding environment variable setting.

Execution Environment

Function
Description
void kmp_set_defaults(char const *)
Sets OpenMP environment variables defined as a list of variables separated by "|" in the argument.
void kmp_set_library_throughput(void)
Sets execution mode to throughput, which is the default. Allows the application to determine the runtime environment. Use in multi-user environments.
void kmp_set_library_turnaround(void)
Sets execution mode to turnaround. Use in dedicated parallel (single user) environments.
void kmp_set_library_serial(void)
Sets execution mode to serial.
void kmp_set_library(int)
Sets execution mode indicated by the value passed to the function. Valid values are:
  • 1 - serial mode
  • 2 - turnaround mode
  • 3 - throughput mode
Call this routine before the first parallel region is executed.
int kmp_get_library(void)
Returns a value corresponding to the current execution mode:
  • 1 - serial
  • 2 - turnaround
  • 3 - throughput
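As a minimal sketch of the routines in this table, the following C code requests turnaround mode and translates the value returned by kmp_get_library() into a name. The macro HAVE_INTEL_OMP_EXT and both function names are hypothetical helpers introduced for illustration. The guard assumes that the extension routines are declared in omp.h only when an Intel compiler builds with OpenMP enabled, so other toolchains compile the fallback path instead of failing at link time.

```c
/* HAVE_INTEL_OMP_EXT is a hypothetical macro for this sketch. It is set only
   when an Intel compiler builds with OpenMP enabled (assumption). */
#if defined(_OPENMP) && (defined(__INTEL_COMPILER) || defined(__INTEL_LLVM_COMPILER))
#include <omp.h>
#define HAVE_INTEL_OMP_EXT 1
#else
#define HAVE_INTEL_OMP_EXT 0
#endif

/* Map the integer codes documented for kmp_get_library() to readable names. */
const char *library_mode_name(int mode)
{
    switch (mode) {
    case 1:  return "serial";
    case 2:  return "turnaround";
    case 3:  return "throughput";
    default: return "unknown";
    }
}

/* Request turnaround mode and return the mode the runtime reports.
   Returns 0 when the extension routines are not available.
   Call this before the first parallel region is executed. */
int use_turnaround_mode(void)
{
#if HAVE_INTEL_OMP_EXT
    kmp_set_library(2);       /* 2 selects turnaround mode */
    return kmp_get_library();
#else
    return 0;
#endif
}
```

A caller might print library_mode_name(use_turnaround_mode()) at startup to confirm which mode is in effect.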

Stack Size

Function
Description
size_t kmp_get_stacksize_s(void)
Returns the number of bytes that will be allocated for each parallel thread to use as its private stack. This value can be changed with the kmp_set_stacksize_s() routine before the first parallel region, or via the KMP_STACKSIZE environment variable.
int kmp_get_stacksize(void)
Provided for backward compatibility only. Use the kmp_get_stacksize_s() routine for compatibility across different families of Intel® processors.
void kmp_set_stacksize_s(size_t size)
Sets the number of bytes allocated for each parallel thread to use as its private stack to size. This value can also be set via the KMP_STACKSIZE environment variable. For kmp_set_stacksize_s() to have an effect, it must be called before the first (dynamically executed) parallel region in the program begins.
void kmp_set_stacksize(int size)
Provided for backward compatibility only. Use kmp_set_stacksize_s() for compatibility across different families of Intel® processors.
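The ordering requirement above can be sketched as follows: set the stack size first, then read it back, all before any parallel region runs. The macro HAVE_INTEL_OMP_EXT and the helpers mib_to_bytes() and request_thread_stack() are hypothetical names for this illustration, and the guard assumes the routines are available only when an Intel compiler builds with OpenMP enabled.

```c
#include <stddef.h>

/* HAVE_INTEL_OMP_EXT is a hypothetical macro for this sketch. It is set only
   when an Intel compiler builds with OpenMP enabled (assumption). */
#if defined(_OPENMP) && (defined(__INTEL_COMPILER) || defined(__INTEL_LLVM_COMPILER))
#include <omp.h>
#define HAVE_INTEL_OMP_EXT 1
#else
#define HAVE_INTEL_OMP_EXT 0
#endif

/* Convert mebibytes to bytes. */
size_t mib_to_bytes(size_t mib)
{
    return mib * 1024u * 1024u;
}

/* Ask for `bytes` of private stack per parallel thread and return the value
   the runtime reports afterwards. Returns 0 when the extension routines are
   not available. Has an effect only before the first parallel region. */
size_t request_thread_stack(size_t bytes)
{
#if HAVE_INTEL_OMP_EXT
    kmp_set_stacksize_s(bytes);
    return kmp_get_stacksize_s();
#else
    (void)bytes;              /* unused on the fallback path */
    return 0;
#endif
}
```

For example, calling request_thread_stack(mib_to_bytes(16)) at the top of main() asks for a 16 MiB stack per thread. The value reported back may differ from the request if the runtime adjusts it.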

Memory Allocation

The Intel® compiler implements a group of memory allocation routines as an extension to the OpenMP run-time library to enable threads to allocate memory from a heap local to each thread. These routines are kmp_malloc(), kmp_calloc(), and kmp_realloc().
The memory allocated by these routines must be freed by the kmp_free() routine. While you can allocate memory in one thread and then free that memory in a different thread, this mode of operation incurs a slight performance penalty.
Function
Description
void* kmp_malloc(size_t size)
Allocates a memory block of size bytes from the thread-local heap.
void* kmp_calloc(size_t nelem, size_t elsize)
Allocates an array of nelem elements of size elsize from the thread-local heap.
void* kmp_realloc(void* ptr, size_t size)
Reallocates the memory block at address ptr to size bytes from the thread-local heap.
void kmp_free(void* ptr)
Frees the memory block at address ptr from the thread-local heap.
The memory must have been previously allocated with kmp_malloc(), kmp_calloc(), or kmp_realloc().
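The pairing rule above (allocate with kmp_malloc(), release with kmp_free(), preferably in the same thread) might look like the sketch below. The macros HAVE_INTEL_OMP_EXT, scratch_alloc, and scratch_free and the function sum_of_squares() are hypothetical names for this illustration. When the extension routines are not available, the sketch falls back to the standard malloc() and free().

```c
#include <stddef.h>
#include <stdlib.h>

/* HAVE_INTEL_OMP_EXT is a hypothetical macro for this sketch. It is set only
   when an Intel compiler builds with OpenMP enabled (assumption). */
#if defined(_OPENMP) && (defined(__INTEL_COMPILER) || defined(__INTEL_LLVM_COMPILER))
#include <omp.h>
#define HAVE_INTEL_OMP_EXT 1
#else
#define HAVE_INTEL_OMP_EXT 0
#endif

/* Always pair the allocator with its matching release routine. */
#if HAVE_INTEL_OMP_EXT
#define scratch_alloc(n) kmp_malloc(n)
#define scratch_free(p)  kmp_free(p)
#else
#define scratch_alloc(n) malloc(n)
#define scratch_free(p)  free(p)
#endif

/* Fill a scratch buffer with 1*1, 2*2, ..., n*n and return the sum.
   Returns -1.0 if the allocation fails. */
double sum_of_squares(size_t n)
{
    if (n == 0)
        return 0.0;

    double *scratch = (double *)scratch_alloc(n * sizeof *scratch);
    if (scratch == NULL)
        return -1.0;

    for (size_t i = 0; i < n; ++i)
        scratch[i] = (double)(i + 1) * (double)(i + 1);

    double total = 0.0;
    for (size_t i = 0; i < n; ++i)
        total += scratch[i];

    /* Freed by the thread that allocated it, which avoids the
       cross-thread penalty described above. */
    scratch_free(scratch);
    return total;
}
```

Because the buffer never leaves the function, each thread that calls sum_of_squares() inside a parallel region allocates and frees from its own heap.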

Thread Sleep Time

In the throughput execution mode of the OpenMP* support libraries, threads wait for new parallel work at the end of a parallel region and go to sleep after a specified period of time. This time interval can be set with the KMP_BLOCKTIME environment variable or the kmp_set_blocktime() function.
Function
Description
int kmp_get_blocktime(void)
Returns the number of milliseconds that a thread should wait, after completing the execution of a parallel region, before sleeping, as set either by the KMP_BLOCKTIME environment variable or by kmp_set_blocktime().
void kmp_set_blocktime(int msec)
Sets the number of milliseconds that a thread should wait, after completing the execution of a parallel region, before sleeping. This routine affects the block time setting for the calling thread and any OpenMP team threads formed by the calling thread. The routine does not affect the block time for any other threads.
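A small sketch of the two block time routines: set the wait interval for the calling thread, then read back what the runtime reports. The macro HAVE_INTEL_OMP_EXT and the function apply_blocktime() are hypothetical names for this illustration, and the guard assumes the routines are available only when an Intel compiler builds with OpenMP enabled.

```c
/* HAVE_INTEL_OMP_EXT is a hypothetical macro for this sketch. It is set only
   when an Intel compiler builds with OpenMP enabled (assumption). */
#if defined(_OPENMP) && (defined(__INTEL_COMPILER) || defined(__INTEL_LLVM_COMPILER))
#include <omp.h>
#define HAVE_INTEL_OMP_EXT 1
#else
#define HAVE_INTEL_OMP_EXT 0
#endif

/* Set the block time, in milliseconds, for the calling thread and any team it
   forms, then return the value the runtime reports.
   Returns -1 when the extension routines are not available. */
int apply_blocktime(int msec)
{
#if HAVE_INTEL_OMP_EXT
    kmp_set_blocktime(msec);
    return kmp_get_blocktime();
#else
    (void)msec;               /* unused on the fallback path */
    return -1;
#endif
}
```

Passing 0 asks threads to sleep without waiting after a parallel region, which may suit a machine shared with other jobs. A larger value keeps threads ready when parallel regions follow each other closely.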

Target Memory Allocation

Function
Description
void *omp_target_alloc_host(size_t size, int device_num)
Returns the address of a storage location that is size bytes in length allocated in host memory. The same pointer may be used to access the memory on the host and all supported devices. If the allocation request fails, a null pointer is returned.
void *omp_target_alloc_device(size_t size, int device_num)
Returns the address of a storage allocation that is size bytes in length. Device allocations are owned by the device specified by device_num and reside in device memory, if present. Generally, the allocation can be accessed only by the device, but it may be copied to other device or host allocated memory. A null pointer return value indicates the allocation was not successful.
void *omp_target_alloc_shared(size_t size, int device_num)
Returns the address of a storage allocation that is size bytes in length. The same pointer may be used to access the memory on the host and the specified device. Shared allocations are shared by the host and the specified device, and are intended to migrate between the host and the device. A null pointer is returned if the allocation is unsuccessful.
void *ompx_target_realloc(void *ptr, size_t size, int device_num)
Deallocates the device memory specified by ptr and allocates new device memory of size bytes for the device device_num. The returned memory can be accessed only by the specified device. The contents of the new memory object match those of the old object prior to deallocation, up to the smaller of the old allocation size and size.
void *ompx_target_realloc_host(void *ptr, size_t size, int device_num)
Deallocates the device memory specified by ptr and allocates new device memory of size bytes for the device device_num. The returned memory can be accessed by the host and all supported devices. The contents of the new memory object match those of the old object prior to deallocation, up to the smaller of the old allocation size and size.
void *ompx_target_realloc_device(void *ptr, size_t size, int device_num)
Deallocates the device memory specified by ptr and allocates new device memory of size bytes for the device device_num. The returned memory can be accessed only by the specified device. The contents of the new memory object match those of the old object prior to deallocation, up to the smaller of the old allocation size and size.
void *ompx_target_realloc_shared(void *ptr, size_t size, int device_num)
Deallocates the device memory specified by ptr and allocates new device memory of size bytes for the device device_num. The returned memory can be accessed by the host and the specified device. The contents of the new memory object match those of the old object prior to deallocation, up to the smaller of the old allocation size and size.
void *ompx_target_aligned_alloc(size_t alignment, size_t size, int device_num)
Allocates device memory aligned to alignment for the specified device device_num. The returned memory can be accessed only by the specified device.
void *ompx_target_aligned_alloc_host(size_t alignment, size_t size, int device_num)
Allocates device memory aligned to alignment for the specified device device_num. The returned memory can be accessed by the host and all supported devices.
void *ompx_target_aligned_alloc_device(size_t alignment, size_t size, int device_num)
Allocates device memory aligned to alignment for the specified device device_num. The returned memory can be accessed only by the specified device.
void *ompx_target_aligned_alloc_shared(size_t alignment, size_t size, int device_num)
Allocates device memory aligned to alignment for the specified device device_num. The returned memory can be accessed by the host and the specified device.
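As a host-side sketch of a shared allocation, the code below allocates a buffer with omp_target_alloc_shared(), checks for the null pointer that signals failure, writes and reads the buffer from the host, and then releases it. The macro HAVE_INTEL_OMP_EXT and the function host_visible_sum() are hypothetical names for this illustration. Two assumptions apply: the routine is declared in omp.h only for the LLVM-based Intel compiler with OpenMP enabled, and the buffer is released with the standard omp_target_free(). Other toolchains take a malloc() fallback.

```c
#include <stddef.h>
#include <stdlib.h>

/* HAVE_INTEL_OMP_EXT is a hypothetical macro for this sketch. It is set only
   when the LLVM-based Intel compiler builds with OpenMP enabled (assumption). */
#if defined(_OPENMP) && defined(__INTEL_LLVM_COMPILER)
#include <omp.h>
#define HAVE_INTEL_OMP_EXT 1
#else
#define HAVE_INTEL_OMP_EXT 0
#endif

/* Store 0, 1, ..., n-1 in a shared allocation and return their sum, touching
   the buffer from the host only. Returns -1 if the allocation fails. */
long host_visible_sum(size_t n)
{
    if (n == 0)
        return 0;

#if HAVE_INTEL_OMP_EXT
    int dev = omp_get_default_device();
    long *buf = (long *)omp_target_alloc_shared(n * sizeof *buf, dev);
#else
    long *buf = (long *)malloc(n * sizeof *buf);
#endif
    if (buf == NULL)
        return -1;            /* a null pointer means the request failed */

    long total = 0;
    for (size_t i = 0; i < n; ++i) {
        buf[i] = (long)i;     /* the same pointer is valid on the host */
        total += buf[i];
    }

#if HAVE_INTEL_OMP_EXT
    omp_target_free(buf, dev);   /* assumption: standard release routine */
#else
    free(buf);
#endif
    return total;
}
```

The sketch deliberately stops at host access. Using the same pointer inside a target region is the purpose of a shared allocation, but the clauses needed there depend on the compiler and device configuration.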
