Intel® C++ Compiler Classic Developer Guide and Reference

ID 767249
Date 12/16/2022
Public

Cacheability Support Intrinsics

The prototypes for Intel® Streaming SIMD Extensions 2 (Intel® SSE2) intrinsics for cacheability support are in the emmintrin.h header file.

To use these intrinsics, include the immintrin.h file as follows:

#include <immintrin.h>

| Intrinsic Name | Operation | Corresponding Intel® SSE2 Instruction |
|---|---|---|
| _mm_stream_pd | Store | MOVNTPD |
| _mm256_stream_pd | Store | VMOVNTPD |
| _mm_stream_si128 | Store | MOVNTDQ |
| _mm256_stream_si256 | Store | VMOVNTDQ |
| _mm_stream_si32 | Store | MOVNTI |
| _mm_stream_si64* | Store | MOVNTI |
| _mm_clflush | Flush | CLFLUSH |
| _mm_clflushopt | Flush | CLFLUSHOPT |
| _mm_lfence | Guarantee visibility | LFENCE |
| _mm_mfence | Guarantee visibility | MFENCE |

_mm_stream_pd

void _mm_stream_pd(double *p, __m128d a);

Stores the data in a to the address p without polluting the caches. The address p must be 16-byte aligned (128-bit version). If the cache line containing address p is already in the cache, the cache is updated.

| p[0] | p[1] |
|---|---|
| a0 | a1 |
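As an illustration of the alignment requirement and the element mapping above, the following is a minimal sketch rather than a reference implementation. The helper name stream_fill_pd is hypothetical, and it assumes that the destination is 16-byte aligned and that the element count is even. The trailing _mm_mfence call is one conservative way to order the streaming stores before later memory accesses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <immintrin.h>

/* Hypothetical helper (not part of the intrinsics API): fills n doubles
 * at dst with value using non-temporal stores.
 * Assumptions: dst is 16-byte aligned and n is even. */
void stream_fill_pd(double *dst, size_t n, double value)
{
    assert(((uintptr_t)dst & 15u) == 0);  /* alignment required by _mm_stream_pd */
    assert(n % 2 == 0);                   /* two doubles are written per store */

    const __m128d v = _mm_set1_pd(value); /* a0 = a1 = value */
    for (size_t i = 0; i < n; i += 2) {
        _mm_stream_pd(dst + i, v);        /* dst[i] := a0, dst[i + 1] := a1 */
    }
    _mm_mfence();  /* order the streaming stores before later memory accesses */
}
```

A caller would typically obtain a suitably aligned buffer, for example with _Alignas(16) on a local array, before calling stream_fill_pd.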

_mm256_stream_pd

void _mm256_stream_pd(double *p, __m256d a);

Stores the data in a to the address p without polluting the caches. The address p must be 32-byte aligned (VEX.256 encoded version). If the cache line containing address p is already in the cache, the cache is updated. Because a __m256d value holds four double-precision elements, four elements are written.

| p[0] | p[1] | p[2] | p[3] |
|---|---|---|---|
| a0 | a1 | a2 | a3 |

_mm_stream_si128

void _mm_stream_si128(__m128i *p, __m128i a);

Stores the data in a to the address p without polluting the caches. If the cache line containing address p is already in the cache, the cache is updated. The address p must be 16-byte aligned (128-bit version).

| *p |
|---|
| a |
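One possible use of this intrinsic is a block copy whose destination is not expected to be read again soon. The sketch below is illustrative only: stream_copy_si128 is a hypothetical helper, and it assumes both pointers are 16-byte aligned and the byte count is a multiple of 16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <immintrin.h>

/* Hypothetical helper: copies `bytes` bytes from src to dst using
 * 128-bit non-temporal stores.
 * Assumptions: dst and src are 16-byte aligned, bytes is a multiple of 16. */
void stream_copy_si128(void *dst, const void *src, size_t bytes)
{
    assert(((uintptr_t)dst & 15u) == 0);
    assert(((uintptr_t)src & 15u) == 0);
    assert(bytes % 16 == 0);

    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (size_t i = 0; i < bytes / 16; i++) {
        __m128i v = _mm_load_si128(s + i);  /* aligned 128-bit load */
        _mm_stream_si128(d + i, v);         /* *p := a, as a non-temporal store */
    }
    _mm_mfence();  /* order the streaming stores before later memory accesses */
}
```

Whether streaming stores outperform ordinary stores depends on the buffer size and the target processor, so a copy like this is worth measuring before adopting it.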

_mm256_stream_si256

void _mm256_stream_si256(__m256i *p, __m256i a);

Stores the data in a to the address p without polluting the caches. If the cache line containing address p is already in the cache, the cache is updated. The address p must be 32-byte aligned (VEX.256 encoded version).

| *p |
|---|
| a |

_mm_stream_si32

void _mm_stream_si32(int *p, int a);

Stores the 32-bit integer data in a to the address p without polluting the caches. If the cache line containing address p is already in the cache, the cache is updated.

| *p |
|---|
| a |
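The scalar form can be used when values are produced one at a time. The following sketch assumes a hypothetical helper, stream_store_i32, that writes a run of 32-bit integers with one non-temporal store per element.

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>

/* Hypothetical helper: writes n 32-bit integers from src to dst,
 * one non-temporal store per element. */
void stream_store_i32(int *dst, const int *src, size_t n)
{
    assert(dst != NULL && src != NULL);

    for (size_t i = 0; i < n; i++) {
        _mm_stream_si32(dst + i, src[i]);  /* *p := a */
    }
    _mm_mfence();  /* order the streaming stores before later memory accesses */
}
```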

_mm_stream_si64

void _mm_stream_si64(__int64 *p, __int64 a);

Stores the 64-bit integer data in a to the address p without polluting the caches. If the cache line containing address p is already in the cache, the cache is updated.

| *p |
|---|
| a |

_mm_clflush

void _mm_clflush(void const*p);

Flushes and invalidates the cache line containing p from all caches in the coherency domain.
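Because the flush operates on one cache line at a time, flushing a buffer means stepping through it line by line. The sketch below shows one way to do that. The helper flush_range and the constant CACHE_LINE_BYTES are hypothetical, and the 64-byte line size is an assumption that holds on many x86 processors but is not guaranteed; robust code would query the line size from CPUID.

```c
#include <stddef.h>
#include <stdint.h>
#include <immintrin.h>

/* Assumption: 64-byte cache lines. This is common on x86 but not
 * guaranteed; production code would query the size via CPUID. */
#define CACHE_LINE_BYTES 64u

/* Hypothetical helper: flushes every cache line that overlaps
 * [p, p + len) and returns the number of lines flushed. */
size_t flush_range(const void *p, size_t len)
{
    if (len == 0) {
        return 0;
    }

    /* Round the start address down to a cache-line boundary. */
    uintptr_t line = (uintptr_t)p & ~(uintptr_t)(CACHE_LINE_BYTES - 1);
    const uintptr_t end = (uintptr_t)p + len;
    size_t flushed = 0;

    for (; line < end; line += CACHE_LINE_BYTES) {
        _mm_clflush((const void *)line);  /* flush and invalidate one line */
        flushed++;
    }
    _mm_mfence();  /* order the flushes before later memory accesses */
    return flushed;
}
```

Flushing does not change the contents of memory; the data remains readable afterward, although the next access will miss in the cache.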

_mm_clflushopt

void _mm_clflushopt(void const *p);

Flushes and invalidates the cache line containing p from all caches in the coherency domain. This optimized version of _mm_clflush is available if indicated by the CPUID feature flag CLFLUSHOPT.

_mm_lfence

void _mm_lfence(void);

Guarantees that every load instruction that precedes, in program order, the load fence instruction is globally visible before any load instruction which follows the fence in program order.

_mm_mfence

void _mm_mfence(void);

Guarantees that every memory access that precedes, in program order, the memory fence instruction is globally visible before any memory instruction which follows the fence in program order.
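To show where the two fences might sit relative to a streaming store, here is a small sketch of a publish-and-read pattern. The mailbox_t type and both functions are hypothetical. The volatile flag is used only to keep the example short; it is not a substitute for proper atomics, and concurrent code should normally rely on C11 atomics or the platform's synchronization primitives.

```c
#include <immintrin.h>

/* Hypothetical mailbox: a payload plus a flag that marks it as valid. */
typedef struct {
    int payload;
    volatile int ready;
} mailbox_t;

/* Writes the payload with a streaming store, then raises the flag. */
void mailbox_publish(mailbox_t *box, int value)
{
    _mm_stream_si32(&box->payload, value);
    _mm_mfence();    /* payload is globally visible before the flag store */
    box->ready = 1;
}

/* Returns 1 and copies the payload to *out if the flag is set, else 0. */
int mailbox_try_read(const mailbox_t *box, int *out)
{
    if (!box->ready) {
        return 0;
    }
    _mm_lfence();    /* the flag load completes before the payload load */
    *out = box->payload;
    return 1;
}
```

The fence in mailbox_publish matters because streaming stores are weakly ordered; without it, the flag could become visible before the payload.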