Intel® C++ Compiler Classic Developer Guide and Reference

ID 767249
Date 7/13/2023
Public

Load Intrinsics

This topic lists the Intel® Streaming SIMD Extensions 2 (Intel® SSE2) intrinsics for double-precision floating-point (DP FP) load operations. The prototypes for Intel® SSE2 intrinsics are in the emmintrin.h header file.

To use these intrinsics, include the immintrin.h file, which also provides the emmintrin.h declarations:

#include <immintrin.h>

The load and set operations are similar in that both initialize __m128d data. However, the set operations take double arguments and are intended for initialization with constants, while the load operations take a pointer-to-double argument and are intended to mimic the instructions that load data from memory.
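The difference between the two families can be sketched as follows. This is a minimal illustration, not part of the reference itself: the function names init_from_constants, init_from_memory, and unpack_pd are hypothetical helpers, and _mm_set_pd and _mm_storeu_pd are SSE2 intrinsics documented in other topics.

```c
#include <immintrin.h>

/* Set operation: the values are passed as double arguments.
   _mm_set_pd takes the upper element first, so R0 = 1.0 and R1 = 2.0. */
__m128d init_from_constants(void) {
    return _mm_set_pd(2.0, 1.0);
}

/* Load operation: the values are read through a pointer to double.
   The result holds R0 = src[0] and R1 = src[1]. */
__m128d init_from_memory(const double *src) {
    return _mm_loadu_pd(src);
}

/* Hypothetical helper: copy both elements to out so they can be inspected. */
void unpack_pd(__m128d v, double out[2]) {
    _mm_storeu_pd(out, v);
}
```

Both functions can produce the same register contents; the set form suits values known in the source code, while the load form suits data that already lives in an array or structure.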

The result of each intrinsic operation is placed in a register. The tables in the detailed explanation for each intrinsic show what is placed in that register. The result register is represented by R0 and R1, where R0 is the lower DP FP element and R1 is the upper DP FP element.

Intrinsic Name  Operation                                                   Corresponding Intel® SSE2 Instruction
--------------  ----------------------------------------------------------  -------------------------------------
_mm_load_pd     Loads two DP FP values                                      MOVAPD
_mm_load1_pd    Loads a single DP FP value, copying to both elements        MOVSD + shuffling
_mm_loadr_pd    Loads two DP FP values in reverse order                     MOVAPD + shuffling
_mm_loadu_pd    Loads two DP FP values                                      MOVUPD
_mm_load_sd     Loads a DP FP value, sets upper DP FP to zero               MOVSD
_mm_loadh_pd    Loads a DP FP value as the upper DP FP value of the result  MOVHPD
_mm_loadl_pd    Loads a DP FP value as the lower DP FP value of the result  MOVLPD

_mm_load_pd

__m128d _mm_load_pd(double const *dp);

Loads two DP FP values. The address dp must be 16-byte aligned.

R0     R1
-----  -----
dp[0]  dp[1]
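A minimal sketch of an aligned load follows. The function name sum_pair_aligned is hypothetical, and the caller is assumed to guarantee the 16-byte alignment; _mm_unpackhi_pd, _mm_add_sd, and _mm_cvtsd_f64 are other SSE2 intrinsics used here only to read the result back.

```c
#include <immintrin.h>

/* Return dp[0] + dp[1]. Assumption: dp is 16-byte aligned. */
double sum_pair_aligned(const double *dp) {
    __m128d v  = _mm_load_pd(dp);          /* R0 = dp[0], R1 = dp[1] */
    __m128d hi = _mm_unpackhi_pd(v, v);    /* R0 = dp[1]             */
    return _mm_cvtsd_f64(_mm_add_sd(v, hi));
}
```

One way to satisfy the alignment requirement in C11 is to declare the buffer as `_Alignas(16) double data[2]`. Passing an address that is not 16-byte aligned typically raises a fault at run time.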

_mm_load1_pd

__m128d _mm_load1_pd(double const *dp);

Loads a single DP FP value, copying to both elements. The address dp need not be 16-byte aligned.

R0   R1
---  ---
*dp  *dp
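A typical use of a broadcast load is multiplying a pair of values by one scalar. The sketch below assumes a hypothetical function scale_pair and uses unaligned loads and stores for the other operands so that no alignment is required.

```c
#include <immintrin.h>

/* Write factor[0] * xs[0] and factor[0] * xs[1] to out. */
void scale_pair(const double *factor, const double *xs, double *out) {
    __m128d f = _mm_load1_pd(factor);      /* R0 = R1 = *factor      */
    __m128d x = _mm_loadu_pd(xs);          /* R0 = xs[0], R1 = xs[1] */
    _mm_storeu_pd(out, _mm_mul_pd(f, x));
}
```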

_mm_loadr_pd

__m128d _mm_loadr_pd(double const *dp);

Loads two DP FP values in reverse order. The address dp must be 16-byte aligned.

R0     R1
-----  -----
dp[1]  dp[0]
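The element swap can be observed by storing the result back to memory. This is a sketch with a hypothetical function name, load_reversed; the input is assumed to be 16-byte aligned, while the output is written with an unaligned store.

```c
#include <immintrin.h>

/* Copy the pair at dp to out with the elements swapped.
   Assumption: dp is 16-byte aligned. */
void load_reversed(const double *dp, double out[2]) {
    __m128d v = _mm_loadr_pd(dp);   /* R0 = dp[1], R1 = dp[0] */
    _mm_storeu_pd(out, v);          /* out[0] = R0, out[1] = R1 */
}
```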

_mm_loadu_pd

__m128d _mm_loadu_pd(double const *dp);

Loads two DP FP values. The address dp need not be 16-byte aligned.

R0     R1
-----  -----
dp[0]  dp[1]
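The unaligned form is useful when a pair starts at an arbitrary index of an array, where the address is only guaranteed to be aligned for a double. The sketch below uses a hypothetical function, load_pair_at, to illustrate this.

```c
#include <stddef.h>
#include <immintrin.h>

/* Copy base[i] and base[i + 1] to out. When i is odd and base is
   16-byte aligned, &base[i] is not 16-byte aligned, so the aligned
   load _mm_load_pd could not be used here. */
void load_pair_at(const double *base, size_t i, double out[2]) {
    __m128d v = _mm_loadu_pd(base + i);   /* R0 = base[i], R1 = base[i + 1] */
    _mm_storeu_pd(out, v);
}
```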

_mm_load_sd

__m128d _mm_load_sd(double const *dp);

Loads a DP FP value. The upper DP FP is set to zero. The address dp need not be 16-byte aligned.

R0   R1
---  ---
*dp  0.0
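The zeroing of the upper element can be checked by storing the whole register. The function name load_low_zero_high in this sketch is hypothetical.

```c
#include <immintrin.h>

/* Load one double into the lower element and store both elements to out. */
void load_low_zero_high(const double *dp, double out[2]) {
    __m128d v = _mm_load_sd(dp);   /* R0 = *dp, R1 = 0.0 */
    _mm_storeu_pd(out, v);
}
```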

_mm_loadh_pd

__m128d _mm_loadh_pd(__m128d a, double const *dp);

Loads a DP FP value as the upper DP FP value of the result. The lower DP FP value is passed through from a. The address dp need not be 16-byte aligned.

R0   R1
---  ---
a0   *dp
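One common pattern is gathering two doubles that are not adjacent in memory: load the first with _mm_load_sd, then fill the upper element with _mm_loadh_pd. The function name gather_pair is a hypothetical helper for this sketch.

```c
#include <immintrin.h>

/* Build a vector from two separately addressed doubles and store it to out. */
void gather_pair(const double *lo, const double *hi, double out[2]) {
    __m128d v = _mm_load_sd(lo);    /* R0 = *lo, R1 = 0.0          */
    v = _mm_loadh_pd(v, hi);        /* R0 = *lo (kept), R1 = *hi   */
    _mm_storeu_pd(out, v);
}
```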

_mm_loadl_pd

__m128d _mm_loadl_pd(__m128d a, double const *dp);

Loads a DP FP value as the lower DP FP value of the result. The upper DP FP value is passed through from a. The address dp need not be 16-byte aligned.

R0   R1
---  ---
*dp  a1
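The pass-through of the upper element can be sketched by replacing only the lower element of an existing pair. The function name replace_low is hypothetical.

```c
#include <immintrin.h>

/* Copy pair to out, but with the lower element replaced by *new_low. */
void replace_low(const double *pair, const double *new_low, double out[2]) {
    __m128d a = _mm_loadu_pd(pair);          /* R0 = pair[0], R1 = pair[1]   */
    __m128d r = _mm_loadl_pd(a, new_low);    /* R0 = *new_low, R1 = pair[1]  */
    _mm_storeu_pd(out, r);
}
```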