Technology & Research

Intel® Technology Journal Home

Volume 12, Issue 03

Original 45nm Intel® Core™ Microarchitecture


Intel Technology Journal - Featuring Intel's recent research and development

ISSN 1535-864X DOI 10.1535/itj.1203.03

  • Volume 12
  • Issue 03
  • Published November 7, 2008

Original 45nm Intel® Core™ Microarchitecture

  Section 7 of 15  

Improvements in the Intel® Core™2 Processor Family Architecture and Microarchitecture

COLLISION DETECTION

The dot product can be used for collision detection in games. This is another example of using the DPPS instruction in an AOS layout. In this example, the DPPS instruction is used to speed up collision detection of two spheres. To explain collision detection, consider two circles. The circles are said to have collided if the sum of radii is greater than or equal to the distance between the centers ((Figure 5) ). This same equation applies to spheres, except in that case there is a z-axis that contributes to the distance formula.



Figure 5: Circlesspheres collide if the sum of the radii is greater than or equal to the distance between the two centers

Two spheres collide if

The distance between two centers < = sum of radii
sqrt((x1 – x2) 2 +(y1 – y2) 2 +(z1 – z2) 2 ) < = r1 + r2

(x1 – x2) 2 +(y1 – y2) 2 +(z1 – z2) 2 < = (r1 + r2) 2

Dot Product (point A, point B) < = (r1 + r2) 2
As an example, imagine a fast-moving, hot fire particle about to incinerate other objects particles. Collision detection can be used to find out which of these objects comes in contact with the fire particle and will need to be set on fire (see (Figure 6) ).



Figure 6: Collision detection example: one fast-moving fire particle is about to slam into and incinerate other objects/particles

The C code implementation (Code 4) uses Eq. 1 to detect sphere collision between the fire particle and a thousand other particles.

struct VEC3

{

float x,y,z;

float dot() const

{return x*x + Y*Y * z*z;}

};

VEC3 operator –

(const VEC3 a, const VEC3 b)

{

VEC3 res;

res.x = a.x - b.x;

res.y = a.y - b.y;

res.z = a.z - b.z;

return res;

}

struct SPHERE

{

VEC3 center;

float rad;

};

void sphere_collision_C

(SPHERE *sp1,SPHERE *sp,int *coll)

{

for(int i = 0; i<gNumSpheres;i++)

{

float center_distance_squared =

(sp1[i].center-sp->center).dot();

float radii_sum_squared =

(sp1[i].rad + sp->rad)*

(sp1[i].rad + sp->rad);

if(center_distance_squared <

radii_sum_squared)

{

//which spheres collided

coll[i]++;

}

}

}              <Code 4>

The collision detection code can be optimized with SSE4.1 instructions. One single DPPS instruction can be used to make the three different calculations in Eq. 1: the dot product of AB: (x1-x2) 2 +(y1−y2) 2 +(z1−z2) 2 , the radii sum: (r1+r2) 2 , and the subtraction of the radii sum from the dot product.

          

Collision occurred if res's sign-bit is set (Code 5).

int sphere_collision_intrinsics (SPHERE *sp1,SPHERE *sp,int *coll)

{

int res;

__declspec(align(16))

static long _mask[4] =

{0,0,0,0x80000000};

__m128 s,s1,s2;

__m128 _mask128 =

*(__m128*)_mask;

s = _mm_load_ps((float *)sp);

//set sign bit to add: r1 - (-r2)

s = _mm_xor_ps(s,_mask128);

for(int i = 0;i<gNumSpheres;i++)

{

s1 = _mm_load_ps((float *)sp1);

s1 = _mm_sub_ps(s1,s);

//set sign bit on radii sum

s2 = _mm_xor_ps(s1,_mask128);

//s1[0-31] =

//center_distance_squared -

//radii_sum_squared

s1 = _mm_dp_ps(s1,s2,0xff);

//get sign bit of subtraction

res = _mm_extract_ps(s1,0);

res >> = 31;

//coll if radii_sum_squared

//> center_distance_squared

res & = 1;

coll[i] + = res;

}

}            <Code 5>

To use a single DPPS instruction to do all of this, we had to use a few SIMD tricks. First, we laid out the data so x,y,z,r of the sphere could be loaded with one aligned load. Then we used a mask to modify the sign bit of one of the radii before the packed subtract. We did this so that the subtract operation actually causes an addition, (r1-(-r2)). Then we used the mask again to modify the sign bit of one of the radii sums before the DPPS instruction. This causes the radii sum squared to be subtracted from the dot product of A and B.

These are the contents of the __m128 variables before the DPPS instruction:

//sign bit set on upper 32-bytes of _m128

s2 = [-(r1+r2), z1-z2, y1-y2, x1-x2]

s1 = [r1+r2, z1-z2, y1-y2, x1-x2]

These are the contents of s1[0-31] after the DPPS instruction: If s1[0–31] is negative, then the radii sum squared is greater than the dot product of A and B, and a collision occurred.

Another SSE4.1 instruction, EXTRACTPS, is used to extract the single precision floating-point value from an XMM register to a GP register to check if the sign-bit is set. The EXTRACTPS instruction removes the branch and enables GP registers to be used to do data manipulations. Both the DPPS and EXTRACTPS instructions provided the 1.5x speedup over C as shown in Table 2.


The analogous SSE3 solution has to use MULPS+HADDPS+MOVAPS+PSRLQ+ADDSS instructions as shown in Code 2 for the dot product. It also has to move the result to a floating-point register to do the comparison similar to the C-code. The SSE3 implementation has too large of a latency to provide a speedup over C.

In this section we provided two SSE4.1 examples. We discussed the measured performance of the streaming loads and provided the recommended usage scenario. We also discussed the DPPS instruction and the recommended usage scenarios for this instruction as demonstrated in the collision detection example.

  Section 7 of 15  

Back to Top

In this article

Download a PDF of this article.