The browser version you are using is not recommended for this site.Please consider upgrading to the latest version of your browser by clicking one of the following links.
We are sorry, This PDF is available in download format only
This paper presents a fast and efficient method of computing CRC on IA processors with generic polynomials using the carry-less multiplication instruction – PCLMULQDQ.
Instead of reducing the entire message with traditional reduction algorithms, we use a faster folding approach to reduce an arbitrary length buffer to a small fixed size to be reduced further by traditional methods such as Barrett reduction.
Parallelized folding approach is used to maximize the throughput of PCLMULQDQ instructions. We show how to do this efficiently for data buffers of arbitrary length.
The final reduction part is only slightly different for different sized polynomials (e.g., a 32-bit CRC and a 16-bit CRC). With our novel folding methods, CRC computation using PCLMULQDQ is faster than best software routines that don’t use the instruction, on a range of IA processor cores.
This paper will enable customers to code and optimize any CRC application for maximum performance on Westmere. We use real examples in the paper to illustrate the methods.
A discussion of experimental techniques for rapid mitigation of phishing and spam. (v.1, June 2009)
Describes how to increase throughput with network coding.
Maximize performance on a Windows*-based system using SMP and IntervalZero* RTX. (v.001, Oct. 2011)
Webinar details AltiVec* SIMD macros translator migration to Intel® processors. (v.1, May 2009)