When targeting x64 platforms in Visual Studio .NET* 2005, programmers can no longer use inline assembly as they could for 32-bit code. This forces the programmer either to rely on C/C++ code using intrinsics or to write a separate 64-bit MASM (.asm) version of the function by hand. Unfortunately, the VS .Net 2005 implementation of the CPUID intrinsic (__cpuid) accepts an input value only for the register eax. It provides no way to set ecx, which more recently defined CPUID functions use as a second input for queries about cache parameters and certain multi-core characteristics. Thus, a 64-bit .asm listing is required for full use of the CPUID instruction.
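To see why ecx matters, consider CPUID leaf 4 (deterministic cache parameters): each value of ecx (0, 1, 2, ...) selects a different cache descriptor, and enumeration ends at the first descriptor whose cache-type field is 0. The sketch below decodes one descriptor from the four output registers. The names cpuid_regs, cache_info, and decode_cache_leaf4 are hypothetical, and the bit positions follow Intel's published layout for leaf 4; actually obtaining the registers still requires a CPUID call that sets ecx, which the VS .Net 2005 __cpuid intrinsic cannot make.

```c
#include <stdint.h>

/* Hypothetical container for the four CPUID output registers. */
typedef struct {
    uint32_t eax, ebx, ecx, edx;
} cpuid_regs;

/* Decoded form of one CPUID leaf 4 subleaf (one cache descriptor). */
typedef struct {
    unsigned type;       /* 0 = no more caches, 1 = data, 2 = instruction, 3 = unified */
    unsigned level;      /* 1 = L1, 2 = L2, ... */
    uint64_t size_bytes; /* total cache size in bytes (0 when type == 0) */
} cache_info;

/* Decode the registers returned by CPUID with eax=4 and ecx=subleaf. */
static cache_info decode_cache_leaf4(const cpuid_regs *r)
{
    cache_info c;
    uint64_t ways       = ((r->ebx >> 22) & 0x3FF) + 1; /* EBX[31:22] + 1 */
    uint64_t partitions = ((r->ebx >> 12) & 0x3FF) + 1; /* EBX[21:12] + 1 */
    uint64_t line_size  = (r->ebx & 0xFFF) + 1;         /* EBX[11:0]  + 1 */
    uint64_t sets       = (uint64_t)r->ecx + 1;         /* ECX        + 1 */

    c.type  = r->eax & 0x1F;       /* EAX[4:0] */
    c.level = (r->eax >> 5) & 0x7; /* EAX[7:5] */
    c.size_bytes = (c.type == 0) ? 0 : ways * partitions * line_size * sets;
    return c;
}
```

A caller would issue CPUID with eax=4 and ecx=0, 1, 2, ..., passing each result to decode_cache_leaf4 and stopping at the first descriptor whose type is 0. Without control over ecx, only whichever descriptor the stale register value happens to select is returned.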
The following code samples demonstrate how to use the CPUID and RDTSC instructions with VS .Net 2005 for 64-bit (x64) platforms. The CPUID instruction is commonly used to obtain detailed information about the system's CPU(s), and RDTSC reads the CPU's internal time-stamp counter for timing and performance measurement. Unlike __cpuid, the RDTSC intrinsic (__rdtsc) works as expected and can directly replace inline assembly.
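As a minimal sketch of replacing inline-assembly RDTSC with the intrinsic, the helper below reads the counter through __rdtsc (declared in <intrin.h>) when compiled with Visual C++, and computes elapsed ticks with unsigned 64-bit subtraction. The names read_tsc and tsc_elapsed are hypothetical; the counter read is compiled only when _MSC_VER is defined, so the arithmetic can be checked with any C compiler.

```c
#include <stdint.h>

#if defined(_MSC_VER)
#include <intrin.h>   /* declares unsigned __int64 __rdtsc(void) */

/* Read the time-stamp counter via the intrinsic instead of inline assembly. */
static uint64_t read_tsc(void)
{
    return (uint64_t)__rdtsc();
}
#endif

/* Elapsed ticks between two counter readings. Unsigned subtraction
   gives the correct difference even if the 64-bit counter wrapped
   between the two reads. */
static uint64_t tsc_elapsed(uint64_t start, uint64_t end)
{
    return end - start;
}
```

In a timing loop, the caller would take one reading with read_tsc() before the work and one after, then pass both to tsc_elapsed().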
To assemble the 64-bit .asm file, create a custom build step that calls the 64-bit MASM, "ml64.exe", as shown in the screenshot below. The cpuid64.asm file must not be built in the 32-bit configuration, so in its file properties for platform Win32, set General -> Excluded From Build to Yes.

The header file below (cpuid_32_64.h) creates a single definition of the functions _CPUID and _RDTSC that can be used in both 32-bit and 64-bit builds. For 64-bit builds, _CPUID uses the .asm function cpuid64, and _RDTSC uses the intrinsic __rdtsc. For 32-bit builds, _CPUID uses the inline-assembly function cpuid32, and _RDTSC uses the inline-assembly function _inl_rdtsc32.
There are two examples in the C file below (cpuid_32_64.c). The first, GetCoresPerPackage(), calls _CPUID with eax=4 and ecx=0 to read the first set of deterministic cache parameters reported by the CPU, then extracts the field indicating the number of processor cores per processor package. (For example, this function returns 1 for a single-core Intel® Pentium® 4 processor and 2 for a dual-core Intel® Pentium® D processor.) If this function used the __cpuid intrinsic on an x64 platform instead of the cpuid64 function, ecx would contain whatever value was left in the register on input, so the output would be unreliable. The second example, timeSomethingExample(), calls _RDTSC before and after a loop and calculates the number of timer ticks elapsed. The _CPUID example shows how one definition can invoke either 64-bit .asm code or 32-bit inline assembly, and the _RDTSC example shows how one definition can invoke either a 64-bit intrinsic or 32-bit inline assembly.
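The core-count extraction in GetCoresPerPackage() comes down to a bit-field operation on the eax value returned by CPUID with eax=4 and ecx=0: bits 31:26 hold the reported core count minus one. The sketch below is a hypothetical helper (cores_per_package is not a name from the article). It assumes the caller has already read the maximum supported leaf from CPUID with eax=0, and it returns 1 when leaf 4 is unavailable, which is an assumption about older single-core parts rather than something the CPU reports.

```c
#include <stdint.h>

/* Hypothetical helper.
   max_leaf:  eax returned by CPUID with eax=0 (highest supported leaf).
   leaf4_eax: eax returned by CPUID with eax=4, ecx=0. */
static unsigned cores_per_package(uint32_t max_leaf, uint32_t leaf4_eax)
{
    if (max_leaf < 4)
        return 1;                          /* leaf 4 unsupported: assume one core */
    return ((leaf4_eax >> 26) & 0x3F) + 1; /* EAX[31:26] + 1 */
}
```

Keeping the decoding separate from the _CPUID call means the same helper serves both the 32-bit inline-assembly path and the 64-bit .asm path.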
Both the _CPUID and _RDTSC examples show how to create utility functions that are transparently portable from Win32 to x64 platforms in cases where different underlying code is required for each platform. Furthermore, the cpuid64 function provides a workaround for a deficiency in the __cpuid intrinsic, allowing both 32-bit and 64-bit applications to fully utilize the capability of the CPUID instruction.
Header file (cpuid_32_64.h):