From Rice University
- Professor John Mellor-Crummey's talk at The International Conference for High Performance Computing, Networking, Storage and Analysis, Supercomputing 2014 (SC14). See PinPlay description on slides 6-8.
Introduction to Correctness and Performance Tools for Parallel Programming. Experiencing HPC for Undergraduates: Introduction to HPC Research. New Orleans, LA. November, 2014.
- Milind Chabbi is a doctoral candidate advised by Professor John Mellor-Crummey in the Department of Computer Science at Rice University. He describes the use of PinPlay for finding a multithreaded bug (mentioned in Professor John Mellor-Crummey's talk listed above):
"I was developing a shared-memory synchronization algorithm, which was recursive in nature and involved complicated interactions between multiple threads via shared memory updates. The code ran into a livelock and the bug was neither apparent from inspecting the algorithm/code nor possible to isolate with traditional debugging techniques. The debugging was further complicated by the lack of reproducibility of the bug, the need for several threads to reproduce the bug, and run-to-run nondeterminism. Debugging tricks such as watchpoints, page protection, and assertions could only identify symptoms of the problem but failed to help in arriving at the root cause of the problem even for parallel programming experts.
Intel's PinPlay served as a savior by aiding in identifying the root cause—a data race. With PinPlay, I was able to run the code several times and record the log of a buggy execution. The deterministic replay feature of PinPlay for multithreaded codes, in conjunction with the powerful features of Pin framework to perform sophisticated analysis during execution replay, allowed me to break into the debugger just in time to notice step-by-step memory updates and thread interleaving that caused the data race. The cause of the data race was the following: The programmer assumed 64-bit cache-line-aligned memory writes to be atomically visible on x86_64 machines, whereas the compiler (GNU C++ v.4.4.5) took liberty to split a 64-bit write of an immediate value into two independent 32-bit writes, violating the programmer's assumption. This caused a small execution window of two instructions where a shared variable was in an inconsistent state, leading to the occasional data race and eventual livelock.
Like Intel's Pin framework, PinPlay is also robust and works on real code on real machines, making it my choice for debugging parallel programs. I would most certainly recommend PinPlay to both novice and expert programmers to debug their code that exhibit nondeterminism. In fact, we plan to introduce PinPlay in one of the advanced multicore programming classes here at Rice University."
The issue was as follows:

My C++ source code:

    // L->flags is a 64-bit, cache aligned value
    L->flags = 0xdffffffffffffffd; // expected atomic write

g++ generated assembly on 64-bit machine:

    movl $0xfffffffd,(%rax)      // lower 32-bit update
    movl $0xdfffffff,0x4(%rax)   // Higher 32-bit update
The developer expected the write to L->flags to be atomic because it is a 64-bit, cache-aligned value. However, the compiler generated two 32-bit stores, which is legal for a plain (non-atomic) variable but a questionable choice. To prevent this mismatch between expectation and implementation, the code should have declared L->flags as a C++ atomic and used an atomic store to write it.
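The suggested fix could look roughly like the sketch below. It is only an illustration, not the code from the original program: the LockNode type and the publish_flags and read_flags helpers are hypothetical names, and only the flags field and the constant come from the testimonial. The release/acquire memory orders are also an assumption; the real algorithm may need stronger ordering, such as the default sequentially consistent order.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the structure behind `L` in the testimonial.
// Only the 64-bit, cache-line-aligned `flags` field is taken from the source.
struct alignas(64) LockNode {
    std::atomic<std::uint64_t> flags{0};
};

// Writes the flag value with a single atomic store. Because `flags` is a
// std::atomic, other threads can never observe a half-written value.
inline void publish_flags(LockNode* L) {
    // Sanity check (assumption: the target supports lock-free 64-bit atomics,
    // as x86_64 does).
    assert(L->flags.is_lock_free());
    L->flags.store(0xdffffffffffffffdULL, std::memory_order_release);
}

// Reads the flag value with a matching atomic load.
inline std::uint64_t read_flags(const LockNode* L) {
    return L->flags.load(std::memory_order_acquire);
}
```

With this declaration, a reader thread calling read_flags sees either the old value or the full new value, never a mix of the two 32-bit halves.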
Submit Your Testimonial
Please send your PinPlay and DrDebug experience to harish.patil@intel.com.