I. From Rice University
- Prof. John Mellor-Crummey's talk at Supercomputing 2014 (see PinPlay description on slides 6-8):
Introduction to Correctness and Performance Tools for Parallel Programming. The International Conference for High Performance Computing, Networking, Storage and Analysis (SC14). Experiencing HPC for Undergraduates: Introduction to HPC Research. New Orleans, LA. November, 2014.
- Milind Chabbi is a doctoral candidate advised by Prof. John Mellor-Crummey in the Department of Computer Science at Rice University. He describes the use of PinPlay for finding a multi-threaded bug (mentioned in Prof. John Mellor-Crummey's talk above):
"I was developing a shared-memory synchronization algorithm, which was recursive in nature and involved complicated interactions between multiple threads via shared memory updates. The code ran into a livelock and the bug was neither apparent from inspecting the algorithm/code nor possible to isolate with traditional debugging techniques. The debugging was further complicated by lack of reproducibility of the bug, need for several threads to reproduce the bug, and run-to-run non determinism. Debugging tricks such as watchpoints, page protection and assertions could only identify symptoms of the problem but failed to help in arriving at the root cause of the problem even for parallel programming experts.
Intel's PinPlay served as a savior by aiding in identifying the root cause--a data race. With PinPlay, I was able to run the code several times and record the log of a buggy execution. Deterministic replay feature of PinPlay for multi-threaded codes, in conjunction with powerful features of Pin framework to perform sophisticated analysis during execution replay, allowed me to break into the debugger just in time to notice step-by-step memory updates and thread interleaving that caused the data race. The cause of the data race was the following: the programmer assumed 64-bit cache-line-aligned memory writes to be atomically visible on x86_64 machines, whereas the compiler (GNU C++ v.4.4.5) took liberty to split a 64-bit write of an immediate value into two independent 32-bit writes, violating the programmer's assumption. This caused a small execution window of two instructions where a shared variable was in an inconsistent state, leading to the occasional data race and eventual livelock.
Like Intel's Pin framework, PinPlay is also robust and works on real code on real machines, making it my choice for debugging parallel programs. I would most certainly recommend PinPlay to both novice and expert programmers to debug their code that exhibit non determinism. In fact, we plan to introduce PinPlay in one of advanced multi-core programming classes here at Rice University."
If you are curious, this was the issue:
My C++ source code:
// L->flags is a 64-bit, cache aligned value
L->flags = 0xdffffffffffffffd; // expected atomic write
g++ generated assembly on 64-bit machine:
movl $0xfffffffd,(%rax) // lower 32-bit update
movl $0xdfffffff,0x4(%rax) // upper 32-bit update
The developer expected the write to L->flags to be atomic because it is a 64-bit, cache-aligned value. However, the compiler generated two 32-bit stores. That is a legal though questionable choice, because the C++ standard gives no atomicity guarantee for a write to a plain, non-atomic variable. To prevent this mismatch between expectation and implementation, L->flags should have been declared as a C++ atomic (std::atomic) and written with an atomic store.
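The suggested fix can be sketched as follows. This is a minimal illustration, not the original code: the struct name `Lock`, the helper functions `set_flags` and `get_flags`, the 64-byte alignment, and the release/acquire memory orderings are all assumptions made for the example. Only the field name `flags` and the stored constant come from the snippet above.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for the structure that `L` points to.
// alignas(64) assumes a 64-byte cache line.
struct alignas(64) Lock {
    std::atomic<std::uint64_t> flags{0};
};

// Write the new flag value with one atomic store, so no other thread
// can observe a half-updated (torn) 64-bit value.
inline void set_flags(Lock* L) {
    L->flags.store(0xdffffffffffffffdULL, std::memory_order_release);
}

// Read the flags with a matching atomic load.
inline std::uint64_t get_flags(const Lock* L) {
    return L->flags.load(std::memory_order_acquire);
}
```

With `std::atomic`, the compiler is no longer free to split the write into two 32-bit stores. The memory orderings shown are one plausible choice; the right ones depend on how the rest of the synchronization algorithm uses `flags`, and the default `std::memory_order_seq_cst` is the conservative option.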
II. <your testimonials here>
Please send your PinPlay/DrDebug experience to harish.patil@intel.com.