Transactional Memory Support: the speculative_spin_rw_mutex (Community Preview Feature)

In a previous post I discussed the Intel® Transactional Synchronization Extensions (Intel® TSX) introduced in the new generation of processors. I also described the Intel® Threading Building Blocks (Intel® TBB) implementation that uses the Hardware Lock Elision (HLE) interface, speculative_spin_mutex.

Now we can talk about the implementation of speculative_spin_rw_mutex, a Community Preview Feature of TBB 4.2 Update 2. speculative_spin_rw_mutex uses Restricted Transactional Memory (RTM) for mutual exclusion. Under speculation, it allows both concurrent reads and concurrent writes.

It also contains an ordinary spin_rw_mutex, because the protected code may have to be executed without speculation.

If concurrent executions of code protected by the mutex do not conflict, all reads complete and writes are atomically committed without explicitly taking the lock.

If a conflict or another problem prevents speculative execution and the transaction aborts, speculative_spin_rw_mutex may retry the transaction or take the lock for real. If a writer takes the lock for real, all speculative readers and writers abort their transactions and wait for the writer to finish. After that, the transactions may be retried.

If a reader takes the lock for real, all speculative writers abort their transactions and wait for the reader to release the lock. After that, the transactions may be retried. Real readers and speculative readers may proceed in parallel. All of this happens “under the covers”, as part of the TBB implementation.

The speculative_spin_rw_mutex has to contain a regular spin_rw_mutex because RTM gives no guarantee that a transaction will ever complete. The code protected by the mutex may contain operations (such as system calls) that cannot be completed inside a transaction.

There are also limits to the number of cache lines that can be accessed or modified in a transaction, and if that limit is reached the transaction cannot complete.

The speculative_spin_rw_mutex guarantees forward progress by limiting the number of times a transaction is attempted. If necessary, it then acquires the lock non-speculatively.

In the last post I mentioned that a speculative lock requires a “fallback path”, a code path that can be executed when the transaction fails.

The speculative_spin_rw_mutex is designed so that the same code serves as both the speculative path and the fallback path. This greatly simplifies its use.

If the transaction aborts, RTM returns a status code that gives the reason for the abort. For instance, if a thread's time slice ends while it is executing speculatively, the transaction is aborted, but it may succeed if retried. If the status code indicates that a retry may succeed and the maximum number of retries has not been reached, the transaction is attempted again.

Several things to note about the mutex (some of which apply to speculative_spin_mutex also):
 

  • The mutex occupies three cache lines. The spin_rw_mutex and the write flag in the mutex must be on separate cache lines, and allocators do not guarantee that an allocation starts at a cache-line boundary.
  • If the architecture does not support RTM, speculative_spin_rw_mutex falls back to a spin_rw_mutex padded to guarantee that it is on a separate cache line.
  • The class does not provide explicit methods to lock and unlock the mutex. That is, a program cannot define a speculative_spin_rw_mutex M and call M.lock(). The proper way to use a speculative_spin_rw_mutex is to lock and unlock it with a scoped_lock:
    speculative_spin_rw_mutex m;
    {
        speculative_spin_rw_mutex::scoped_lock l(m);
        // code protected by mutex
    }
    // on exit from block the mutex is unlocked
    // by destructor for the scoped_lock

    This is because each thread needs local storage for the state of its lock acquisition, and the scoped_lock object on the thread's stack provides that storage.
     
  • speculative_spin_rw_mutex differs from other implementations (such as the pthreads mutexes) in that, under speculation, a write lock may be acquired recursively. Recursively acquiring the same lock in write mode does not deadlock as long as the lock is held speculatively. If the lock has been taken for real, it does deadlock.
  • Programming patterns that depend on a recursive lock acquisition deadlocking are of niche interest only. If you depend on this behavior, please don’t use a speculative mutex.
    Each implementation of Intel TSX limits how many levels of nested speculation it supports, and these limits may change from generation to generation.
     

Remember that not all 4th Generation Intel® Core™ processors support transactional synchronization. You should check ark.intel.com to verify that Intel TSX is available in the processor you are using. On processors that do not support Intel TSX, the speculative mutexes behave as their non-speculating counterparts, with possibly worse performance.

Careful performance measurement will help you decide whether speculative_spin_rw_mutex improves the scalability of your application.

For help optimizing your program with Intel TSX, you should consult the Intel® 64 and IA-32 Architectures Optimization Reference Manual, Chapter 12.