## Introduction

Recursive helping is a technique for implementing lock-free concurrent data structures and algorithms. I’m going to illustrate this in the case of implementing a multi-variable compare-and-swap (MCAS) in terms of a single-variable compare-and-swap. Basically everything I’m going to talk about comes from Keir Fraser’s PhD thesis, *Practical Lock-Freedom* (2004), which I strongly recommend. Fraser’s thesis goes much further than this, e.g. fine-grained lock-free implementations of software transactional memory (STM)1. Fraser went on to contribute to the initial implementations of STM in Haskell, though his thesis uses C++.

First, some prerequisites.

## Terms

I imagine most developers when they hear the term “lock-free” take it to mean a concurrent algorithm implemented without using locks. It, however, has a technical definition. Assuming a concurrent application is functionally correct, e.g. it does the right thing if it terminates no matter how things are scheduled, we still have three liveness problems in decreasing order of severity:

• Deadlock - the application getting stuck in a state where no subprocesses can be scheduled
• Livelock - the application fails to make progress despite subprocesses being scheduled, e.g. endlessly retrying
• Starvation - some subprocesses never make progress even though the application as a whole makes progress

In parallel, we have three properties corresponding to programs that cannot exhibit the above behaviors:

• Obstruction-free - no deadlock, you’ll get this if you don’t use locks
• Lock-free - obstruction-free and no livelock
• Wait-free - lock-free and no starvation

Wait-freedom is the most desirable property but has been difficult to achieve with reasonable efficiency. However, relatively recent techniques (2014) may do for wait-free algorithms what Fraser’s thesis did for lock-free algorithms, namely reduce a research problem to an exercise. These techniques, however, still start from a lock-free algorithm. Obstruction-freedom is usually what you get from concurrency control mechanisms that abort and retry in the case of conflicts. To achieve lock-freedom, we need to avoid losing and redoing work, or at least avoid doing so indefinitely.
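To make the distinction concrete, here is a minimal, hypothetical sketch (not from Fraser’s thesis) of the classic CAS-retry pattern for incrementing a counter, built on GHC’s `atomicModifyIORef'` as the atomic primitive. It is lock-free but not wait-free: a failed CAS implies that some other thread’s increment succeeded, so the system as a whole always makes progress, yet any individual thread could in principle retry forever.

```haskell
import Control.Concurrent
import Control.Monad
import Data.IORef

-- A compare-and-swap on Ints built from GHC's atomicModifyIORef'.
-- Returns whether the swap happened.
casInt :: IORef Int -> Int -> Int -> IO Bool
casInt ref old new = atomicModifyIORef' ref $ \curr ->
  if curr == old then (new, True) else (curr, False)

-- Lock-free increment: read, compute, CAS, and retry on conflict.
-- A conflict means another increment committed, so progress is global.
increment :: IORef Int -> IO ()
increment ref = do
  curr <- readIORef ref
  ok <- casInt ref curr (curr + 1)
  unless ok (increment ref)
```

Running, say, four threads each performing a thousand `increment`s leaves the counter at exactly 4000, with no locks taken.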

See Fraser’s thesis for more formal definitions.

I’ll use “lockless” to mean an algorithm implemented without using locks.

## Compare-and-Swap

The usual primitive used to implement and describe lockless algorithms is compare-and-swap, often just called `cas`. There are other possibilities, but `cas` is universal, relatively simple, and widely implemented, e.g. as the `cmpxchg` instruction on the x86 architecture. The following is a specification of a `cas` operation in Haskell, with the additional note that this is intended to be performed atomically (which it would not be in Haskell).

```haskell
cas :: (Eq a) => IORef a -> a -> a -> IO a
cas ref old new = do
    curr <- readIORef ref
    if curr == old then do
        writeIORef ref new
        return curr
    else do
        return curr
```
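The specification above is not atomic as written. For experimentation in GHC, an atomic version over `IORef`s can be obtained from `atomicModifyIORef'`. This is a sketch under that assumption, not how `cas` is realized in Fraser’s work, and note it compares with `Eq` rather than with pointer identity the way a hardware `cmpxchg` does.

```haskell
import Data.IORef

-- An atomic realization of the cas specification via atomicModifyIORef'.
-- Returns the value witnessed at the moment of the operation; the CAS
-- succeeded exactly when that value equals the expected old value.
cas :: (Eq a) => IORef a -> a -> a -> IO a
cas ref old new = atomicModifyIORef' ref $ \curr ->
  if curr == old then (new, curr) else (curr, curr)
```

A caller checks success by comparing the returned value against the expected old value.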

## Specification of Multiple Compare-and-Swap

The specification of multiple compare-and-swap is the straightforward extension to the above to several variables.

```haskell
mcasSpec :: (Eq a) => [(IORef a, a, a)] -> IO Bool
mcasSpec entries = do
    eqs <- forM entries $ \(ref, old, _) -> do
        curr <- readIORef ref
        return (curr == old)
    if and eqs then do -- if all equal
        forM_ entries $ \(ref, _, new) -> do
            writeIORef ref new
        return True
    else do
        return False
```
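To see the all-or-nothing behavior concretely, here is the specification again as a standalone snippet (run single-threaded, where the atomicity requirement is vacuous), exercised by one MCAS that succeeds and one that fails because a single expected value is stale, leaving every reference untouched.

```haskell
import Control.Monad
import Data.IORef

-- The specification repeated so this snippet stands alone. Only
-- meaningful when run without concurrent interference.
mcasSpec :: (Eq a) => [(IORef a, a, a)] -> IO Bool
mcasSpec entries = do
  eqs <- forM entries $ \(ref, old, _) -> do
    curr <- readIORef ref
    return (curr == old)
  if and eqs then do
    forM_ entries $ \(ref, _, new) -> writeIORef ref new
    return True
  else
    return False
```

Even though the first entry’s expected value matches in the failing call, neither reference is written: MCAS is all-or-nothing.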

This specification is, again, intended to be executed atomically. It will be convenient to allow a bit more flexibility in the type, producing the type:

``mcas :: (Eq a) => [(MCASRef a, a, a)] -> IO Bool``

where we have

```haskell
-- Abstract.
newtype MCASRef a = MCASRef { unMCASRef :: IORef (Either (T a) a) } deriving ( Eq )

newMCASRef :: a -> IO (MCASRef a)
newMCASRef v = MCASRef <$> newIORef (Right v)

readMCASRef :: MCASRef a -> IO a -- Will be implemented below.
```

The idea here is that, in addition to values of type `a`, we can also store values of type `T a` into the pointers for internal use, and we can unambiguously distinguish them from values of type `a`. `T a` can be any type constructor you like.

In the code below, I will assume `IORef`s have an `Ord` instance, i.e. that they can be sorted. This is not true, but an approach as in ioref-stable could be used to accomplish this. Alternatively, `Ptr`s to `StablePtr`s could be used.

We won’t worry about memory consistency concerns here. That is, we’ll assume sequential consistency, where all CPU cores see all updates immediately.

I recommend stopping here and thinking about how you would implement `mcas` in terms of `cas` while, of course, achieving the desired atomicity. The solution I’ll present – the one from Fraser’s thesis – is moderately involved, so if the approach you come up with is very simple, then you’ve probably made a mistake. Unsurprisingly, the solution I’ll present makes use of the additional flexibility in the type and the ability to sort `IORef`s. I’m not claiming it is impossible to accomplish this without these, though. For example, you could apply a universal construction which witnesses the universality of `cas`.

## Recursive Helping

Lockless algorithms typically proceed by attempting an operation and detecting conflicts, e.g. as in multi-version concurrency control. This requires storing enough information to tell that a conflicting operation has occurred or is occurring. Once a conflict is detected, the simplest solution is typically to abort and retry, hoping that there isn’t a conflict the next time. This clearly leads to the possibility of livelock. Instead of aborting, the later invocation of the operation could instead help the earlier one to complete, thereby getting it out of its way.
This ensures that the first invocation will always eventually complete, giving us lock-freedom. However, this doesn’t guarantee that once the second invocation finishes helping the first invocation, a third invocation won’t jump in before the second invocation gets a chance to start on its own work. In this case, the second invocation will help the third invocation to complete before attempting to start itself. A process can end up spending all its time helping other processes while never getting its own work done, leading to starvation.

To perform recursive helping, we need an invocation to store enough information so that subsequent, overlapping invocations are able to assist. To accomplish this, we’ll store a (pointer to a) “descriptor” containing the parameters of the invocation being helped and potentially additional information2. This is what we’ll use for the `T` type constructor.

The general approach will be: at the beginning of the operation, we will attempt to “install” a descriptor in the first field we touch utilizing `cas`. There are then three possible outcomes. If we fail and find a value, then the operation has failed. If we fail and find an existing descriptor, then we (potentially recursively) help that descriptor. If we succeed, then we have successfully “acquired” the field and we “help” ourselves.

We can have many processes all trying to help the same invocation at the same time, so it is still important that multiple identical help calls don’t interfere with each other. Just because we’re helping an invocation of an operation doesn’t mean that the original process isn’t still executing.

Since we’ll be replacing pointers to values with pointers to descriptors, reading the value becomes non-trivial. In particular, if when we read a pointer we get a descriptor, we’ll need to help the invocation described to completion. We will need to keep doing this until we successfully read a value.
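The three outcomes above can be sketched in miniature. The following is a hypothetical, single-threaded toy model of the install-or-help control flow (plain `writeIORef` stands in for `cas`, so this illustrates only the logic, not the atomicity): a slot holds either a value or a descriptor, installing encounters one of the three outcomes, and helping is idempotent because it only completes a descriptor that is still installed.

```haskell
import Data.IORef

-- Toy model: a descriptor records just the new value to write, standing
-- in for the real operation's parameters.
type Descriptor = IORef Int

-- A slot holds either a plain value or an installed descriptor.
data Slot = Value Int | Desc Descriptor

-- Try to install a descriptor, expecting a given current value.
-- The three outcomes from the text: help an existing descriptor and
-- retry, fail on an unexpected value, or succeed and help ourselves.
install :: IORef Slot -> Int -> Descriptor -> IO Bool
install slot expected d = do
  s <- readIORef slot
  case s of
    Desc d' -> do
      help slot d'              -- help whoever got there first
      install slot expected d   -- then retry our own installation
    Value v
      | v /= expected -> return False -- the operation fails
      | otherwise -> do
          writeIORef slot (Desc d)    -- "acquire" the slot
          help slot d                 -- help ourselves to completion
          return True

-- Complete the operation described by a descriptor. Idempotent: it only
-- writes if this exact descriptor is still installed.
help :: IORef Slot -> Descriptor -> IO ()
help slot d = do
  new <- readIORef d
  s <- readIORef slot
  case s of
    Desc d' | d' == d -> writeIORef slot (Value new)
    _ -> return () -- someone else already completed it
```

Descriptors are compared by object identity (the `Eq` instance of `IORef`), which is exactly the role the descriptor pointer plays in the real implementations below.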
## Conditional Compare-and-Swap

An operation we’ll use in the implementation of `mcas` is a conditional compare-and-swap (CCAS), where we only perform the swap if, additionally, a separate variable is set to a given value. It has the following specification.

```haskell
-- Specification. Implementations should perform this atomically.
ccasSpec :: (Eq a, Eq c) => IORef a -> a -> a -> IORef c -> c -> IO ()
ccasSpec ref old new condRef check = do
    curr <- readIORef ref
    cond <- readIORef condRef
    if cond == check && curr == old then do
        writeIORef ref new
    else do
        return ()
```

We’ll need to show that this can be implemented in terms of `cas`, or rather a version with modifications similar to those mentioned for `mcas`. This will be a simple instance of the recursive helping approach that will be applied in the implementation of `mcas`.

```haskell
type CCASDescriptor a c = IORef (CCASRef a c, a, a, IORef c, c)

newtype CCASRef a c = CCASRef { unCCASRef :: IORef (Either (CCASDescriptor a c) a) }
    deriving ( Eq, Ord )
```

We begin with the types. As described above, a `CCASRef` is just an `IORef` that holds either a value or a descriptor, and the descriptor is just an `IORef` pointing at a tuple holding the arguments to `ccas`. We won’t actually modify this latter `IORef` and instead are just using it for its object identity. It could be replaced with a read-only `IVar`, or a `Unique` could be allocated and used as an identifier instead. In a lower-level language, this `IORef` corresponds to having a pointer to the descriptor.

```haskell
newCCASRef :: a -> IO (CCASRef a c)
newCCASRef v = CCASRef <$> newIORef (Right v)

readCCASRef :: (Eq a, Eq c) => CCASRef a c -> IO a
readCCASRef ref = do
    x <- readIORef (unCCASRef ref)
    case x of
        Right v -> return v
        Left d -> do
            ccasHelp d
            readCCASRef ref

-- Not atomic. This CAS can fail even when it would be impossible if `ccas` was truly atomic.
-- Example: ccas a reference to the same value but where the condRef is False. The ccas fails
-- and thus should behave as a no-op, but if a `casCCASRef` occurs during the course of the ccas,
-- the `casCCASRef` can fail even though it should succeed in all cases.
casCCASRef :: (Eq a) => CCASRef a c -> a -> a -> IO Bool
casCCASRef (CCASRef ref) old new = do
    curr <- cas ref (Right old) (Right new)
    return (curr == Right old)

tryReadCCASRef :: CCASRef a c -> IO (Maybe a)
tryReadCCASRef ref = do
    x <- readIORef (unCCASRef ref)
    return (case x of Left _ -> Nothing; Right v -> Just v)
```

To get them out of the way, the preceding functions implement the reference-like aspects of a `CCASRef`. The descriptor is an internal implementation detail. The interface is meant to look like a normal reference to a value of type `a`. The main notes are:

• since the `CCASRef` may not contain a value when we read, we loop helping to complete the `ccas` until it does,
• `casCCASRef` is a slightly simplified `cas` used in `mcas`, but should not be considered part of the interface, and
• `tryReadCCASRef` is used in the implementation of `mcas`, but you quite possibly wouldn’t provide it otherwise.
```haskell
ccas :: (Eq a, Eq c) => CCASRef a c -> a -> a -> IORef c -> c -> IO ()
ccas ref old new condRef check = do
    d <- newIORef (ref, old, new, condRef, check)
    v <- cas (unCCASRef ref) (Right old) (Left d)
    go d v
  where go d (Left d') = do -- descriptor already there
            ccasHelp d'
            v <- cas (unCCASRef ref) (Right old) (Left d)
            go d v
        go d (Right curr)
            | curr == old = ccasHelp d -- we succeeded
            | otherwise = return ()    -- we failed

ccasHelp :: (Eq a, Eq c) => CCASDescriptor a c -> IO ()
ccasHelp d = do
    (CCASRef ref, old, new, condRef, check) <- readIORef d
    cond <- readIORef condRef
    _ <- cas ref (Left d) (Right $! if cond == check then new else old)
    return ()
```

Here we illustrate the (not so recursive) helping pattern. `ccas` allocates a descriptor and then attempts to “acquire” the reference. There are three possibilities.

1. We find a descriptor already there, in which case we help it and then try to acquire the reference again.
2. The CAS succeeds and thus we successfully “acquire” the reference. We then “help ourselves”.
3. The CAS fails with an unexpected (non-descriptor) value. Thus, the CCAS fails and we do nothing.

Helping, implemented by `ccasHelp`, just performs the logic of CCAS. If we’ve gotten to `ccasHelp`, we know the invocation described by the descriptor did, in fact, find the expected value there. By installing our descriptor, we’ve effectively “locked out” any other calls to `ccas` until we complete. We can thus check the `condRef` at our leisure. As long as our descriptor is still in the `CCASRef`, which we check via a `cas`, we know that there have been no intervening operations, including other processes completing this `ccas`.

`ccasHelp` is idempotent in the sense that running it multiple times, even in parallel, with the same descriptor is the same as running it once. This is due to the fact that we only (successfully) CAS in the descriptor once, so we can only CAS it out at most once.

## Multiple Compare-and-Swap

The setup for MCAS is much the same as CCAS. The main additional complexity comes from the fact that we need to simultaneously “acquire” multiple references. This is handled by a two-phase approach. In the first phase, we attempt to “acquire” each reference. We proceed to the second phase once we’ve either seen that the MCAS is going to fail, or we have successfully “acquired” each reference. In the second phase, we either reset all the “acquired” references to their old values if the MCAS failed, or set them to their new values if it succeeded.
The MCAS will be considered to have occurred atomically at the point we record this decision, via a CAS, i.e. between the two phases.

```haskell
data MCASStatus = UNDECIDED | FAILED | SUCCESSFUL deriving ( Eq )

data MCASDescriptor' a = MCASDescriptor [(MCASRef a, a, a)] (IORef MCASStatus)
type MCASDescriptor a = IORef (MCASDescriptor' a)

newtype MCASRef a = MCASRef { unMCASRef :: CCASRef (Either (MCASDescriptor a) a) MCASStatus }
    deriving ( Eq, Ord )
```

As with CCAS, an `MCASRef` is a reference, in this case a `CCASRef`, that either holds a value or a descriptor. The descriptor holds the arguments of `mcas`, as with `ccas`, but it additionally holds a status reference. This status reference will be used as the condition reference of the CCAS. In particular, as we’ll see, we will only perform `ccas`’s when the status is `UNDECIDED`.

```haskell
newMCASRef :: a -> IO (MCASRef a)
newMCASRef v = MCASRef <$> newCCASRef (Right v)

readMCASRef :: (Eq a) => MCASRef a -> IO a
readMCASRef ref = do
    x <- readCCASRef (unMCASRef ref)
    case x of
        Right v -> return v
        Left d -> do
            _ <- mcasHelp d
            readMCASRef ref
```

There’s nothing to say about the reference interface functions. They are essentially identical to the CCAS ones, for the same reasons, only with `CCASRef`s instead of `IORef`s.

```haskell
mcas :: (Eq a) => [(MCASRef a, a, a)] -> IO Bool
mcas entries = do
    status <- newIORef UNDECIDED
    d <- newIORef (MCASDescriptor (sortOn (\(ref, _, _) -> ref) entries) status)
    mcasHelp d
```

The `mcas` function is fairly straightforward. It allocates a status reference and a descriptor and delegates most of the work to `mcasHelp`. The one subtlety is the sort, which is critical to ensuring termination.

```haskell
mcasHelp :: (Eq a) => MCASDescriptor a -> IO Bool
mcasHelp d = do
    MCASDescriptor entries statusRef <- readIORef d
    let phase1 [] = do
            _ <- cas statusRef UNDECIDED SUCCESSFUL
            phase2
        phase1 ((MCASRef ref, old, new):es) = tryAcquire ref old new es

        tryAcquire ref old new es = do
            ccas ref (Right old) (Left d) statusRef UNDECIDED
            v <- tryReadCCASRef ref
            case v of
                Just (Left d')
                    | d == d' -> phase1 es -- successful acquisition
                    | otherwise -> do -- help someone else
                        _ <- mcasHelp d'
                        tryAcquire ref old new es
                Just (Right curr) | curr == old -> do
                    status <- readIORef statusRef
                    if status == UNDECIDED then do
                        tryAcquire ref old new es -- failed to acquire but could still succeed
                    else do
                        phase2
                _ -> do -- failed MCAS
                    _ <- cas statusRef UNDECIDED FAILED
                    phase2

        phase2 = do
            status <- readIORef statusRef
            let succeeded = status == SUCCESSFUL
            forM_ entries $ \(MCASRef ref, old, new) -> do
                casCCASRef ref (Left d) (Right (if succeeded then new else old))
            return succeeded

    phase1 entries
```

`phase1` attempts to “acquire” each `MCASRef` by using `tryAcquire`, which will move `phase1` to the next entry each time it succeeds. Therefore, if `phase1` reaches the end of the list and there was no interference, we will have successfully “acquired” all references. We record this with a CAS against `statusRef`. If this CAS succeeds, then the MCAS will be considered successful and conceptually to have occurred at this point. If the CAS fails, then some other process has already completed this MCAS, either in success or failure. We then move to `phase2`.

`tryAcquire` can also detect that the MCAS should fail. In this case, we immediately attempt to record this fact via a CAS into `statusRef`. As with the successful case, this CAS succeeding marks the conceptual instant that the MCAS completes. As before, we then move on to `phase2`.

We never enter `phase2` without `statusRef` being set to either `SUCCESSFUL` or `FAILED`. `phase2` is completely straightforward. We simply set each “acquired” reference to either the new or old value depending on whether the MCAS succeeded or not. The `casCCASRef` will fail if either we never got around to “acquiring” a particular reference (in the case of MCAS failure), or if a reference was written to since it was “acquired”. Since such writes conceptually occurred after the MCAS completed, we do not want to overwrite them.

During `tryAcquire`, there are a few cases that lead to retrying. First, if we find that a reference has already been “acquired” by some other MCAS operation, we recursively help it. Here, the sorting of the references is important to ensure that any MCAS operation we help will never try to help us back. It’s easy to see that without the sorting, if two MCAS operations on the same two references each acquired one of the references, the (concurrent) recursive calls to `mcasHelp` would become infinite loops. With a total ordering on references, each recursive call to `mcasHelp` will be at a greater reference and thus must eventually terminate. The other case for `tryAcquire` is that the expected value is written after the `ccas` but before the `tryReadCCASRef`. In this case, we try again unless the status has already been decided. It might seem like this is just an optimization, and that we could instead treat this as the MCAS failing. However, the intervening write may have written the value that was there before the `ccas`, meaning that there was never a point at which the MCAS could have failed.

References are only “acquired” at the `ccas` in `phase1`. Once the status has been decided, no references may be “acquired” any longer. Since it’s impossible to enter `phase2` without deciding the status, once one process enters `phase2`, no processes are capable of “acquiring” references. This makes `phase2` idempotent and, indeed, each CAS in `phase2` is independently idempotent. Overlapping executions of `phase1` are fine essentially because each `ccas` is idempotent and the `statusRef` can change at most once.

Let’s see how an `mcas` interacts with other operations from the perspective of atomicity. If we attempt to read a reference via `readMCASRef` which is included in the list of references of an ongoing `mcas`, there are two possibilities. Either that reference has not yet been “acquired” by the `mcas`, in which case the read will occur conceptually before the MCAS, or it has been “acquired” in which case the read will help the MCAS to completion and then try again after the MCAS. The story is similar for overlapping `mcas`’s. The `mcas` which “acquires” the least reference in their intersection will conceptually complete first, either because it literally finishes before the second `mcas` notices or because the second `mcas` will help it to completion. Writes are only slightly different.

It is important to note that these operations are NOT atomic with respect to some other reasonable operations. Most notably, they are not atomic with respect to blind writes. It is easy to construct a scenario where two blind writes happen in sequence but the first appears to happen after the `mcas` and the second before. Except for initialization, I don’t believe there are any blind writes to references involved in a `mcas` in Fraser’s thesis. Fraser’s thesis does, however, contain `cas` operations directly against these references. These are also not atomic for exactly the same reason `casCCASRef` isn’t. That said, Fraser’s uses of `cas` against `MCASRef`s are safe, because in each case they just retry until success.

## Conclusion

While I’ve gone into a good amount of detail here, I’ve mainly wanted to illustrate the concept of recursive helping. It’s a key concept for lock-free and wait-free algorithm designs, but it also may be a useful idea to have in mind when designing concurrent code even if you aren’t explicitly trying to achieve a lock-free guarantee.

1. MCAS allows you to perform transactions involving multiple updates atomically. STM additionally allows you to perform transactions involving multiple reads and updates atomically.↩︎

2. While I haven’t attempted it to see if it works out, it seems like you could make a generic “recursive helping” framework by storing an `IO` action instead. The “descriptors” have the flavor of defunctionalized continuations.↩︎