CLR Concurrency: Preventing Torn Reads Without Locks

The CLR's value types are incredibly useful for reducing the memory usage of programs, but they have a severe limitation in concurrent scenarios: structs larger than the machine's word size, which is the largest unit the CLR guarantees to read and write atomically, can suffer from torn reads.

Most typical applications won't encounter this because their concurrent accesses, both reads and writes, are protected by locks. However, there are some scenarios where locks just aren't viable for various reasons. For instance, if a few shared variables are read an order of magnitude more often than they're written, then roughly 90% of the lock contention is wasted work among readers that aren't performing any updates.

In principle, the lock is really there to ensure that only one writer modifies the variable at a time. This means we could use some other signalling mechanism to notify readers that a write is taking place, or has taken place. Ideally, this mechanism shouldn't cause any contention among readers, thus permitting more read parallelism.

The typical abstraction for this is a reader-writer lock (rwlock). Basically, any number of concurrent readers are permitted to access a resource protected by the rwlock, while writers have to request exclusive access and wait until all readers are done. The writer then proceeds, and all readers must wait until the writer is finished. Unfortunately, rwlocks aren't all they're cracked up to be. Most of the problems stem from the fact that any amount of shared state will inherently limit scalability, but part of the problem is also that readers are performing writes to the lock's own bookkeeping state, and thus introducing unnecessary contention.
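
For contrast, here's a minimal sketch of that rwlock approach using .NET's standard ReaderWriterLockSlim; the RwLockBox<T> wrapper is just an illustrative name and isn't part of Sasa. Note that even EnterReadLock has to update the lock's internal state, which is exactly the reader-side write we'd like to avoid:

using System.Threading;

public class RwLockBox<T>
{
    private readonly ReaderWriterLockSlim rwlock = new ReaderWriterLockSlim();
    private T value;

    public T Read()
    {
        rwlock.EnterReadLock();      // readers still write to the lock's shared state
        try { return value; }
        finally { rwlock.ExitReadLock(); }
    }

    public void Write(T newValue)
    {
        rwlock.EnterWriteLock();     // writers wait until all readers have drained
        try { value = newValue; }
        finally { rwlock.ExitWriteLock(); }
    }
}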

Turns out, it's possible to solve this contention issue using only an additional piece of data: a version number. Writers still coordinate via a standard lock, and when a writer enters the critical section, it increments a version number and executes a memory barrier. The version number is now odd, which indicates that a write is in progress. Then the writer writes to the variable, executes another barrier, then increments the version number again. The version number is now even, indicating that no write is in progress:

public static void Write<T>(
    ref T location,
    ref T value,
    ref int version,
    object writeLock)
{
    lock(writeLock)
    {
        ++version;              // ++version odd: write in progress
        Thread.MemoryBarrier(); // ensure increment complete before update
        location = value;
        Thread.MemoryBarrier(); // ensure update complete before increment
        ++version;              // ++version even: write complete
    }
}

Readers instead only consult the version number to check whether a write is in progress, or whether a write has transpired since we first started the read. We read the version number into a local called 'old', and then spin until the version number is even, which indicates that no write is in progress. Then we read the value from 'location' into a local.

However, a write could have occurred since we exited the loop checking for an odd version number. So we read the version number again and compare it against 'old'. If it differs, that means a write occurred while we were reading, possibly corrupting our read. In that case, we abort and retry the whole process from the beginning. If the version number matches 'old', then the value we read is correct, and we can safely return it [1]:

public static T Read<T>(ref T location, ref int version)
{
    T x;
    int old;
    do
    {
        // loop until version is even = no write in progress
        do
        {
            old = version;
            if (0 == (old & 0x01)) break; // even version: no write in progress
            Thread.MemoryBarrier();       // ensure a fresh read of version on the next iteration
        } while (true);
        x = location;                     // read value from location
        Thread.MemoryBarrier();           // ensure the read of location completes before re-checking version
    } while (version != old);
    // if version after read == old, no concurrent write
    return x;
}

So we've achieved our goal: we have fully concurrent reads that don't contend for resources, and that don't block writers. I've just included these two atomic read/write functions, and some useful variants, in Sasa's Atomics class.
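
To make the intended usage concrete, here's a hypothetical sketch of a single writer and many concurrent readers sharing an oversized struct via the two functions above, assuming they're exposed as static methods on that Atomics class; the Position and SharedPosition types are invented for illustration, and Sasa's actual signatures may differ slightly:

public struct Position
{
    public double X, Y, Z;  // 24 bytes, larger than a machine word, so plain reads can tear
}

public class SharedPosition
{
    private Position current;
    private int version;
    private readonly object writeLock = new object();

    // Writers serialize on writeLock, and the version number brackets each update.
    public void Update(Position p)
    {
        Atomics.Write(ref current, ref p, ref version, writeLock);
    }

    // Any number of readers can take a consistent snapshot without blocking each other.
    public Position Snapshot()
    {
        return Atomics.Read(ref current, ref version);
    }
}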

Turns out this approach isn't new, and Duffy covered a similar approach in a later blog post which I only came across after writing this piece. He rightly points out that these are the sort of techniques employed in software transactional memory (STM), whereby we do as much as possible optimistically, then validate at the end that nothing untoward happened (like a write when we weren't expecting it). If the unexpected happens, we "abort" and retry. As pointed out in the comments to that post, this is essentially the seqlock locking mechanism as used in the Linux kernel.

I don't anticipate using this too often, but I wouldn't have included it in Sasa if it didn't have an important application within Sasa itself. Sasa.Reactive is the assembly that provides reactive variables that can be concurrently read, but only updated by a single writer. I designed the atomic read/write functions above while refactoring the Sasa.Reactive implementation to be more robust, yet more conservative in resource use. These atomic read/write functions allow concurrent readers to safely obtain a snapshot of a reactive variable's value, without resorting to storing values in an inherently atomic type, like a reference type.

Hopefully others will find it useful as well. If anyone spots a problem in the above algorithm please do let me know! The CLR provides a relatively strong memory model, so I'm pretty sure I can eliminate one memory barrier in the atomic write function, but I'd love to have some other input. Concurrency is hard after all!

Edit: Brian Gideon helpfully ran some tests of his own that support the correctness of these operations, and their performance benefits over both locking, and interlocked operations.

[1] Technically, the version number could have been incremented so many times that it wrapped around until it matches the saved 'old' value, but IMO it's exceedingly unlikely that 2^32 writes occurred while we were trying to read a single variable.

Comments

Cory said…
Using Volatile.Read instead of Thread.MemoryBarrier should be significantly more scalable and give the same results.
Sandro Magi said…
Probably. Unfortunately, the Volatile class is a new addition, and only supports .NET 4.5. Sasa targets older frameworks as well.
