Friday, December 23, 2011

Software Transactional Memory in Pure C#

Concurrent programming is a very difficult problem to tackle. The fundamental issue is that manual locking is not composable, which is to say that if you have two concurrent programs P0 and P1 free of deadlocks, livelocks and other concurrency hazards, and you try to compose P0 and P1 to create a program P2, P2 may not be free of concurrency hazards. For instance, if P0 and P1 take two locks in different orders, then P2 will deadlock. Needless to say, this is a serious problem because composition is the cornerstone of all programming.

I've been toying with some ideas for software transactional memory (STM) in C# ever since I started playing with FRP and reactive programming in general. The problem in all of these domains is fundamentally about how to handle concurrent updates to shared state, and how to reconcile multiple, possibly conflicting updates to said state.

Rx.NET handles concurrency essentially by removing the identity inherent to shared state. An IObservable<T> is actually a collection of all values pushed to that observable in some undefined order. If you were to create an IObservable that retains only the "last" pushed value, and thus now retains an identity, you then have the same problems as above, namely that this update must always be consistent with other updates at any given instant in time. For instance:

var plusOne = intObservable.Select(i => i+1);

At every instant in the program's execution, plusOne should always observably equal intObservable + 1, and the ability to observe a violation of this constraint is known in reactive literature as a 'glitch'.

Similarly, in database programming where transactions rule, this is known as a 'dirty read'. Essentially, an update to intObservable is executing in a transaction, but other transactions are able to view those changes before that transaction has committed.

Generally speaking, glitches and dirty reads are undesirable, because they require the developer to manually synchronize state, which defeats the whole purpose of going with FRP or transactions to begin with. From what I've seen so far, Rx.NET gets around this by not providing abstractions that expose identity in this way. The programs you write must work with collections of values, and the program must specify the ordering via Observable.OrderBy.

When I added the Property<T> IObservable to Sasa, I added a limited form of transactions to prevent glitches, because a property has identity. This implementation uses a global 'clock', which is really just a global uint64 counter to properly sequence updates and prevent glitches.


I'm going to focus here on implementing STM directly, but to keep it simple, I've gone with the simplest STM that is expressible using .NET primitives. In fact, the resulting STM is probably not good if you're after scaling, it does a good job of ensuring concurrency safety for arbitrary composition.

The STM I committed to Sasa is a very simple, perhaps even simplistic, STM employing encounter-time locking with deadlock detection on transactional variables. Any read or write acquires the lock on a transactional variable. Whenever two transactions would block to wait for each other, the transaction that is not already blocked is aborted and retried.

This design has advantages and disadvantages. The disadvantages are the limited concurrency even when reads and writes would not conflict. Two transactions that only read a transactional variable Y, would still block each other despite the fact that concurrent reads can't cause problems. Furthermore, the use of encounter-time locking means that locks can be held for a long time. Finally, the naive deadlock detection combined with encounter time locking means that some programs will have higher abort rates than they would in other STMs.

There are significant advantages to this approach though. For one, a transaction doesn't require elaborate read/write/undo logs. In fact, this STM requires only a single allocation for the transaction object itself at transaction start. By contrast, most other STM designs require at least one allocation for every object that is read or written. These allocation costs are generally amortized, but they still add up.

The STM is also conceptually simple at 450 lines of code, including elaborate comments (127 lines counting only semicolons). This STM consists of only 3 classes, and 1 exception, and uses only System.Monitor for locking. This means that the STM isn't really fair, but it's rather simple to replace standard locks with a fair locking once the core STM algorithm is understood.

There is also preliminary support for integration with System.Transactions.

Transactional Programming

Any sort of transactional programming requires a transaction:

public sealed class MemoryTransaction : IEnlistmentNotification,
  public static MemoryTransaction Begin();
  public static void Run(Action body);
  public static MemoryTransaction Current { get; }
  public void Complete();

This class is closely modeled on the design of TransactionScope from System.Transactions. Programs will generally concern themselves mostly with transactional variables, which in Sasa.TM is called Transacted<T>:

public class Transacted<T> : Participant, IRef<T>
  public T Value { get; set; }
  public void Write(T value, MemoryTransaction transaction);
  public T Read(MemoryTransaction transaction);

Any reads and writes to Transacted<T> occur within the lifetime of a MemoryTransaction, and the set of all such reads and writes are committed atomically. A simple program demonstrating the use of these abstractions:

Transacted<int> accountBalance = new Transacted<int>();
MemoryTransaction.Run(() =>
  accountBalance.Value += 100;

MemoryTransaction.Run will handle all the commits, rollbacks and retries for you. You can do this manually as well if you catch RetryException, and call Complete and Dispose methods on the transaction manually, but for most purposes the Run method suffices. You can nest calls to Run as many times as you like, but only one top-level transaction will ever be created.

No matter how many concurrent threads are executing the above code, it will always be updated atomically, and you can compose the above program with any other transactional program, and the result will also be free of concurrency hazards. The one caveat is that you should not cause non-transactional side-effects from within a transaction.

Please refer to the API docs under Sasa.TM for further details.


The internals of this STM design is pretty straightforward. Structurally, it looks something like this:

+---------------+   +---------------+    +----------------+
|Tx0            |   |Transacted0    |    |Transcated1     |
|  participants---->|  value = 2    | +->|  value = true  |
|  waitingFor   |   |  undo  = 1    | |  |  undo  = false |
+-----|---------+   |  next-----------+  |  next  = null  |
      |       ^     |  owner        |    |  owner         |
      |       |     +---|-----------+    +---|------------+
      |       |         |                    |
      |       +------------------------------+
+------------------+     +------------------+
|Tx1               |     |Transacted2       |
|  participants--------->|  value = null    |
|  waitingFor=null |     |  undo  = "Foo"   |
+------------------+<-------owner           |

There's quite a bit going on here, so here are some quick highlights:

  • Each Transacted<T> is a member of a linked list rooted in the "participants" field of the MemoryTransaction. The list consists of Transacted<T> which have been read or written during the current transaction.
  • Each Transacted<T> points to the current transaction that owns its lock.
  • Each MemoryTransaction that attempts to acquire a lock on a Transacted<T>, stores the Transacted<T> in a local field called "waitingFor".
  • Transacted<T> stores the original value before any changes are made, so we can rollback if the transaction aborts.

From the above graph, we can see that there are two running transactions, Tx0 and Tx1, and that Tx0 has read or written Transacted0 and Transacted1, and it has tried to read/write Transacted2. However, Tx1 currently owns the lock on Transacted2, so Tx0 is effectively blocked waiting for Tx1 to complete.

This dependency graph is acyclic so there is no deadlock. If Tx1 were to then try to acquire the lock on Transacted0 or Transacted1, we would create a cycle in the waits-for graph, and we would have to abort one of the transactions.

On commit, a transaction's participant list is walked, unlinking elements as it goes, and all the undo fields are cleared and the locks are released. The next transaction blocked on any of the participants acquires the lock it's been waiting for, sets the owner field, and proceeds.

Rollback is much the same, except the Transacted<T>'s value field is first overwritten with the value from the undo field.

Future Work

Fair STM

To those that have read my previous posts, note that the structure of the MemoryTransaction is exactly the structure of MetaThread from a previous post. By simply adding a WaitHandle to MemoryTransaction with a FIFO locking protocol, we have a fair STM.

Lock Stealing

STM research so far has shown that most transactions are short enough that they can execute in a single timeslice, and throughput suffers if a thread is descheduled while it's holding locks. This would only be exacerbated in an encounter-time locking design like I've described here, since locks are held for longer.

Instead of blocking on a variable that is already owned, we can instead steal the lock under certain conditions. For instance, if Tx0 and Tx1 are merely reading from a variable, they can repeatedly steal the lock from each other without concern.

A transaction that writes a variable that has only been locked for reading, can steal that lock too, but if the original owner tries to read the variable again, it must abort.

If Tx0 and Tx1 both try to write the same variable, blocking is unavoidable.

Obviously, all of these performance improvements impact the simplicity of the original design, so I'm leaving them for future work if the need arises.


Jordan said...

I worked on something similar. Check this out:

Sandro Magi said...

Your description of Cloneable is exactly how Sasa.Dynamics works. I basically expose a static Type<T> class which exposes three delegates: Create of type Func<T>, which creates an empty/default instance of a type, Case of type Action<T, IReducer<T, FieldInfo>>, which dispatches on the type of the object into the IReducer handler, and Reflect also of type Action<T, IReducer<T, FieldInfo>>, which reflects over an object's fields.

I'm in the process of porting this code over to an even more general abstraction which I haven't published yet, but the core idea of type safe reflection remains the same. See my last Sasa release for the first incarnation of that idea.

As for the transactions itself, it looks like you've taken a different approach by allowing multiple concurrent versions of an object. I took the much simpler approach of simply allowing one concurrent version. There are tradeoffs to each.

Regarding your Maybe<T>, you can't compare it against null and get a meaningful result. You need to override equality on a nullable Maybe type like I do with my Option type (see lines 93-98 and 170-175). If you provide those overloads, then you do: if (x == null) ..., instead of just: if (x.HasValue) ...

You also sorely need some comments! ;-)

Unknown said...

Hi Sandro
Not that I'm any expert on the topic, and probably never will be, but it seems to me what you implemented is not what is described on the wiki page.
Software transactional memory is supposed to be optimistic, i.e. lock-free with retrying. You do locking and retry when you detect deadlock. That's what I would call pessimistic, i.e. the complete opposite.

Jan S.

Sandro Magi said...

Software transactional memory does not require any specific implementation strategy. It can be optimistic or pessimistic, and you'll find both in the literature. The most efficient recent STMs are actually pessimistic.

As long as program fragments operate on a form of native in-memory data types where access is mediated by transactions, and thus concurrent programs can be safely composed, it qualifies as STM.

Jan S said...

All right, I was just referring to the wiki article, where the second sentence says "It is an alternative to lock-based synchronization".
Maybe someone with a good understanding of the topic (hint hint) should fix the article :)

Sandro Magi said...

I did make a small adjustment already, but the meaning of "alternative to lock-based synchronization" is that you don't have to manually specify the locks to acquire. Pretty much all STMs use some kind of locking or mutual exclusion internally, even the optimistic ones.

Jan S said...

It's abstracted away but it'still lock-based. Lock-based doesn't mean "you have to use the lock keyword in your code" and it also doesn't mean "there is a lock used somewhere down in the implementation". No. _based_ says it's really at the heart of the whole idea, and that in other words means the same as "pessimistic".
At least that's how I read the article and I bet many others do as well