Tuesday, February 24, 2015

µKanren.NET - Featherweight Relational Logic Programming in C#

The µKanren paper is a nice introduction to a lightweight logic programming language which is a simplification of the miniKanren family of languages. The existing µKanren implementation in C# was a translation from Scheme, and thus is verbose, untyped with lots of casts, and non-idiomatic. I also found most of the other Kanren implementations unnecessarily obscure, heavily relying on native idioms that aren't clear to newcomers.

uKanren.NET provides a clear presentation of the core principles of µKanren using only IEnumerable<T> and lambdas, showing that µKanren's search is fundamentally just a set of combinators for transforming sequences of states. The values of the sequence are sets of bound variables that satisfy a set of equations. For instance, given the following expression:

  Kanren.Exists(x => x == 5 | x == 6)

You can read it off as saying there exists an integer value to which we can bind variable x, such that x equals either 5 or 6 [1]. Solving this equation produces this sequence of values:

x[0] = 5,
x[0] = 6,

where each line represents a different value that satisfies the equation. These equations can be arbitrarily nested or chained using | and &, just like ordinary logical expressions, and µKanren will find a solution for all variables, if a solution exists. A further benefit of this implementation is that you cannot accidentally mix up boolean equality expressions with relational logic expressions, because the relational equality, conjunction and disjunction operators all operate on non-boolean types.

Logic Variables, Goals & States

We start with the simple core, which is just a logic variable identified uniquely by an integer, and with a convenient local name:

public sealed class Kanren
{
    internal int id;

    // the variable's public name, extracted from the delegate parameter
    public string Name { get; internal set; }

    public static Goal operator ==(object left, Kanren right)
    {
        return Equals(left, right);
    }

    public static Goal operator ==(Kanren left, object right)
    {
        return Equals(left, right);
    }

    public static Goal operator ==(Kanren left, Kanren right)
    {
        return Equals(left, right);
    }
    ...
}

µKanren.NET variables override the equality operators to generate "Goals". A Goal is represented by a delegate that takes a State and generates a sequence of States:

public delegate IEnumerable<State> Goal(State state);

A State is simply a set of variable bindings that satisfy the equations described by the Goal:

public sealed class State
{
    internal Trie<Kanren, object> subs;
    internal int next = 0;
    internal Func<Goal> incomplete;

    internal object Get(Kanren x)
    {
        object v;
        return subs.TryGetValue(x, out v) ? v : null;
    }

    internal State Extend(Kanren x, object v)
    {
        return new State
        {
            subs = subs.Add(x, v),
            next = next,
            incomplete = incomplete,
        };
    }

    internal State Next()
    {
        return new State
        {
            subs = subs,
            next = next + 1,
            incomplete = incomplete,
        };
    }
    ...
}

Each new variable introduced by Kanren.Exists increments the unique variable index we track in State.next, so each variable gets a unique number. Variable bindings are tracked in an immutable hash map provided by Sasa.Collections.Trie<T> (a fast immutable IDictionary).

µKanren Operators

There are 3 core operations on Goals: constructing a Goal with Kanren.Exists, and combining Goals using either disjunction or conjunction:

public sealed class Kanren
{
    ...
    public static Goal Exists(Func<Kanren, Goal> body)
    {
        var arg = body.Method.GetParameters()[0].Name;
        return state =>
        {
            var goal = body(new Kanren { id = state.next, Name = arg });
            return goal(state.Next());
        };
    }

    public static Goal Conjunction(Goal left, Goal right)
    {
        return state => left(state).SelectMany(x => right(x));
    }

    public static Goal Disjunction(Goal left, Goal right)
    {
        return state => left(state).Concat(right(state));
    }

    public static Goal Recurse(Func<Kanren, Goal> body, Kanren x)
    {
        return state => new[]
        {
            new State
            {
                subs = state.subs,
                next = state.next,
                incomplete = () => body(x)
            }
        };
    }
    ...
}

Kanren.Exists just creates a fresh variable using the next number in the sequence, and invokes the function body that produces the Goal. The Goal is then evaluated using the given State, but with an updated variable counter to ensure variables are unique.

Conjunction consists of equations like x == 6 & y == 7. For every State generated by the 'left' Goal, conjunction generates a set of States built from applying the 'right' Goal to that same state. So, x == 6 produces a binding assigning x to 6, which is then passed to the Goal for y == 7, which then adds a binding for y = 7. Thus, "x[0] = 6, y[1] = 7" is produced.

Disjunction consists of equations like x == 6 | x == 7. These Goals are independent and don't interfere with each other, and so any States generated by 'right' are simply appended to the States generated by 'left'. The 'left' Goal produces x = 6, and 'right' produces x = 7, so the result will be:

x[0] = 6,
x[0] = 7,
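The claim that conjunction is just SelectMany and disjunction is just Concat can be demonstrated in isolation. Here's a self-contained toy version that strips away unification entirely, using plain ints as stand-in states (purely for illustration; the names below are hypothetical and not part of µKanren.NET):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CombinatorDemo
{
    // Stand-in for Goal: a function from a "state" (here just an int)
    // to a stream of successor states.
    public delegate IEnumerable<int> Goal(int state);

    // Conjunction: every state produced by 'left' is fed through 'right'.
    public static Goal Conj(Goal left, Goal right) =>
        state => left(state).SelectMany(s => right(s));

    // Disjunction: both branches explore the same state independently.
    public static Goal Disj(Goal left, Goal right) =>
        state => left(state).Concat(right(state));

    public static readonly Goal AddOne = s => new[] { s + 1 };
    public static readonly Goal AddTwo = s => new[] { s + 2 };

    public static void Main()
    {
        // Conjunction threads the state: (0 + 1) + 2
        Console.WriteLine(string.Join(",", Conj(AddOne, AddTwo)(0))); // 3
        // Disjunction appends alternatives: 0 + 1 and 0 + 2
        Console.WriteLine(string.Join(",", Disj(AddOne, AddTwo)(0))); // 1,2
    }
}
```

Substituting State for int and a unifying goal for AddOne recovers exactly the Conjunction and Disjunction operators above.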

Because C# features eager evaluation, we have to handle recursion in Kanren expressions manually. We do this via Kanren.Recurse, which takes a Goal constructor and a Kanren variable, and returns an "incomplete" State: a State that was only partially evaluated and may produce further States if you resume evaluation:

public sealed class State
{
    ...
    // True if this state is final, such that all bindings have values,
    // false if some computation remains to be done.
    public bool IsComplete
    {
        get { return incomplete == null; }
    }

    // Get the pairs of bound variables and their values.
    public IEnumerable<KeyValuePair<Kanren, object>> GetValues()
    {
        return subs;
    }

    // Return the remaining stream of states, if any.
    public IEnumerable<State> Continue()
    {
        if (IsComplete) throw new InvalidOperationException("State is complete.");
        return incomplete()(this);
    }
}

The Kanren client is currently responsible for handling incomplete States while iterating over the results. It's possible to handle this in a more uniform manner with a little more execution overhead, but uKanren.NET doesn't currently do so.

Unification

Finally, the real meat of logic programming is resolving equality between logic variables and values. Logic programming solves equations using an algorithm called "unification":

public sealed class Kanren
{
    ...
    static object Walk(object uvar, State env)
    {
        // search for final Kanren binding in env
        Kanren v;
        while (true)
        {
            v = uvar as Kanren;
            if (ReferenceEquals(v, null)) break;
            var tmp = env.Get(v);
            if (ReferenceEquals(tmp, null)) break;
            uvar = tmp;
        }
        return uvar;
    }

    static State Unify(object uorig, object vorig, State s)
    {
        var u = Walk(uorig, s);
        var v = Walk(vorig, s);
        var uvar = u as Kanren;
        var vvar = v as Kanren;
        if (!ReferenceEquals(uvar, null) && !ReferenceEquals(vvar, null)
            && uvar.Equals(vvar))
            return s;
        if (!ReferenceEquals(uvar, null))
            return s.Extend(uvar, v);
        if (!ReferenceEquals(vvar, null))
            return s.Extend(vvar, u);
        if (u.Equals(v))
            return s;
        return null;
    }
    ...
}

Unification brings together all equality declarations between variables and values and ensures they are all consistent. If two variables are unified, then they are either the same variable, or the variable values must unify. If a variable and value are unified, then that value is bound to that variable in the State by extending it. If two values are unified, then they must be equal to each other. If none of these conditions hold, then a null State is returned, indicating an inconsistent set of equality conditions.
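To make the algorithm concrete, here's a stripped-down, self-contained rendition of Walk and Unify over a plain mutable dictionary (illustrative only: the real implementation above uses the immutable Trie and signals inconsistency by returning a null State, where this toy mutates in place and returns a bool):

```csharp
using System;
using System.Collections.Generic;

public sealed class Var { }  // stand-in for a logic variable

public static class MiniUnify
{
    // Follow variable bindings until we reach a value or an unbound variable.
    public static object Walk(object u, Dictionary<Var, object> subs)
    {
        object next;
        while (u is Var && subs.TryGetValue((Var)u, out next)) u = next;
        return u;
    }

    // Returns false if the equations are inconsistent; otherwise extends subs.
    public static bool Unify(object u, object v, Dictionary<Var, object> subs)
    {
        u = Walk(u, subs);
        v = Walk(v, subs);
        if (ReferenceEquals(u, v)) return true; // same variable (or same value)
        if (u is Var) { subs[(Var)u] = v; return true; } // bind variable to term
        if (v is Var) { subs[(Var)v] = u; return true; }
        return u.Equals(v); // two values: consistent only if equal
    }
}
```

For example, unifying x with y and then y with 5 leaves Walk(x) resolving to 5, while a subsequent attempt to unify x with 6 fails.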

Wrap Up

The real uKanren.NET implementation provides a few tweaks to interleave certain streams, and to provide convenient operators on Goals so you can easily combine more sophisticated expressions:

static Goal Fives(Kanren x)
{
    return x == 5 | Kanren.Recurse(Fives, x);
}

static Goal Sixes(Kanren x)
{
    return 6 == x | Kanren.Recurse(Sixes, x);
}

static Goal FivesAndSixes()
{
    return Kanren.Exists(x => Fives(x) | Sixes(x));
}

// output of FivesAndSixes:
// x[0] = 5,
// x[0] = 6,
// x[0] = 5, x[0] = 5, ... [infinite stream of fives]
// x[0] = 6, x[0] = 6, ... [infinite stream of sixes]

However, the implementation described above covers the core operation of µKanren, and the rest is just syntactic sugar to make it nice in C#.

[1] The "Exists" operator was called "call/fresh" in the original paper and in most other implementations. I found it easier to read off uKanren expressions as English with the form I settled on.

Wednesday, January 21, 2015

NHibernate: Associations with Composite Primary Keys as Part of a Composite Primary Key

NHibernate is a pretty useful tool, but occasionally it's not entirely documented in a way that makes its flexibility evident. Composite keys are a particularly difficult area in this regard, as evidenced by the numerous articles on the topic.

Most of the existing articles cover this simply enough, but there is one uncommon corner case I have yet to see explained anywhere: a composite primary key one of whose key properties is an association with a composite key. This is probably pretty uncommon and there are ways around it, hence the lack of examples, but as a testament to NHibernate's flexibility, it's possible! Here's the example in code listing only the primary keys:

public class MotorType
{
  public Horsepower Horsepower { get; protected set; }
  public VoltageType VoltageType { get; protected set; }
}
public class Motor
{
  public MotorType MotorType { get; protected set; }
  public Efficiency Efficiency { get; protected set; }
}

The tables look like this:

MotorType: HorsepowerId, VoltageTypeId
Motor:     HorsepowerId, VoltageTypeId, EfficiencyId

Most people would probably map this with the Motor class having the three primary key properties, one for each column, and an additional many-to-one association referencing MotorType, but that shouldn't actually be necessary. Composite primary keys are possible, and the primary key can contain an association, therefore it should be possible, in principle, for that association to itself have a composite key.

And here's how it's done for the Motors table:

<?xml version="1.0" encoding="utf-8" ?>
<hibernate-mapping xmlns="urn:nhibernate-mapping-2.2">
  <class name="Motor, ExampleProject" table="Motors">
    <composite-id>
      <key-many-to-one name="MotorType">
        <column name="HorsepowerId" />
        <column name="VoltageTypeId" />
      </key-many-to-one>
      <key-property name="Efficiency" column="EfficiencyId" />
    </composite-id>
  </class>
</hibernate-mapping>

The MotorType class is a simple composite primary key:

<?xml version="1.0" encoding="utf-8" ?>
<hibernate-mapping xmlns="urn:nhibernate-mapping-2.2">
<class name="MotorType, ExampleProject" table="MotorTypes">
    <composite-id>
      <key-property name="Horsepower" column="HorsepowerId" />
      <key-property name="VoltageType" column="VoltageTypeId" />
    </composite-id>
</class>
</hibernate-mapping>

Saturday, October 11, 2014

Generalized Multicast Delegates in Pure C#

.NET's delegates are a powerful and convenient abstraction available since .NET 1.1. They encapsulate a method pointer and the object the method belongs to in a single callable "function object". Multicast delegates are an extension of plain delegates, in that they encompass multiple single delegates. Invoking a multicast delegate invokes every encapsulated delegate, in the order they were added. Multicast delegates are key to event handling patterns that were a core part of the CLR nearly since its inception.

If you're curious about virtual machines or CLR internals, you've perhaps wondered how multicast delegates actually work. These are the key properties of multicast delegates:

  1. They encapsulate a list of delegates of the same delegate type.
  2. Adding a delegate returns a new multicast delegate containing the new addition at the end of the list (the original is unchanged).
  3. Removing a delegate returns a new multicast delegate without the specified delegate (the original is unchanged).
  4. Invoking a multicast delegate invokes each encapsulated delegate, in order, with the given parameters.

These properties are a straightforward immutable queue of delegates, and if we just wanted something that works, we could just use an immutable queue like Sasa.Fifo<T>. However, the CLR's multicast delegates are insanely fast to invoke by comparison. Here's a simple timing result:
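For concreteness, here's a minimal sketch of that naive immutable approach using plain arrays (illustrative only; this is not Sasa.Fifo<T>, and the delegate type is fixed to Action to keep it short):

```csharp
using System;

public static class NaiveMulticast
{
    // An immutable "multicast" is just a persistent list of delegates:
    // Add copies the array with one extra slot, leaving the original intact.
    public static Action[] Add(Action[] list, Action d)
    {
        var next = new Action[list.Length + 1];
        Array.Copy(list, next, list.Length);
        next[list.Length] = d;
        return next;
    }

    // Remove copies the array without the first matching delegate.
    public static Action[] Remove(Action[] list, Action d)
    {
        var i = Array.IndexOf(list, d);
        if (i < 0) return list;
        var next = new Action[list.Length - 1];
        Array.Copy(list, next, i);
        Array.Copy(list, i + 1, next, i, list.Length - i - 1);
        return next;
    }

    // Invocation runs each delegate in the order it was added.
    public static void InvokeAll(Action[] list)
    {
        foreach (var d in list) d();
    }
}
```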

    = System.MulticastDelegate =
    Build           =       1,390,819
    Run             =          53,762

    = Delegate Immutable Queue =
    Build           =         507,871
    Run             =         527,292

This builds up a multicast delegate of 20 items about 200,000 times, then executes that delegate 200,000 times. Building multicast delegates is quite slow on the CLR, but invoking them is an order of magnitude faster than naively traversing a queue and invoking each delegate. Presumably, invoking delegates happens more often than adding and removing delegates, so better invocation speed is a good tradeoff. So how do we achieve it?

First, let's consider the fastest possible implementation of invocation: arrays. If we preallocate an appropriately sized array of delegate slots, and invoke each entry in a for loop, we get these timings:

    = Delegate Array =
    Build           =          71,554
    Run             =          54,022

Exactly equal to multicast delegates, to within reasonable experimental error. So now we know we need some data structure that approximates a queue, but uses contiguous memory for fast traversals during delegate invocation. We can simulate an immutable array queue using Sasa's array combinators (see Append and Remove):

public struct MulticastSimple<T>
    where T : class
{
    internal T[] items;

    public MulticastSimple<T> Add(T item)
    {
        return new MulticastSimple<T>
        {
            items = items == null ? new[] { item } : items.Append(item),
        };
    }

    public MulticastSimple<T> Remove(T item)
    {
        if (items != null)
        {
            var i = Array.IndexOf(items, item);
            if (i >= 0) return new MulticastSimple<T>
            {
                items = items.Length == 1 ? null : items.Remove(i)
            };
        }
        return this;
    }
}

It's pretty simple, but let's see how it compares with the built-in multicast:

    = Multicast Delegate =
    Build           =       1,334,914
    Run             =          57,646

    = MulticastSimple =
    Build           =         860,529
    Run             =          58,011

Pretty good! Building MulticastSimple is much faster, and invocation is just as fast as the built-in multicast delegates. Remember that these numbers are for 20 delegates though, and given the cost of adding delegates is O(n), this won't scale very well to large numbers of delegates. Let's see what the numbers look like for 200 delegates iterated 20,000 times:

    = System.MulticastDelegate =
    Build           =       1,324,381
    Run             =          45,683

    = MulticastSimple =
    Build           =       2,545,824
    Run             =          45,497

MulticastSimple is already more expensive than multicast delegates, so now let's see 2,000 delegates iterated 2,000 times:

    = System.MulticastDelegate =
    Build           =       1,324,422
    Run             =          45,293

    = MulticastSimple =
    Build           =      21,115,099
    Run             =          51,515

And here we see the limitations of the naive approach. The cost of generating multicast delegates appears constant regardless of the number of delegates, but the cost for MulticastSimple scales linearly with the number of delegates. However, even 20 delegates in a multicast delegate is uncommon, so optimizing for the rare cases probably isn't worth the effort.

A Scalable Multicast up to M

Fortunately, there is a simple extension of the above design which will efficiently support up to M delegates, where M is some configurable number. Basically, efficient delegate addition needs a single small array, and efficient traversal needs an array as well, but there's no need for these two arrays to be the same:

public struct Multicast<T>
    where T : class
{
    internal T[][] items;
    internal T[] end;

    public Multicast<T> Add(T item)
    {
        if (end == null) return new Multicast<T> { items = items, end = new[] { item } };
        if (end.Length < 16) return new Multicast<T>
        {
            items = items, end = end.Append(item)
        };
        return new Multicast<T>
        {
            items = items == null ? new T[][] { end } : items.Append(end),
            end = new[] { item },
        };
    }

    public Multicast<T> Remove(T item)
    {
        if (end != null)
        {
            var i = Array.IndexOf(end, item);
            if (i >= 0) return new Multicast<T>
            {
                items = items, end = end.Length == 1 ? null : end.Remove(i)
            };
        }
        if (items != null)
        {
            for (int i = 0; i < items.Length; ++i)
            {
                var j = Array.IndexOf(items[i], item);
                if (j >= 0) return new Multicast<T>
                {
                    items = items[i].Length == 1
                          ? items.Remove(i)
                          : items.Set(i, items[i].Remove(j)),
                    end = end,
                };
            }
        }
        return this;
    }

    public T[][] Items { get { return items; } }
    public T[] End { get { return end; } }

    public static Multicast<T> operator +(Multicast<T> lhs, T rhs)
    {
        return lhs.Add(rhs);
    }

    public static Multicast<T> operator -(Multicast<T> lhs, T rhs)
    {
        return lhs.Remove(rhs);
    }
}

So we keep a small array of up to 16 items for efficient appends (field 'end'), and when that array fills up it overflows into a nested array (field 'items') and the 'end' array is restarted. So appends have the same low constant factor overheads as MulticastSimple, but scale linearly at a factor of n/16 instead of just n. This is still fundamentally O(n), but the constant factor overheads of built-in multicast delegate appends are so high that they only beat out this implementation at around 10,000 delegates:

    = System.MulticastDelegate =
    Build           =       1,479,262
    Run             =          44,318

    = Multicast<T> =
    Build           =       1,778,939
    Run             =          58,599

A multicast delegate with 10,000 entries is more than any reasonable program will ever need. At 200 delegates:

    = System.MulticastDelegate =
    Build           =       1,445,167
    Run             =          44,934

    = Multicast<T> =
    Build           =         949,040
    Run             =          57,367

At 20 delegates:

    = System.MulticastDelegate =
    Build           =       1,411,269
    Run             =          55,544

    = Multicast<T> =
    Build           =         853,145
    Run             =          57,036

And probably the most likely case, 4 delegates:

    = System.MulticastDelegate =
    Build           =         979,093
    Run             =          80,781

    = Multicast<T> =
    Build           =         632,594
    Run             =          77,277

Multicast Invocation

Normally the C# compiler produces the code needed to invoke any delegate type, and we can achieve something similar with T4 templates and reusing the built-in System.Action overloads. First, let's look at the basic invocation for System.Action:

public static void Invoke(this Multicast<Action> m)
{
    for (var i = 0; m.items != null && i < m.items.Length; ++i)
    {
        var x = m.items[i];
        for (var j = 0; j < x.Length; ++j)
        {
            x[j]();
        }
    }
    for (var i = 0; m.end != null && i < m.end.Length; ++i)
      m.end[i]();
}

Pretty straightforward: just iterate over each nested array and invoke the delegates. The invocation for an action with a parameter consists of just adding the parameter to invoke:

public static void Invoke<T0>(this Multicast<Action<T0>> m, T0 arg0)
{
    for (var i = 0; m.items != null && i < m.items.Length; ++i)
    {
        var x = m.items[i];
        for (var j = 0; j < x.Length; ++j)
        {
            x[j](arg0);
        }
    }
    for (var i = 0; m.end != null && i < m.end.Length; ++i)
        m.end[i](arg0);
}

And so on for all System.Action overloads up to 16 type arguments. We can automate generating these overloads using T4 templates.

The Semantics of Multicast System.Func

Multicast delegates on the CLR only have proper semantics for delegates that return no value. If you combine multiple delegates that return a value, then only the value from the last delegate is ever returned. But in principle, executing a list of delegates should generate a list of results. And the advantage of pure C# multicast delegates is that implementing this semantics is trivial!

public static IEnumerable<T> Invoke<T>(this Multicast<Func<T>> m)
{
    for (var i = 0; m.items != null && i < m.items.Length; ++i)
    {
        var x = m.items[i];
        for (var j = 0; j < x.Length; ++j)
        {
            yield return x[j]();
        }
    }
    for (var i = 0; m.end != null && i < m.end.Length; ++i)
        yield return m.end[i]();
}

This is basically the same C# code as Action invocations, but we now exploit C#'s generator syntax to easily return lists of values while executing all the encapsulated delegates. Like System.Action, we can generate all the necessary overloads for System.Func using T4 templates.

Summary

We haven't reproduced precisely the internal multicast implementation, but we have something incredibly simple that's just as fast for all foreseeable programs, and it's in safe C# to boot. Furthermore, we've extended multicast delegates to include a more coherent and complete semantics for delegates that return values, all in about 150 lines of code.

You can download the full source code for this post here, including MulticastSimple and Multicast with T4 templates for the latter.

Thursday, September 25, 2014

Sasa v0.14.0 Released

A quick release, fixing one bug and adding some performance improvements and some new collection types. Here's the changelog:

 * added a faster and more compact hash array mapped trie under Sasa.Collections.Trie to replace Sasa.Collections.Tree
 * added Sasa.Collections.FingerTree
 * added purity attributes throughout Sasa (for .NET 4)
 * expanded various test suites and documentation
 * added a persistent vector, Sasa.Collections.Vector
 * factored out Atomics.BeginRead/BeginWrite read/write protocol so it can be used in more flexible scenarios
 * Sasa.Dynamics.Type.MaybeMutable is now computed eagerly
 * added an efficient, mutable union-find data structure
 * fixed: Uris.UriDecode now handles the incorrect but common UTF16 encoding
 * moved Sasa.LockSet under Sasa.Concurrency.LockSet

The Sasa.Collections.Tree type is no longer available, and is replaced by the faster and more appropriately named Sasa.Collections.Trie. The original tree was actually a trie, and I didn't want it confused with a future search tree type which provides different operations. The FingerTree is an experimental addition, as is Sasa.Collections.Vector which provides a fast immutable array.

Edit: the online docs for the new release are also available here.

Friday, July 18, 2014

Sasa v0.13.0 Released

The latest Sasa release fixes a few bugs with MIME parsing, and adds a few new concurrency features. Here is the online documentation, or a downloadable CHM file from Sourceforge is available alongside the binaries. The binaries are also available via nuget, of course. Here's the changelog:

  • added Sasa.Concurrency.RWLock, a truly slim concurrent read/write lock
  • switched Sasa.Dynamics.PIC to use RWLock
  • switched Sasa.Dynamics.PIC to rely only on arrays for performance reasons
  • Mail message parsing now doesn't use ad-hoc means to extract a body from the attachments
  • added Sasa.Changeable<T> which encapsulates all INotifyPropertyChanged and INotifyPropertyChanging logic with no space overhead
  • fixed a MIME HTML parsing bug
  • fixed regex lexing
  • added more efficient Enums class exposing static properties for various enum properties
  • alternate views inside multipart/related no longer incorrectly dropped
  • added well-behaved standards conforming URI encode/decode to Sasa.Uris
  • added overload to customize string comparison type when tokenizing

Nothing too Earth-shattering. While I generally deplore reinventing the wheel, I found the URL encoding/decoding functions provided by System.Uri and in System.Web to be too inconsistent for my purposes in Clavis. The encode/decode functions in Sasa.Uris now work on StringBuilder, so they are more efficient, and they fully conform to the latest RFC.

The RWLock was covered here before, so no need to detail that. The PIC uses internal tables which are now protected by RWLock.

The only other really new feature is the Sasa.Changeable<T> type, which encapsulates the logic implementing INotifyPropertyChanging and INotifyPropertyChanged:

public struct Changeable<T>
{
  public T Value { get; private set; }

  public bool Update(string name, T value,
                    PropertyChangingEventHandler onChanging,
                    PropertyChangedEventHandler onChanged)
  {
    if (EqualityComparer<T>.Default.Equals(Value, value)) return false;
    onChanging.Raise(name, new PropertyChangingEventArgs(name));
    Value = value;
    onChanged.Raise(name, new PropertyChangedEventArgs(name));
    return true;
  }
}

So instead of repeating this logic in every property, you simply declare the field to be a Changeable<T> and call update with references to the appropriate event handlers:

public class Foo : INotifyPropertyChanging, INotifyPropertyChanged
{
  Changeable<int> foo;
  PropertyChangingEventHandler onChanging;
  PropertyChangedEventHandler onChanged;

  public PropertyChangingEventHandler PropertyChanging
  {
    add { Sasa.Events.Add(ref onChanging, value); }
    remove { Sasa.Events.Remove(ref onChanging, value); }
  }

  public PropertyChangedEventHandler PropertyChanged
  {
    add { Sasa.Events.Add(ref onChanged, value); }
    remove { Sasa.Events.Remove(ref onChanged, value); }
  }

  public int Foo
  {
    get { return foo.Value; }
    set { foo.Update("Foo", value, PropertyChanging, PropertyChanged); }
  }
}

If the value differs from the current value, then the events will be raised. The Update method returns true if the value was updated, and false otherwise so you can implement your own change logic as well.

Note that the handlers can be null if you're not implementing both INotifyPropertyChanging and INotifyPropertyChanged.

Sunday, April 20, 2014

Immutable Sasa.Collections.Tree vs. System.Collections.Dictionary vs. C5 HashDictionary

I've previously posted about Sasa's hash-array mapped trie, but I never posted any benchmarks. I recently came across this post on Stackoverflow which provided a decent basic benchmark between .NET's default Dictionary<TKey, TValue>, the C5 collection's hash dictionary, F#'s immutable map, and .NET's new immutable collections.

I slightly modified the file to remove the bench against the F# map and the new immutable collections since I'm still using VS 2010, and I added a simple warmup phase to ensure the methods have all been JIT compiled and the GC run to avoid introducing noise:

static void Warmup()
{
    var x = Tree.Make<string, object>();
    var y = new C5.HashDictionary<string, object>();
    var z = new Dictionary<string, object>();
    z.Add("foo", "bar");
    for (var i = 0; i < 100; ++i)
    {
        x = x.Add("foo" + i, "bar");
        y.Add("foo" + i, "bar");
        z.Add("foo" + i, "bar");
        var tmp1 = x["foo" + i];
        var tmp2 = y["foo" + i];
        var tmp3 = z["foo" + i];
    }
    x = default(Tree<string, object>);
    y = null;
    z = null;
    GC.Collect();
}

The results are still somewhat representative. This is a sample of an average output, where "Imm" is Sasa's immutable HAMT:

# - 100
SCGD -          0 MS -         25 Ticks
C5   -          0 MS -        887 Ticks
Imm  -          0 MS -        387 Ticks

# - 1000
SCGD -          0 MS -        257 Ticks
C5   -          0 MS -        294 Ticks
Imm  -          0 MS -        368 Ticks

# - 10000
SCGD -          1 MS -       4084 Ticks
C5   -          1 MS -       5182 Ticks
Imm  -          1 MS -       5436 Ticks

# - 100000
SCGD -         28 MS -      85742 Ticks
C5   -         32 MS -      99280 Ticks
Imm  -         32 MS -      97720 Ticks

Observations:

  1. C5's standard deviation was somewhat wider than both Sasa's HAMT and SCGD, so its performance seems slightly less predictable
  2. Sasa's immutable HAMT appears to perform within 5% of the mutable C5 collection at all collection sizes
  3. Sasa's immutable HAMT appears to perform within 15% of the mutable SCGD for large collections, where the hash table operates with higher load factors
  4. Small collections requiring a small load factor clearly advantage the mutable SCGD by up to an order of magnitude, an advantage not shared by C5 for some reason (possibly they maintain a higher load factor)
  5. C5's terrible performance on very small collections of 100 items was consistent on every test run, again possibly because they maintain a high load factor before resizing
  6. Sasa's HAMT takes just as much time to load 1000 items as it takes to load 100 items; this was consistent across every test run, and it's not clear why

Finally, while not exactly apples-to-apples, Sasa's HAMT is easily 3-4× faster than F#'s map given the numbers cited in the above Stackoverflow post. F# still has an advantage for very small collections though. Sasa's HAMT also appears to be at least 2× faster than the new immutable collections.

Also keep in mind that this benchmark only tests lookup performance. F#'s map would have an advantage over Sasa's HAMT in load performance because the HAMT does not yet include a "bulk-load" operation, which the F# map does appear to support.

Tuesday, April 1, 2014

A Truly Slim Read/Write Lock in C#

It's pretty well known that the CLR's ReaderWriterLock and ReaderWriterLockSlim have unappealing performance characteristics. Each class also encapsulates significant state, which precludes its use in fine-grained concurrency across large collections of objects.

Enter Sasa.Concurrency.RWLock in the core Sasa assembly. This is the most lightweight R/W lock I could come up with, particularly in terms of resources used. It's a struct that encapsulates a simple integer that stores the number of readers and a flag indicating whether a writer is active.

The interface is similar to ReaderWriterLockSlim, although there are a few differences which are needed to keep the encapsulated state so small:

public struct RWLock
{
  // this field is the only state needed by RWLock 
  private int flags;

  public void EnterReadLock();
  public void ExitReadLock();
  public bool TryEnterReadLock();

  public void EnterWriteLock(object sync);
  public bool TryEnterWriteLock(object sync);
  public void ExitWriteLock(object sync);
}

Conceptually, EnterWriteLock calls Monitor.Enter(sync), which ensures that only a single writer acquires the write lock. It then sets the write bit in the "flags" state, and loops yielding its time slice until all read locks are released.

EnterReadLock also loops yielding its time slice until the write flag is cleared, and then it uses Interlocked.Increment to acquire a read lock, and Interlocked.Decrement to release the read lock.

The TryEnterReadLock and TryEnterWriteLock provide non-blocking semantics, so there is no looping. If the lock on 'sync' cannot be acquired, or the write flag is set, TryEnterWriteLock and TryEnterReadLock respectively return false immediately. They never block or loop under any circumstances.
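The protocol described above can be sketched in a few dozen lines. This is an illustrative reconstruction from the description, not Sasa's actual code; it omits recursion detection and the Try* variants, and the constant and type names are made up:

```csharp
using System.Threading;

public struct SpinRWLock
{
    const int Writer = unchecked((int)0x80000000); // high bit marks an active writer
    int flags;                                     // remaining bits count readers

    public void EnterReadLock()
    {
        while (true)
        {
            // optimistically register as a reader...
            if ((Interlocked.Increment(ref flags) & Writer) == 0) return;
            // ...but back off and retry if a writer holds the lock
            Interlocked.Decrement(ref flags);
            Thread.Yield();
        }
    }

    public void ExitReadLock()
    {
        Interlocked.Decrement(ref flags);
    }

    public void EnterWriteLock(object sync)
    {
        Monitor.Enter(sync);  // serializes writers against each other
        int x;
        do { x = flags; }     // publish the writer flag
        while (Interlocked.CompareExchange(ref flags, x | Writer, x) != x);
        // spin until in-flight readers drain
        while ((Thread.VolatileRead(ref flags) & ~Writer) != 0) Thread.Yield();
    }

    public void ExitWriteLock(object sync)
    {
        int x;
        do { x = flags; }     // clear the writer flag
        while (Interlocked.CompareExchange(ref flags, x & ~Writer, x) != x);
        Monitor.Exit(sync);
    }
}
```

The single int carries both pieces of state, which is what lets the real RWLock stay a one-field struct suitable for embedding in every node of a large collection.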

The RWLock implementation is about 150 lines of heavily commented code, so it's easily digestible for anyone who's interested in the specifics. There are also some rules to abide by when using RWLock:

  1. The same 'sync' object must be passed to all write lock calls on a given RWLock. Obviously if you use a different object, more than one writer can proceed. Different objects can be used for different RWLocks of course.
  2. Recursive write locks are forbidden and will throw LockRecursionException. Recursive read locks are permitted.
  3. You cannot acquire a read lock inside a write lock, or a write lock inside a read lock. If you do, your program will immediately deadlock.

Unlike the base class libraries, none of my concurrency abstractions accept timeout parameters. Timeouts hide concurrency bugs and introduce pervasive non-determinism, which is partly why concurrent programs are traditionally hard to debug. Timeouts should be rare, and specified separately at a higher level than these low-level concurrency primitives.