Sunday, September 22, 2013

CLR: The Cost of Dynamic Type Tests

I recently came across Vance Morrison's blog post on the relative costs of dynamic type tests on the CLR, and I was struck by how much my experience differed from the results he received. In past tests, I had concluded that operations and comparisons using System.Type were just as fast as operations on System.RuntimeTypeHandle. This struck me as a little odd at the time, but numbers don't lie.

Vance helpfully provided the code he used for his benchmarks, so I decided to see if perhaps I was mistaken. Lo and behold, the numbers I got from running his code matched my past results, ie. RuntimeTypeHandle provided no advantage. This seemed extremely odd, but after digging a little deeper, it turns out that Vance and I are both right. I've been doing most of my development on 64-bit x64 machines, and I suspect Vance was running x86 at the time. The CLR's x64 runtime woefully underperforms the x86 runtime, at least for this type of code. I knew there was a performance difference between the two, but the x86 backend really shines at dynamic type tests, reflecting the CLR team's heavier investment in x86 optimization.

The Original Tests

I'll first present the results of the original benchmarks, this time compiled for x64, x86, and Any CPU:

x64

Data units of msec resolution = 0.329204 usec
typeof(string)                       : count: 500000   15.962 +- 4%    msec
typeof(string).TypeHandle            : count: 500000   17.326 +- 3%    msec
anObj.GetType() == type              : count: 500000   17.960 +- 3%    msec
Type.GetTypeHandle(obj).Equals(tHnd) : count: 500000   16.329 +- 3%    msec
anObj.GetType() == typeof(string)    : count: 500000    1.858 +- 17%   msec
(anObj is string)                    : count: 500000    4.436 +- 8%    msec

These results differ significantly from Vance's original data. The type handle test was very nearly the slowest way to perform dynamic type tests, not the fastest as Vance concluded. The fastest were direct comparisons of System.Type, and subtype tests via the 'is' operator.

Any CPU

Data units of msec resolution = 0.329204 usec
typeof(string)                       : count: 500000   16.269 +- 5%    msec
typeof(string).TypeHandle            : count: 500000   17.353 +- 4%    msec
anObj.GetType() == type              : count: 500000   18.111 +- 3%    msec
Type.GetTypeHandle(obj).Equals(tHnd) : count: 500000   17.266 +- 2%    msec
anObj.GetType() == typeof(string)    : count: 500000    1.804 +- 3%    msec
(anObj is string)                    : count: 500000    4.487 +- 4%    msec

As expected, the results for the "Any CPU" build match the x64 build. I'm running a 64-bit OS, so the x64 runtime is the default when launching a platform-agnostic program.

x86

Data units of msec resolution = 0.329204 usec
typeof(string)                       : count: 500000   15.417 +- 2%    msec
typeof(string).TypeHandle            : count: 500000    0.830 +- 14%   msec
anObj.GetType() == type              : count: 500000   20.976 +- 3%    msec
Type.GetTypeHandle(obj).Equals(tHnd) : count: 500000   36.082 +- 2%    msec
anObj.GetType() == typeof(string)    : count: 500000    2.007 +- 12%   msec
(anObj is string)                    : count: 500000    4.893 +- 6%    msec

Here we see results much closer to Vance's original data. Type handles are blazingly fast, and the JIT seems to recognize certain comparison patterns and optimize them heavily.

The Generic Tests

Vance's benchmarks only compared the costs of hard-coded type tests, where the JIT has better visibility into the static types involved and can perhaps optimize more readily. This doesn't necessarily translate to code that performs dynamic type tests on generic variables, so I duplicated Vance's test suite to operate on an abstract type T. The numbers were once again quite surprising. The operations that were fastest when the static types were hard-coded became the slowest when operating on an abstract type parameter, while the slowest tests became only a little slower, but not dramatically so.
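
A sketch of the generic variants (the method names here are mine; the actual suite wraps each body in the same timing loops as Vance's original):

static bool EqualsTypeof<T>(object anObj)
{
    return anObj.GetType() == typeof(T);    // was: anObj.GetType() == typeof(string)
}
static bool EqualsTypeHandle<T>(object anObj)
{
    return Type.GetTypeHandle(anObj).Equals(typeof(T).TypeHandle);
}
static bool IsT<T>(object anObj)
{
    return anObj is T;                      // was: anObj is string
}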

x64

typeof(T)                            : count: 500000   25.621 +- 2%    msec
typeof(T).TypeHandle                 : count: 500000   28.043 +- 2%    msec
anObj.GetType() == type              : count: 500000   18.727 +- 1%    msec
Type.GetTypeHandle(obj).Equals(tHnd) : count: 500000   18.950 +- 4%    msec
anObj.GetType() == typeof(T)         : count: 500000   39.888 +- 2%    msec
(anObj is T)                         : count: 500000   33.884 +- 2%    msec

As I mentioned, the fastest x64 type tests became the slowest tests when operating on generic type parameters. The other type tests slowed down slightly as well, though nowhere near as dramatic a fall. Still, my past results stand: comparing System.Type instances seems to be the most efficient and stable way to compare dynamic types, at least on x64.

Any CPU

typeof(T)                            : count: 500000   25.234 +- 3%    msec
typeof(T).TypeHandle                 : count: 500000   29.123 +- 3%    msec
anObj.GetType() == type              : count: 500000   18.699 +- 2%    msec
Type.GetTypeHandle(obj).Equals(tHnd) : count: 500000   22.600 +- 5%    msec
anObj.GetType() == typeof(T)         : count: 500000   40.124 +- 2%    msec
(anObj is T)                         : count: 500000   32.301 +- 1%    msec

As expected, the "Any CPU" results match x64. Nothing surprising here.

x86

typeof(T)                            : count: 500000   22.395 +- 3%    msec
typeof(T).TypeHandle                 : count: 500000    4.478 +- 2%    msec
anObj.GetType() == type              : count: 500000   21.170 +- 3%    msec
Type.GetTypeHandle(obj).Equals(tHnd) : count: 500000   36.064 +- 2%    msec
anObj.GetType() == typeof(T)         : count: 500000   36.878 +- 1%    msec
(anObj is T)                         : count: 500000   45.831 +- 4%    msec

Here we partially confirm Vance's data again: typeof(T).TypeHandle is still incredibly fast on x86, but nearly 5 times slower than the non-generic test, which seems odd considering it should just be one or two memory fetches.

However, the second and third fastest ways to perform dynamic type tests on x86 when static types are known became the slowest when operating on generic parameters. This degradation matches the trend seen on x64, so at least that much is consistent. The standard operations on System.Type remained roughly the same, so their performance was much more stable.

Conclusions

I'm not sure we can draw any reliable conclusions from this data as to the fastest way to perform dynamic type tests, at least across platforms. RuntimeTypeHandle should be the fastest, but its poor showing on x64 makes me question whether it's worth it. Hopefully the CLR team will put some more effort into optimizing the x64 code generator to improve the performance of RuntimeTypeHandle. As it currently stands though, operating on System.Type seems the most stable across all the platforms, and more importantly, it doesn't degrade in the presence of generics.
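
Concretely, the pattern that held up across every configuration I tested is a direct comparison against a System.Type (a trivial sketch):

static readonly Type expected = typeof(string);

static bool IsExactlyString(object anObj)
{
    // stable across x86/x64, and doesn't degrade for generic parameters
    return anObj != null && anObj.GetType() == expected;
}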

You can download the code for the modified benchmark suite here. All credit to Vance for the code; I just copied and pasted the tests and modified them slightly to operate on type parameters. The x86 and x64 builds are selected via different build configurations, and the Release configuration builds for Any CPU.

Edit: The results above target .NET 2.0, whose runtime is largely shared by .NET 3.0 and 3.5, so the numbers hold all the way up to .NET 4.0. The .NET 4.0 VM allegedly underwent many changes, and after performing a few preliminary tests, I can say that the numbers above no longer apply. RuntimeTypeHandle is now the slowest test on x86, and x64 is way faster than x86 on pretty much all the tests. System.Type is stable and fastest even for generics. So the conclusion seems pretty obvious now: stick to System.Type and avoid RuntimeTypeHandle at all costs.

x64 - .NET 4.0

Data units of msec resolution = 0.329204 usec
typeof(string)                       : count: 500000    8.525 +- 6%    msec
typeof(string).TypeHandle            : count: 500000   19.892 +- 4%    msec
anObj.GetType() == type              : count: 500000    4.792 +- 4%    msec
Type.GetTypeHandle(obj).Equals(tHnd) : count: 500000   47.924 +- 3%    msec
anObj.GetType() == typeof(string)    : count: 500000    2.563 +- 8%    msec
(anObj is string)                    : count: 500000    2.439 +- 8%    msec

typeof(T)                            : count: 500000   25.557 +- 2%    msec
typeof(T).TypeHandle                 : count: 500000   44.622 +- 3%    msec
anObj.GetType() == type              : count: 500000    4.310 +- 4%    msec
Type.GetTypeHandle(obj).Equals(tHnd) : count: 500000   47.437 +- 1%    msec
anObj.GetType() == typeof(T)         : count: 500000   23.605 +- 4%    msec
(anObj is T)                         : count: 500000   39.682 +- 3%    msec

x86 - .NET 4.0

Data units of msec resolution = 0.329204 usec
typeof(string)                       : count: 500000   14.871 +- 3%    msec
typeof(string).TypeHandle            : count: 500000   36.175 +- 4%    msec
anObj.GetType() == type              : count: 500000   25.566 +- 1%    msec
Type.GetTypeHandle(obj).Equals(tHnd) : count: 500000   49.741 +- 4%    msec
anObj.GetType() == typeof(string)    : count: 500000    2.620 +- 10%   msec
(anObj is string)                    : count: 500000    5.649 +- 9%    msec

typeof(T)                            : count: 500000   17.096 +- 11%   msec
typeof(T).TypeHandle                 : count: 500000   44.402 +- 2%    msec
anObj.GetType() == type              : count: 500000   25.086 +- 2%    msec
Type.GetTypeHandle(obj).Equals(tHnd) : count: 500000   48.827 +- 3%    msec
anObj.GetType() == typeof(T)         : count: 500000   41.598 +- 3%    msec
(anObj is T)                         : count: 500000   45.505 +- 1%    msec

As you can see from these updated results for .NET 4.0, the x64 VM is now comparable to the x86 VM, largely because the x86 VM appears to be slower than in .NET 2.0. The advantage of RuntimeTypeHandle in .NET 2.0 is completely gone, and it's now (surprisingly) the slowest means of comparing runtime types. Comparing instances of System.Type appears to be the fastest all around, and doesn't degrade if you're comparing generic type parameters.

Saturday, September 21, 2013

CLR Concurrency: Preventing Torn Reads Without Locks

The CLR's value types are incredibly useful for reducing the memory usage of programs, but they have a severe limitation in concurrent scenarios: structs larger than the largest word a given machine can write atomically can suffer from torn reads.

Most typical applications won't encounter this because their concurrent accesses, both reads and writes, are protected by locks. However, there are some scenarios where locks just aren't viable for various reasons. For instance, if a few shared variables are read an order of magnitude more often than they're written, then roughly 90% of the lock contention is wasted work among readers that aren't performing any updates.

In principle, the lock is really there to permit only one writer to modify the variable at a time. This means we can possibly use some other signalling mechanism to notify readers that a write is taking place, or has taken place. Ideally, this mechanism shouldn't cause contention among readers thus permitting more read parallelism.

The typical abstraction for this is a reader-writer lock (rwlock). Basically, a number of concurrent readers are permitted to access a resource protected by the rwlock, and writers have to request access and wait until all readers are done. Then the writer proceeds, and all readers must wait until the writer is finished. Unfortunately, rwlocks aren't all they're cracked up to be. Most of the problems stem from the fact that any amount of shared state will inherently limit scalability, but part of the problem is also because readers are performing writes, and thus they are introducing unnecessary contention.
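
For reference, the reader side of the conventional approach looks something like this (a sketch using ReaderWriterLockSlim; the shared fields are hypothetical):

static readonly ReaderWriterLockSlim rwlock = new ReaderWriterLockSlim();
static double sharedX, sharedY;  // state written rarely, read frequently

static double ReadSum()
{
    rwlock.EnterReadLock();      // even a "read-only" entry updates the lock's internal state
    try
    {
        return sharedX + sharedY;
    }
    finally
    {
        rwlock.ExitReadLock();   // ...and so does the exit
    }
}

Every reader entering and exiting the lock mutates the lock's internal state, and that's exactly the contention we'd like to eliminate.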

Turns out, it's possible to solve this contention issue using only an additional piece of data: a version number. Writers still coordinate via a standard lock, and when a writer enters the critical section, it increments a version number and executes a memory barrier. The version number is now odd, which indicates that a write is in progress. Then the writer writes to the variable, executes another barrier, then increments the version number again. The version number is now even, indicating that no write is in progress:

public static void Write<T>(
    ref T location,
    ref T value,
    ref int version,
    object writeLock)
{
    lock(writeLock)
    {
        ++version;              // ++version odd: write in progress
        Thread.MemoryBarrier(); // ensure increment complete before update
        location = value;
        Thread.MemoryBarrier(); // ensure update complete before increment
        ++version;              // ++version even: write complete
    }
}

Readers instead only consult the version number to check whether a write is in progress, or whether a write has transpired since we first started the read. We read the version number into a local called 'old', and then spin until the version number is even, which indicates that no write is in progress. Then we read the value from 'location' into a local.

However, a write could have occurred since we exited the loop checking for an odd version number. So we read the version number again and compare it against 'old'. If it differs, that means a write occurred while we were reading, possibly corrupting our read. In that case, we abort and retry the whole process from the beginning. If the version number matches 'old', then the value we read is correct, and we can safely return it [1]:

public static T Read<T>(ref T location, ref int version)
{
    T x;
    int old;
    do
    {
        // loop until version is even = no write in progress
        do
        {
            old = version;
            if (0 == (old & 0x01)) break; // even version: no write in progress
            Thread.MemoryBarrier();       // write in progress: re-read version from memory
        } while (true);
        x = location;                     // read value from location
        Thread.MemoryBarrier();           // ensure read of location completes before re-reading version
    } while (version != old);
    // if version after read == old, no concurrent write
    return x;
}

So we've achieved our goal: we have fully concurrent reads that don't contend for resources, and that don't block writers. I've just included these two atomic read/write functions, and some useful variants, into Sasa's Atomics class.
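
Usage looks something like this (a sketch; the struct and method names here are mine, not Sasa's API):

struct Pair { public double X, Y; }     // 16 bytes: wide enough to tear on both x86 and x64

static Pair shared;                     // the protected location
static int version;                     // even = quiescent, odd = write in progress
static readonly object writeLock = new object();

static void UpdateShared(Pair value)
{
    Write(ref shared, ref value, ref version, writeLock);
}
static Pair ReadShared()
{
    return Read(ref shared, ref version);   // never observes a torn Pair
}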

Turns out this approach isn't new, and Duffy covered a similar approach in a later blog post which I only came across after writing this piece. He rightly points out that these are the sort of techniques employed in software transactional memory (STM), whereby we do as much as possible optimistically, then validate at the end that nothing untoward happened (like a write when we weren't expecting it). If the unexpected happens, we "abort" and retry. As pointed out in the comments to that post, this is essentially the seqlock locking mechanism as used in the Linux kernel.

I don't anticipate using this too often, but I wouldn't have included it in Sasa if it didn't have an important application within Sasa itself. Sasa.Reactive is the assembly that provides reactive variables that can be concurrently read, but only updated by a single writer. I designed the atomic read/write functions above while refactoring the Sasa.Reactive implementation to be more robust, yet more conservative in resource use. These atomic read/write functions allow concurrent readers to safely obtain a snapshot of a reactive variable's value, without resorting to storing values in atomic types, like a reference type.

Hopefully others will find it useful as well. If anyone spots a problem in the above algorithm please do let me know! The CLR provides a relatively strong memory model, so I'm pretty sure I can eliminate one memory barrier in the atomic write function, but I'd love to have some other input. Concurrency is hard after all!

Edit: Brian Gideon helpfully ran some tests of his own that support the correctness of these operations, and their performance benefits over both locking, and interlocked operations.

[1] Technically, the version number could have been incremented so many times that it wrapped around until it matches the saved 'old' value, but IMO it's exceedingly unlikely that 2^32 writes occurred while we were trying to read a single variable.

Friday, September 20, 2013

On Confluence and Type Parameter Unification in C#

A while back I had written about a type unification nuisance I had run into. In a nutshell, the problem occurs when a class with two type parameters tries to implement the same interface twice, once for each type parameter:

// compiler error:
// 'Foo<T0,T1>' cannot implement both 'IEnumerable<T0>' and
// 'IEnumerable<T1>' because they may unify for some type
// parameter substitutions
public class Foo<T0, T1> : IEnumerable<T0>, IEnumerable<T1>
{
}

As Doug McClean pointed out in the comments, the reason behind this error is because the two implementations of the interfaces may not be confluent, ie. behaviourally identical, in which case there's no legitimate way to choose between the two.

The application I had in mind at the time used marker interfaces, ie. interfaces with no methods or properties, so they were guaranteed to be confluent. I also had a sneaking suspicion that C# already permitted this structure elsewhere, and that the compiler only enforces this rule superficially, like some other annoying aesthetic constraints.

This turns out to be exactly the case: it is possible to implement the same interface twice for two type parameters. All you need to do is implement the interfaces in two separate classes in the same inheritance hierarchy, and C# lets this pass with nary a whimper. Here's the sample code:

public interface IFoo<T>
{
    void Bar(T value);
}
public abstract class FooBase<T0, T1> : IFoo<T1>
{
    public void Bar(T1 value)
    {
        Console.WriteLine("T1 Bar");
    }
}
public sealed class Foo<T0, T1> : FooBase<T0, T1>, IFoo<T0>
{
    public void Bar(T0 value)
    {
        Console.WriteLine("T0 Bar");
    }
}
public static class Example
{
    public static void Main(string[] args)
    {
        var core = new Foo<int, int>();
        var ifoo = core as IFoo<int>;
        var foob = core as FooBase<int, int>;
        var ifoo2 = foob as IFoo<int>;

        core.Bar(2);  // output: T0 Bar
        ifoo.Bar(2);  // output: T0 Bar
        foob.Bar(2);  // output: T1 Bar
        ifoo2.Bar(2); // output: T0 Bar
    }
}

So reduction is still confluent because all views of IFoo<T> go through the most derived implementation in the inheritance hierarchy. The only way to call the Bar method on FooBase is by explicitly casting an instance to FooBase and invoking Bar.

This recently bit me when I was implementing an IObserver<T> that was observing two differently typed streams; because the interfaces were declared at different levels of the inheritance hierarchy, the compiler never complained. Not very common, I agree, but for consistency I'd suggest that either this structure too be ruled out, or that type unification on type parameters be permitted via some convention, just as it's allowed via inheritance. For instance, select the implementation for the first (or last) type parameter.
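
For concreteness, the shape that bit me looked something like this (a hypothetical sketch, not the actual code):

public abstract class ObserverBase<T0, T1> : IObserver<T1>
{
    public void OnNext(T1 value) { /* handle the T1 stream */ }
    public void OnError(Exception error) { }
    public void OnCompleted() { }
}
public sealed class Observer<T0, T1> : ObserverBase<T0, T1>, IObserver<T0>
{
    public void OnNext(T0 value) { /* handle the T0 stream */ }
}

This compiles without complaint, even though Observer<T0, T1> effectively implements IObserver<T> twice.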

Tuesday, September 17, 2013

Combinator Calculus EDSLs in C#

In a bit of a departure from my usual practical C# posts, I decided to try my hand at an abstract computer science topic. I've become interested in concatenative programming, and combinator calculi are right up that alley.

The quintessential combinator calculus is the well-known SKI calculus, which features two core combinators, S and K, and one optional combinator I. SKI is quite simple, and it's pretty straightforward to implement as an EDSL. I suspect the key stumbling block for most programmers unfamiliar with functional programming will be the pervasive partial application inherent to combinators.

Partial Application

Partial application is a pretty simple concept for anyone who's done even a little programming with LINQ. C#/.NET features first-class functions called "delegates" with types like:

public delegate T2 Func<T0, T1, T2>(T0 arg0, T1 arg1);

That's the type declaration for a delegate that accepts two arguments of types T0 and T1, and returns a value of type T2. Partial application is simply an operation that applies a function to only some of its arguments, with the result of the partial application being another function that accepts the remaining arguments.

We can simulate a partial application operation in any language featuring first-class functions as follows:

public static class Function
{
  public static Func<T1, T2> Partial<T0, T1, T2>(Func<T0, T1, T2> func,
                                                 T0 arg0)
  {
    return arg1 => func(arg0, arg1);
  }
  public static Func<T0, T2> Partial<T0, T1, T2>(Func<T0, T1, T2> func,
                                                 T1 arg1)
  {
    return arg0 => func(arg0, arg1);
  }
}

The above code sample simulates partial application for the two-arg delegate. For every delegate type accepting arguments of types T0, T1, ... Tn, partial application requires a set of Function.Partial overloads for partially applying 1 argument, 2 arguments, 3 arguments, ..., up to n-1 arguments. This reduces to the formula:

(n choose 1) + (n choose 2) + ... + (n choose n-1) = 2^n - 2

For instance, a 3-argument delegate would need 2^3 - 2 = 6 Partial overloads.

Normally, partial application is provided by a language's compiler, so all these overloads aren't typically required in a language that has strong support for functional programming.

There is also another way to support something like partial application in a language without needing copious overloads: currying.

Aside: Currying

Most .NET languages feature uncurried functions by default, which is to say that functions can take multiple arguments (F# being the only exception I'm aware of). Curried functions only ever take a single argument, and return a single value, no exceptions. Functions that accept multiple arguments can be simulated by functions that return other single-argument functions that encapsulate the previous arguments.

For instance, consider simple decimal addition, which can be represented by the following delegate:

Func<decimal, decimal, decimal> add = decimal.Add;

If C# featured curried functions, this would instead look like:

Func<decimal, Func<decimal, decimal>> add = decimal.Add;

As you can see, add has now become a function that only accepts a single argument, and returns a single argument: it accepts an argument of type decimal, and returns a function that accepts a decimal argument and returns a decimal. In abstract, all N-argument function applications are then a series of N-1 partial applications terminated by a final application that invokes the underlying operation:

Func<decimal, Func<decimal, decimal>> add = decimal.Add;
Func<decimal, decimal> add1 = add(1M);
Func<decimal, decimal> add2 = add(2M);
Func<decimal, decimal> add3 = add(3M);
...
Console.WriteLine(add2(0));   // output: 2
Console.WriteLine(add(2)(0)); // output: 2

Currying is a pretty powerful concept because it simplifies functions into a standard form, taking and returning single-values, which allows easy composition of functions in higher-order patterns, like we find in LINQ.
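
C# can't curry for us, but we can write the transform once per arity, in the same vein as the Function.Partial overloads above (a sketch):

public static class Function
{
    // Converts a two-argument delegate into its curried form.
    public static Func<T0, Func<T1, T2>> Curry<T0, T1, T2>(this Func<T0, T1, T2> func)
    {
        return arg0 => arg1 => func(arg0, arg1);
    }
}
...
Func<decimal, decimal, decimal> add = decimal.Add;
Func<decimal, Func<decimal, decimal>> curried = add.Curry();
Console.WriteLine(curried(2M)(0M)); // output: 2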

Languages with currying by default avoid all the above parentheses by simply having whitespace as application. For instance, the above in F# is something like:

let add = decimal.Add in
let add1 = add 1 in
let add2 = add 2 in
let add3 = add 3 in
...
Console.WriteLine (add2 0);
Console.WriteLine (add 2 0);;

This syntax closely matches the concepts you'll find in combinator calculi like the SKI calculus.

The SKI Calculus

A minimal understanding of partial application is needed in order to understand how to implement the SKI calculus in C#. Basically, you can view the S, K and I combinators either like curried functions, or like operators supporting partial application by default. In abstract, the SKI combinators are described as:

let x,y,z be variables representing any subexpression in the SKI calculus:

I(x)     = x
K(x,y)   = x
S(x,y,z) = x(z)(y(z))

Basically, I is a function that accepts one argument and simply returns it. K is a function that accepts two arguments, and simply returns the first argument. S is a function accepting 3 arguments, and first applies the first argument to the third, expecting it to return a function of some sort, and then applies that function to the result of applying the second argument to the third argument.

It's important to note here that all function applications above can be partial applications. So you can have a legitimate expression like "S x", which supplies S with only one argument, even though S takes 3 arguments; it's simply a partial application. The result of "S x" is then a function that accepts at least two more arguments.

This is where currying comes in handy, because partial application semantics is pervasive. The SKI calculus in a language with curried functions would look like:

let x,y,z be variables representing any subexpression in the SKI calculus:

I x     = x
K x y   = x
S x y z = x z (y z)

This is how the raw SKI calculus is generally presented in the academic literature.
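
For example, S K K behaves exactly like I, which we can verify by reducing its application to an arbitrary term x:

S K K x = K x (K x) = x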

Given that the SKI calculus supports pervasive partial application, we need to create a class hierarchy that represents the S, K and I terms, and all possible partial applications. We can do this naively in standard object-oriented fashion with the following set of classes:

public abstract class SK
{
    public abstract SK Apply(SK term);
}
public class I : SK
{
    internal static I instance = new I();
    public override SK Apply(SK term)
    {
        return term;
    }
}
public class K : SK
{
    internal static K instance = new K();
    public override SK Apply(SK term)
    {
        return new K1 { First = term };
    }
}
public class S : SK
{
    static internal S instance = new S();
    public override SK Apply(SK term)
    {
        return new S1 { First = term };
    }
}

This is a standard object-oriented representation of curried SKI combinators. Each combinator takes a single argument of type SK, and returns a single value of type SK. This is all well and good for a combinator like I, which only takes a single argument, but K and S take more. This is where partial application comes in. We return the intermediate terms K1 and S1 for partial applications of K and S, respectively:

public class K1 : SK
{
    public SK First { get; set; }
    public override SK Apply(SK term)
    {
        return First;
    }
}
public class S1 : SK
{
    public SK First { get; set; }
    public override SK Apply(SK term)
    {
        return new S2 { First = First, Second = term };
    }
}

Since S takes three arguments, applying S1 to a second argument still can't trigger the operation, so it returns yet another intermediate value representing the partial application of two arguments to S:

public class S2 : SK
{
    public SK First { get; set; }
    public SK Second { get; set; }
    public override SK Apply(SK term)
    {
        return First.Apply(term).Apply(Second.Apply(term));
    }
}

And thus we have the full SKI combinator calculus. An example SKI term:

var SKSK = new S().Apply(new K()).Apply(new S()).Apply(new K());

This is pretty awkward, and we can do much better. You might have noticed that the core S, K, I classes were singletons, and that's because we only ever need the one instance since they contain no state. Combined with some additional properties on the base SK class, embedded SKI expressions can be pretty convenient:

public abstract class SK
{
    public abstract SK Apply(SK term);
    public SK S
    {
        get { return Apply(Combinators.S.instance); }
    }
    public SK I
    {
        get { return Apply(Combinators.I.instance); }
    }
    public SK K
    {
        get { return Apply(Combinators.K.instance); }
    }
}

The previous example then becomes:

var SKSK = new S().K.S.K;

The SKI calculus is actually Turing complete, so any computable function can be computed by some SKI expression. For instance, here's a simple infinite loop:

var SII_SII = new S().I.I.Apply(new S().I.I); // loops forever

The Functional Representation

The above is a canonical object-oriented representation of terms. C# provides us with some other options though, including a more functional encoding. Simply stated, we convert the class hierarchy into an enum representing a disjoint union, and evaluation becomes a switch over the possible cases instead of a method dispatch:

public enum SkiCombinators
{
    I, K, S
}
public struct SK
{
    SkiCombinators op;
    SK[] args;
    public SK Apply(SK term)
    {
        switch (op)
        {
            case SkiCombinators.I:
                return term;
            case SkiCombinators.K:
                return args != null
                     ? args[0]
                     : new SK { op = op, args = new[] { term } };
            case SkiCombinators.S:
                return args == null      ? new SK { op = op, args = new[] { term } }:
                       args.Length == 1  ? new SK { op = op, args = new[] { args[0], term } }:
                                           args[0].Apply(term).Apply(args[1].Apply(term));
            default: throw new InvalidOperationException("Unknown combinator.");
        }
    }
}

Combinators are here represented in a form that's closer to an efficient instruction encoding for an abstract machine. We have an opcode representing the combinator, and a list of partially applied arguments represented as an array. The I combinator never accumulates arguments, since it reduces as soon as it's applied. The K combinator holds at most one pending argument, and S at most two; supplying the final argument triggers the reduction.

I tend to find this representation more convenient to work with since the evaluation function is defined concisely in one place. It's also a small step from this encoding to an opcode array representing instructions for an SKI abstract machine (I'll write about it if there's interest).

We can make embedded SKI expressions convenient with the following helper properties:

public struct SK
{
    ...
    public static SK Begin_I
    {
        get { return new SK { op = SkiCombinators.I }; }
    }
    public static SK Begin_K
    {
        get { return new SK { op = SkiCombinators.K }; }
    }
    public static SK Begin_S
    {
        get { return new SK { op = SkiCombinators.S }; }
    }

    public SK S
    {
        get { return Apply(Begin_S); }
    }
    public SK I
    {
        get { return Apply(Begin_I); }
    }
    public SK K
    {
        get { return Apply(Begin_K); }
    }
}

The previous SKI examples now look like this:

var SKSK = SK.Begin_S.K.S.K;
var SII_SII = SK.Begin_S.I.I.Apply(SK.Begin_S.I.I);
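
As a quick sanity check, partial application and reduction behave as expected:

var first = SK.Begin_K.Apply(SK.Begin_I).Apply(SK.Begin_S);
// first is now the I combinator, since K x y = x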

This representation is also similar to the algebraic representation you'd find in a functional language like F#. In F#, this would look something like:

type ski = S | K | I | K1 of ski | S1 of ski | S2 of ski * ski
let rec apply e arg0 = match e with
    S  -> S1 arg0
  | K  -> K1 arg0
  | I  -> arg0
  | K1 x -> x
  | S1 x -> S2 (x, arg0)
  | S2 (x, y) -> apply (apply x arg0) (apply y arg0);;

This is the direct algebraic equivalent of the first object-oriented representation of the SKI calculus, with values representing the partial applications of combinators.

What's It All For?

This was really just a bit of fun in exploring abstract computer science concepts with C#, and hopefully can serve as a rudimentary introduction to convenient functional encodings for language terms. Combinator calculi are a good introduction because they are simple and concise, yet computationally complete.

Despite being simple and studied for a long time, there's still interesting work being done with these calculi. For instance, a great feature of some dynamically typed languages which still isn't well understood in typed terms, is metacircularity. Dynamically typed languages can represent native programs as values, can inspect those values, and can execute those values (Lisp is the canonical example).

However, doing this in a statically typed language is still an open problem... except for typed combinator calculi! I believe the SFB combinator calculus is the first example of a statically typed language that can inspect and evaluate its own programs, and it doesn't need anything beyond the types of System F, which we also have in C#.

Like the SK calculus, the SFB calculus can represent lambda abstraction and application, and so in theory, we actually have a metacircular statically typed language! The downside is that the metacircularity is at the level of the combinator terms, not at the lambda abstraction level, so it's allegedly a little awkward to use for programs written using lambda abstraction.

Still, concatenative languages exist, and some are quite popular, like Factor, so perhaps there's some application of this approach to concatenative languages. My next post will probably be a C# encoding of the SFB calculus, which I've already written, along with some interesting terms showing its expressive power.