Friday, December 28, 2007

ML Modules in C#

I've written about the limitations of C#'s equational constraints before. Truth is, I now believe that any such limits can be circumvented by a relatively simple translation. The result is less "object-oriented", as it requires a set of cooperating objects instead of being encapsulated in a single object.

Let's take a simple, unsafe list flattening operation described in Generalized Algebraic Data Types and Object-Oriented Programming (GADTOOP). This can be expressed in OCaml as:
let List = struct 
type 'a t = Nil | Cons of 'a * 'a t
let append l a = Cons(a, l)
let flatten la =
Cons(a, l) -> append a (flatten l)
| Nil -> Nil
end

The argument to flatten, la, is a list of lists of type 'a. However, there is no way to express this in C# without unrestricted equational constraints as I described earlier. Here is the translation to C# from GADTOOP:
public abstract class List<T> {...
public abstract List<T> Append(List<T> that);
public abstract List<U> Flatten<U>();
}
public class Nil<A> : List<A> {...
public override List<U> Flatten<U>()
{ return new Nil<U>(); }
}
public class Cons<A> : List<A> {...
A head; List<A> tail;
public override List<U> Flatten<U>()
{ Cons<List<U>> This = (Cons<List<U>>) (object) this;
return This.head.Append(This.tail.Flatten<U>()); }
}

Note how Flatten requires an unsafe, ugly cast to and from object to get around C#'s type limitations. However, there is a safe way. Here is the translation:
static class List
{
public abstract class t<A> {}
public sealed class Nil<A> : t<A> {}
public sealed class Cons<A> : t<A> {
internal A head; internal t<A> tail;
}
public static t<A> Append(t<A> l, A a)
{ return new Cons<A>(a, l); }
public static t<A> Flatten<A>(t<List<A>> l)
{ return l.head is Nil ? new Nil<A>() : Append<A>(Flatten<A>(l.tail), l.head); }
}

Basically, the step-wise translation I propose:
  1. move the generic class I into a nested generic class O.I
    class I<T> {}
    becomes
    class O {
    class I<T> {}
    }

  2. move the instance methods of that generic class into generic methods of the enclosing class, I.m -> O.m
    class I<T> {
    I<T> m<U>(U u) { ... }
    }
    becomes
    class O {
    class I<T> {}
    I<T> m<T, U>(I<T> i, U u) { ... }
    }

  3. mark all relevant state of I "internal", so the outer class O can access it.
    class I<T> {
    private T value;
    public T get() { return value; }
    }
    becomes
    class O {
    class I<T> { internal T value; }
    public T get(I<T> i) { return i.value; }
    }

  4. now you can express arbitrary constraints on the type structure in the methods of O.
    class O {
    class I<T> {}
    I<T> m2<T, U>(I<I<T>> i, I<I<U>> u) { ... }
    ...
    }

Because the external methods do not depend on an implicit 'this' argument, we no longer need equational constraint refinements to express the required type structure. Ordinary C# types suffice!

If you squint a little more closely, you'll also note a much closer symmetry in this version of the code and the OCaml code. In fact, this translation essentially builds ML modules in C#!

Like all widely-used module systems, it's pretty clear that these modules are "second-class". Only rebinding the name is possible, via C#'s "using" directive, and everything is resolved and fixed at compile-time. By this, I mean that you can import different list implementations and it will compile cleanly as long as the other list implementation defines the same operations with the same type signatures:
using L = List;
...
L.Flatten(l);
...
Or:
using L = AnotherList;
...
L.Flatten(l);
...

The compile-time restriction is primarily due to the declaration as a static class. Making the class non-static permits runtime binding of concrete implementations to signatures, so it's a little more flexible, and just as safe. Loosening this restriction may also make runtime "functors" possible.

I've used this pattern to complete Orc.NET, because the 'where' combinator required an inexpressible dependency. You can see my use in the Orc interpreter. The "Orc" object in Interp.cs is essentially an "expression builder", and I suspect that all such "builder" implementations are really ML modules at their core.

An open question is the interaction of inheritance with such modules. Seems like inheritance is a particular type of functor from structure S to S.

In any case, if you need type constraints which are inexpressible in C#, then make them ML modules using the above translation, and add object-orientedness back in incrementally. On a final note, I find it amusing that OO languages must resort to functional techniques to resolve fundamental OO limitations. I'd much prefer if we could just use functional languages instead and forgo all the hassle. ;-)

2 comments:

Harmless said...

F# uses a similar translation.

http://web.mit.edu/fsharp_v1.1.12/FSharp-1.1.12.3/manual/export-interop.html

Sandro Magi said...

Thanks for the pointer. I looked at some F# a little while ago, so perhaps some of what I read percolated unconsciously into this translation.