Thursday, October 1, 2009

Abstracting over Type Constructors using Dynamics in C#

I've written quite a few times about my experiments with the CLR type system [1, 2, 3]. After much exploration and reflection, I had devised a pretty good approach to encoding ML-style modules and abstracting over type constructors in C#.

A recent question on Stack Overflow made me realize that I never actually explained this technique in plain English.

The best encoding of ML modules and type constructor polymorphism requires the use of partly safe casting.
  1. An ML signature maps to a C# interface with a generic type parameter called a "brand". The brand names the class that implements the interface, ie. the module implementation.

  2. An ML module maps to a C# class. If the module implements a signature, then it implements the corresponding interface and specifies itself as the signature's brand.

  3. Since classes and interfaces are first-class values, an ML functor also maps to a class.

  4. An ML type component maps to an abstract class that shares the same brand as the module. This effectively ties the the module data representation and the module implementation together at the interface boundary, and makes the necessary casting partly safe.

I'll use the tagless interpreter from Fig. 2 of tagless staged interpreters as a concrete example:
(* Fig. 2 *)
module type Symantics = sig
type ('c, 'dv) repr
val int : int -> ('c, int) repr
val bool: bool -> ('c, bool) repr
val lam : (('c, 'da) repr -> ('c, 'db) repr) ->
('c, 'da -> 'db) repr
val app : ('c, 'da -> 'db) repr -> ('c, 'da) repr ->
('c, 'db) repr
val fix : ('x -> 'x) -> (('c, 'da -> 'db) repr as 'x)
val add : ('c, int) repr -> ('c, int) repr ->
('c, int) repr
val mul : ('c, int) repr -> ('c, int) repr ->
('c, int) repr
val leq : ('c, int) repr -> ('c, int) repr ->
('c, bool) repr
val if_ : ('c, bool) repr ->
(unit -> 'x) -> (unit -> 'x) ->
(('c, 'da) repr as 'x)
end

In the translation, I omit the 'c type parameter used in OCaml. The type of the expression representation, 'dv, becomes T in C#:
  1. The module signature:
    module type Symantics = sig
    maps to
    interface ISymantics<B> where B : ISymantics<B>
    (B is the module's Brand)

  2. The module's inner type declaration:
    type ('c, 'dv) repr
    maps to
    abstract class Repr<T, B> where B : ISymantics<B>
    (B is the module's Brand)

  3. Each signature function maps to a method on ISymantics, ie.
    val int : int -> ('c, int) repr
    maps to
    Repr<int, B> Int(int value)

The final translation will look something like:
// type component
abstract class Repr<T, B> where B : ISymantics<B> { }
// module signature
interface ISymantics<B> where B : ISymantics<B>
{
Repr<int, B> Int(int i);
Repr<int, B> Add(Repr<int, B> left, Repr<int, B> right);
...
}

The implementation undergoes a similar translation:
  1. The module declaration:
    module R = struct
    maps to
    sealed class R : ISymantics<R>
    (R implements ISymantics and provides itself as the type brand)

  2. The module's inner type declaration:
    type ('c,'dv) repr = 'dv
    maps to
    sealed class ReprR<T> : Repr<T, R>
    (the concrete representation is a sealed class that inherits from Repr, and supplies R as the brand, effectively typing it to the R implementation)

The final mapping looks like:
(* Section 2.2 *)
module R = struct
type ('c,'dv) repr = 'dv (* no wrappers *)
let int (x:int) = x
let add e1 e2 = e1 + e2
...
end
maps to:
// concrete type component for the interpreter
// representation
sealed class ReprR<T> : Repr<T, R>
{
internal T value;
}
sealed class R : ISymantics<R>
{
public Repr<int, R> Int(int i)
{
return new ReprR<int> { value = i };
}
public Repr<int, R> Add(Repr<int, R> left,
Repr<int, R> right)
{
var l = left as ReprR<int>; // semi-safe cast
var r = right as ReprR<int>;// semi-safe cast
return new ReprR<int> { value = l.value + r.value; }; }
}
}

Programs written against tagless interpreters are wrapped in functors in order to properly abstract over the interpreter implementation. As mentioned before, modules and signatures are effectively first-class values in this encoding, so a functor simply becomes a function:
(* Fig. 3 *)
module EX(S: Symantics) = struct
open S
let test1 () = app (lam (fun x -> x)) (bool true)
...
end

maps to:
public static class EX
{
public static Repr<bool, B> Test1<B>(ISymantics<B> s)
{
return s.App(s.Lam(x => x), s.Bool(true));
}
}

The brand/ISymantics type could also be lifted to be a generic class parameter to make it syntactically closer to how it looks in the paper, but the difference is not important.

You can now run EX.Test1 with any conforming implementation of ISymantics, and the type system will prevent you from mixing representations of different implementations just as it would in ML, because the brands will not match. The only way to trigger a type error due to the downcast, is if the client implements his own Repr<T, B> supplying R for the brand, then passing the custom Repr type in to a method on ISymantics<R>. In such cases the client deserves an error.

I think this is a fairly reasonable trade off all things considered. Of course, it would be preferable if the CLR could just support type constructor polymorphism natively. And while all my wishes are coming true, can I have all of these changes too?