Thursday, October 1, 2009

Abstracting over Type Constructors using Dynamics in C#

I've written quite a few times about my experiments with the CLR type system [1, 2, 3]. After much exploration and reflection, I had devised a pretty good approach to encoding ML-style modules and abstracting over type constructors in C#.

A recent question on Stack Overflow made me realize that I never actually explained this technique in plain English.

The best encoding of ML modules and type constructor polymorphism requires the use of partly safe casting.
  1. An ML signature maps to a C# interface with a generic type parameter called a "brand". The brand names the class that implements the interface, ie. the module implementation.

  2. An ML module maps to a C# class. If the module implements a signature, then it implements the corresponding interface and specifies itself as the signature's brand.

  3. Since classes and interfaces are first-class values, an ML functor also maps to a class.

  4. An ML type component maps to an abstract class that shares the same brand as the module. This effectively ties the the module data representation and the module implementation together at the interface boundary, and makes the necessary casting partly safe.

I'll use the tagless interpreter from Fig. 2 of tagless staged interpreters as a concrete example:
(* Fig. 2 *)
module type Symantics = sig
type ('c, 'dv) repr
val int : int -> ('c, int) repr
val bool: bool -> ('c, bool) repr
val lam : (('c, 'da) repr -> ('c, 'db) repr) ->
('c, 'da -> 'db) repr
val app : ('c, 'da -> 'db) repr -> ('c, 'da) repr ->
('c, 'db) repr
val fix : ('x -> 'x) -> (('c, 'da -> 'db) repr as 'x)
val add : ('c, int) repr -> ('c, int) repr ->
('c, int) repr
val mul : ('c, int) repr -> ('c, int) repr ->
('c, int) repr
val leq : ('c, int) repr -> ('c, int) repr ->
('c, bool) repr
val if_ : ('c, bool) repr ->
(unit -> 'x) -> (unit -> 'x) ->
(('c, 'da) repr as 'x)
end

In the translation, I omit the 'c type parameter used in OCaml. The type of the expression representation, 'dv, becomes T in C#:
  1. The module signature:
    module type Symantics = sig
    maps to
    interface ISymantics<B> where B : ISymantics<B>
    (B is the module's Brand)

  2. The module's inner type declaration:
    type ('c, 'dv) repr
    maps to
    abstract class Repr<T, B> where B : ISymantics<B>
    (B is the module's Brand)

  3. Each signature function maps to a method on ISymantics, ie.
    val int : int -> ('c, int) repr
    maps to
    Repr<int, B> Int(int value)

The final translation will look something like:
// type component
abstract class Repr<T, B> where B : ISymantics<B> { }
// module signature
interface ISymantics<B> where B : ISymantics<B>
{
Repr<int, B> Int(int i);
Repr<int, B> Add(Repr<int, B> left, Repr<int, B> right);
...
}

The implementation undergoes a similar translation:
  1. The module declaration:
    module R = struct
    maps to
    sealed class R : ISymantics<R>
    (R implements ISymantics and provides itself as the type brand)

  2. The module's inner type declaration:
    type ('c,'dv) repr = 'dv
    maps to
    sealed class ReprR<T> : Repr<T, R>
    (the concrete representation is a sealed class that inherits from Repr, and supplies R as the brand, effectively typing it to the R implementation)

The final mapping looks like:
(* Section 2.2 *)
module R = struct
type ('c,'dv) repr = 'dv (* no wrappers *)
let int (x:int) = x
let add e1 e2 = e1 + e2
...
end
maps to:
// concrete type component for the interpreter
// representation
sealed class ReprR<T> : Repr<T, R>
{
internal T value;
}
sealed class R : ISymantics<R>
{
public Repr<int, R> Int(int i)
{
return new ReprR<int> { value = i };
}
public Repr<int, R> Add(Repr<int, R> left,
Repr<int, R> right)
{
var l = left as ReprR<int>; // semi-safe cast
var r = right as ReprR<int>;// semi-safe cast
return new ReprR<int> { value = l.value + r.value; }; }
}
}

Programs written against tagless interpreters are wrapped in functors in order to properly abstract over the interpreter implementation. As mentioned before, modules and signatures are effectively first-class values in this encoding, so a functor simply becomes a function:
(* Fig. 3 *)
module EX(S: Symantics) = struct
open S
let test1 () = app (lam (fun x -> x)) (bool true)
...
end

maps to:
public static class EX
{
public static Repr<bool, B> Test1<B>(ISymantics<B> s)
{
return s.App(s.Lam(x => x), s.Bool(true));
}
}

The brand/ISymantics type could also be lifted to be a generic class parameter to make it syntactically closer to how it looks in the paper, but the difference is not important.

You can now run EX.Test1 with any conforming implementation of ISymantics, and the type system will prevent you from mixing representations of different implementations just as it would in ML, because the brands will not match. The only way to trigger a type error due to the downcast, is if the client implements his own Repr<T, B> supplying R for the brand, then passing the custom Repr type in to a method on ISymantics<R>. In such cases the client deserves an error.

I think this is a fairly reasonable trade off all things considered. Of course, it would be preferable if the CLR could just support type constructor polymorphism natively. And while all my wishes are coming true, can I have all of these changes too?

6 comments:

Anton Tayanovskyy (toyvo) said...

Hi, thank you for taking time to explain your encoding in the article! I find it very helpful. By the way, I am the poster of that StackOverflow question. I have turned the SO question into a wiki, which you are welcome to edit to link to your article. Thanks again.

RD1 said...
This comment has been removed by the author.
RD1 said...

You've kinda hit the tail on the head - you can't quite do this with .NET and have it be statically checked.

And, the defining feature of SML/OCaml modules is sharing which you didn't include - and there's good experience that emulating these via abstraction isn't workable in practice (so really this is more like emulating Haskell than ML).

Some of us would really like such things to be possible, but it really does require something fundamentally more powerful that .NET doesn't provide yet (nor Haskell in fact). SML has had this for 20 years, so it can't be long before it becomes possible... :-)

Sandro Magi said...

Actually, Haskell does support type constructor polymorphism. They wouldn't be able to support monads otherwise.

Qwertie said...

Since I've never used ML, I would be curious to learn what is meant by "sharing specifications", what real-world problems it can solve, and how one would have to change the .NET type system to support it as well as ML supports it.

Sandro Magi said...

It's a little hard to describe sharing constraints since .NET doesn't have anything exactly comparable. These constraints are a way to control the propagation of type information between modules that would otherwise be completely encapsulated.

The closest analogue I can think of are interface specifications on type constraints. Consider:

void Foo<T>(T afoo) where T : IFoo, IBar

How would you specify that the parameter implements both IFoo and IBar without that type constraint? You really couldn't. At best, you could write Foo to accept some larger interface type IFooBar that implements both IFoo and IBar:

interface IFooBar : IFoo,IBar {}
void Foo(IFooBar afoo);

But you have to plan for this ahead of time, and if you're interfacing with code that someone else wrote, you're SOL. You have to write a whole new adapter class to wrap the other class.

The type constraints allowed you greater flexibility in composing code, and sharing constraints are almost exactly like the above type constraints in this regard.