I've written quite a few times about my experiments with the CLR type system [1, 2, 3]. After much exploration and reflection, I had devised a pretty good approach to encoding ML-style modules and abstracting over type constructors in C#.
A recent question on Stack Overflow made me realize that I never actually explained this technique in plain English.
The best encoding of ML modules and type constructor polymorphism requires the use of partly safe casting.
I'll use the tagless interpreter from Fig. 2 of tagless staged interpreters as a concrete example:
In the translation, I omit the 'c type parameter used in OCaml. The type of the expression representation, 'dv, becomes T in C#:
The final translation will look something like:
The implementation undergoes a similar translation:
The final mapping looks like:
Programs written against tagless interpreters are wrapped in functors in order to properly abstract over the interpreter implementation. As mentioned before, modules and signatures are effectively first-class values in this encoding, so a functor simply becomes a function:
maps to:
The brand/ISymantics type could also be lifted to be a generic class parameter to make it syntactically closer to how it looks in the paper, but the difference is not important.
You can now run EX.Test1 with any conforming implementation of ISymantics, and the type system will prevent you from mixing representations of different implementations just as it would in ML, because the brands will not match. The only way to trigger a type error due to the downcast, is if the client implements his own Repr<T, B> supplying R for the brand, then passing the custom Repr type in to a method on ISymantics<R>. In such cases the client deserves an error.
I think this is a fairly reasonable trade off all things considered. Of course, it would be preferable if the CLR could just support type constructor polymorphism natively. And while all my wishes are coming true, can I have all of these changes too?
A recent question on Stack Overflow made me realize that I never actually explained this technique in plain English.
The best encoding of ML modules and type constructor polymorphism requires the use of partly safe casting.
- An ML signature maps to a C# interface with a generic type parameter called a "brand". The brand names the class that implements the interface, ie. the module implementation.
- An ML module maps to a C# class. If the module implements a signature, then it implements the corresponding interface and specifies itself as the signature's brand.
- Since classes and interfaces are first-class values, an ML functor also maps to a class.
- An ML type component maps to an abstract class that shares the same brand as the module. This effectively ties the the module data representation and the module implementation together at the interface boundary, and makes the necessary casting partly safe.
I'll use the tagless interpreter from Fig. 2 of tagless staged interpreters as a concrete example:
(* Fig. 2 *)
module type Symantics = sig
type ('c, 'dv) repr
val int : int -> ('c, int) repr
val bool: bool -> ('c, bool) repr
val lam : (('c, 'da) repr -> ('c, 'db) repr) ->
('c, 'da -> 'db) repr
val app : ('c, 'da -> 'db) repr -> ('c, 'da) repr ->
('c, 'db) repr
val fix : ('x -> 'x) -> (('c, 'da -> 'db) repr as 'x)
val add : ('c, int) repr -> ('c, int) repr ->
('c, int) repr
val mul : ('c, int) repr -> ('c, int) repr ->
('c, int) repr
val leq : ('c, int) repr -> ('c, int) repr ->
('c, bool) repr
val if_ : ('c, bool) repr ->
(unit -> 'x) -> (unit -> 'x) ->
(('c, 'da) repr as 'x)
end
In the translation, I omit the 'c type parameter used in OCaml. The type of the expression representation, 'dv, becomes T in C#:
- The module signature:
module type Symantics = sig
maps tointerface ISymantics<B> where B : ISymantics<B>
(B is the module's Brand) - The module's inner type declaration:
type ('c, 'dv) repr
maps toabstract class Repr<T, B> where B : ISymantics<B>
(B is the module's Brand) - Each signature function maps to a method on ISymantics, ie.
val int : int -> ('c, int) repr
maps toRepr<int, B> Int(int value)
The final translation will look something like:
// type component
abstract class Repr<T, B> where B : ISymantics<B> { }
// module signature
interface ISymantics<B> where B : ISymantics<B>
{
Repr<int, B> Int(int i);
Repr<int, B> Add(Repr<int, B> left, Repr<int, B> right);
...
}
The implementation undergoes a similar translation:
- The module declaration:
module R = struct
maps tosealed class R : ISymantics<R>
(R implements ISymantics and provides itself as the type brand) - The module's inner type declaration:
type ('c,'dv) repr = 'dv
maps tosealed class ReprR<T> : Repr<T, R>
(the concrete representation is a sealed class that inherits from Repr, and supplies R as the brand, effectively typing it to the R implementation)
The final mapping looks like:
(* Section 2.2 *)maps to:
module R = struct
type ('c,'dv) repr = 'dv (* no wrappers *)
let int (x:int) = x
let add e1 e2 = e1 + e2
...
end
// concrete type component for the interpreter
// representation
sealed class ReprR<T> : Repr<T, R>
{
internal T value;
}
sealed class R : ISymantics<R>
{
public Repr<int, R> Int(int i)
{
return new ReprR<int> { value = i };
}
public Repr<int, R> Add(Repr<int, R> left,
Repr<int, R> right)
{
var l = left as ReprR<int>; // semi-safe cast
var r = right as ReprR<int>;// semi-safe cast
return new ReprR<int> { value = l.value + r.value; }; }
}
}
Programs written against tagless interpreters are wrapped in functors in order to properly abstract over the interpreter implementation. As mentioned before, modules and signatures are effectively first-class values in this encoding, so a functor simply becomes a function:
(* Fig. 3 *)
module EX(S: Symantics) = struct
open S
let test1 () = app (lam (fun x -> x)) (bool true)
...
end
maps to:
public static class EX
{
public static Repr<bool, B> Test1<B>(ISymantics<B> s)
{
return s.App(s.Lam(x => x), s.Bool(true));
}
}
The brand/ISymantics type could also be lifted to be a generic class parameter to make it syntactically closer to how it looks in the paper, but the difference is not important.
You can now run EX.Test1 with any conforming implementation of ISymantics, and the type system will prevent you from mixing representations of different implementations just as it would in ML, because the brands will not match. The only way to trigger a type error due to the downcast, is if the client implements his own Repr<T, B> supplying R for the brand, then passing the custom Repr type in to a method on ISymantics<R>. In such cases the client deserves an error.
I think this is a fairly reasonable trade off all things considered. Of course, it would be preferable if the CLR could just support type constructor polymorphism natively. And while all my wishes are coming true, can I have all of these changes too?
Comments
And, the defining feature of SML/OCaml modules is sharing which you didn't include - and there's good experience that emulating these via abstraction isn't workable in practice (so really this is more like emulating Haskell than ML).
Some of us would really like such things to be possible, but it really does require something fundamentally more powerful that .NET doesn't provide yet (nor Haskell in fact). SML has had this for 20 years, so it can't be long before it becomes possible... :-)
The closest analogue I can think of are interface specifications on type constraints. Consider:
void Foo<T>(T afoo) where T : IFoo, IBar
How would you specify that the parameter implements both IFoo and IBar without that type constraint? You really couldn't. At best, you could write Foo to accept some larger interface type IFooBar that implements both IFoo and IBar:
interface IFooBar : IFoo,IBar {}
void Foo(IFooBar afoo);
But you have to plan for this ahead of time, and if you're interfacing with code that someone else wrote, you're SOL. You have to write a whole new adapter class to wrap the other class.
The type constraints allowed you greater flexibility in composing code, and sharing constraints are almost exactly like the above type constraints in this regard.