Saturday, January 19, 2008

An Almost Type-Safe General Monad in C#, aka how to Abstract over Type Constructors using Dynamics

Extending the work in my last post, I've developed a way to express an almost type-safe, general monad in C#. Similar to my module translation, the single monad object becomes a pair of co-operating objects, only one of which the monad implementor must define. Since C# cannot abstract over type constructors, I had to exploit the only feature that could accomodate the flexibility I needed: C#'s dynamic typing.
// The Monad object, indexed by a singleton type that implements the
// monad operations.
public sealed class Monad<M, T>
where M : struct, IMonadOps<M> { ... }

// An object that implements operations on the monad's encapsulated
// state.
public interface IMonadOps<M>
where M : struct, IMonadOps<M>
{
/// Return the encapsulated state for the monad's zero value.
object Zero<T>();

// Return the encapsulated state for the 'unit' operation.
object Unit<T>(T t);

// Perform a bind operation given the monad's encapsulated state,
// and the binding function f.
Monad<M, R> Bind<T, R>(
object state,
Fun<T, Monad<M, R>> f,
Fun<Monad<M, R>, object> from,
Fun<object, Monad<M, R>> to);
}
Looks a bit complicated, I know, but every piece is well-motivated. IMonadOps is fortunately the only interface that new monads must implement. Note the type constraints. The interface and the general monad both have a "phantom" or "witness" type M, constraining the type of the monad operations to the same IMonadOps type that created the monad. This means that the monad is entirely closed to extension and inspection.

Each IMonadOps is effectively a stateless singleton. The constraint to a struct is merely an optimization. Every IMonadOps implementor is actually very similar to the body of an ML module. If I want to invoke Identity.Zero, I can do it thusly:
default(Identity).Zero<int>();
This reveals the magic I used to define the Monad type. Whenever the Monad<M,T> type needs to invoke the underlying monad operations, it invokes the operation on the corresponding singleton M just like the above. Indexing the Monad by an IMonadOps implementation M, is like a linking step between Monad and M.
public sealed class Monad<M, T>
where M : struct, IMonadOps<M>
{
object state;

Monad(object state)
{
this.state = state;
}

// The projection function.
static object get<R>(Monad<M, R> m)
{
return m.state;
}

// The injection function.
static Monad<M, R> ctor<R>(object state)
{
return new Monad<M, R>(state);
}

// The standard 'bind' operation. It dispatches to the
// type-indexed 'bind', defined by M.
public Monad<M, R> Bind<R>(Fun<T, Monad<M, R>> f)
{
return default(M).Bind<T, R>(state, f, get<R>, ctor<R>);
}

// The standard 'unit' operation, injecting a value into
// the monad. It dispatches to the type-indexed unit
// function defined by M.
public static Monad<M, T> Unit(T t)
{
return new Monad<M, T>(default(M).Unit<T>(t));
}

public static Monad<M, T> Zero()
{
return new Monad<M, T>(default(M).Zero<T>());
}
}
You can see the dispatching at work here. All monad operations are dispatched to the "linked" methods of M. In a sense, we have succeeded in abstracting over the concrete implementation of Monad.

Now comes the catch: it's not fully type-safe, because the encapsulated state of the monad must be stored as 'object'. This means that each monad body must ensure it properly casts to and from the appropriate type. This is again due to the type constructor abstraction limitation.

Your first thought might be, why can't the encapsulated state simply be 'T'? Well, if all you wanted was an Identity monad, then that would be fine. But consider the Maybe monad:
public struct Maybe : IMonadOps<Maybe>
{
public object Unit<T>(T t)
{
return new Option<T>(t);
}

public object Zero<T>()
{
return new Option<T>();
}

public Monad<Maybe, R> Bind<T, R>(
object state,
Fun<T, Monad<Maybe, R>> f,
Fun<Monad<Maybe, R>, object> from,
Fun<object, Monad<Maybe, R>> to)
{
Option<T> value = (Option<T>)state;
return (value.HasValue) ? f(value.Value) : Fail<R>();
}

public static Monad<Maybe, T> Fail<T>()
{
return Monad<Maybe, T>.Zero();
}
}
The encapsulated state is an Option of type T, not a T. As another example, the List monad encapsulates a list of T's. Since we can't abstract over the type constructor for the encapsulated state, we thus need to resort to dynamic typing.

Now comes a slightly bizarre part: what are the injection and projection functions for? Well, despite the fact that IMonadOps is the "internal implementation" of Monad, it doesn't have direct access to the monad's internals. Unfortunately, sometimes that access is needed. Consider the List monad:
public struct ListM : IMonadOps<ListM>
{
public object Unit<T>(T t)
{
return List.Cons<T>(t, List.Nil<T>());
}

public object Zero<T>()
{
return List.Nil<T>();
}

public Monad<ListM, R> Bind<T, R>(
object state,
Fun<T, Monad<ListM, R>> f,
Fun<Monad<ListM, R>, object> from,
Fun<object, Monad<ListM, R>> to)
{
return to(
List.MapFlat<T, R>(
state as List.t<T>, delegate(T t)
{
return from(f(t)) as List.t<R>;
}));
}
}
ListM needs access to the private state of the returned list of Monads in order to flatten the list, but that access is not permitted since Monad is a separate, encapsulated type. There is no way to make this state available using inheritance or access modifiers, without also permitting the state to escape inadvertently.

Instead, the Monad provides an injection/projection pair, which are used to construct monad instances when given private state, or read out the private state of a monad instance, respectively. Note that encapsulation is maintained since this ability is granted only to the implementor of a given monad, which is already trusted with its own state.

I suspect there's a more efficient way to share the monad's state, but I'm a little tired from standing on my head all day for C#, so if anyone has any ideas, I welcome them. :-)

While this encoding is less efficient than the one described in my previous post, it's safer in some ways for users monad implementors alike, and I proved to myself that C#'s type system is powerful enough to encode the monad interface if you contort yourself appropriately. This technique for abstracting over type constructors might even be usable in my tagless Orc implementation.

Friday, January 18, 2008

The Worst Monad Tutorial... Except For All Those Others.

I've found other monad tutorials very frustrating. They are typically written in expressive languages with type inference, which permits concise descriptions, but obscures the underlying type structure. I've been struggling with writing something close to a monad in C# for quite some time, simply because none of these tutorials give a sufficiently complete description of a monad's structure. Suprisingly, the Wikipedia page on monads helped clarify what I was missing.

Here is the general structure of a monad all these tutorials use:
-- the type of monad m
type m a = ...

-- return is a type constructor that creates monad instances
return :: a → m a

-- bind is a function that combines a monad instance m a with a
-- computation that produces another monad instance m b from a's
-- to produce a new monad instance m b
bind :: m a → (a → m b) → m b

So the monad type 'm', has a function 'return' that constructs instances of that type, and 'bind' which converts an instance of that monad into another instance of a monad by passing its private state to the provided function. For instance, meet Haskell's Identity monad:
instance Monad Identity where
return a = Identity a -- i.e. return = id
(Identity x) >>= f = f x -- i.e. x >>= f = f x

Looks simple enough. Now meet Haskell's List monad:
instance Monad [] where
bind m f = concatMap f m
return x = [x]
fail s = []

What's not at all obvious from any of the above signatures, or any of the existing tutorials, is that the monad that the function f returns, must be the same monad type. If you're in the list monad, f must return a new instance of the list monad. If you're in the identity monad, f must return a new instance of the identity monad. This simple fact eluded me for quite some time, and explains why the monad interface cannot be expressed in C#. We can still program using monads, but the interface can't be enforced by the type system.

So without further ado, here is the Identity monad in C#:
public class Identity<T> : Monad<T>
{
T value;

public Identity(T t)
{
value = t;
}
public Identity<B> Bind<B>(Fun<T, Identity<B>> f)
{
return f(value);
}
}
The Monad<T> base class is actually empty, so it's just a marker interface. For those of you unfamiliar with the "Fun" delegate, it's just one of the many standard delegates I use for function signatures in my FP# library. Here is the List monad in C#:
public class ListMonad<T> : Monad<T>
{
protected List.t<T> l;

public ListMonad(T t)
{
l = t;
}

public ListMonad()
{
l = List.Nil<T>();
}

public ListMonad<B> Bind<B>(Fun<T, ListMonad<B>> f)
{
return new ListMonad<B>(
List.MapFlat<T, B>(
l, delegate(T t) { return f(t).l; }));
}
}

Note that all of these monads in C# share a common structure. They have at least one constructor used to wrap a value (called 'return' earlier), and they all have a Bind method which operates on the value encapsulated in the monad, and maps it to a new instance of the monad. Since we have such a similar structure, is there a way to declare an interface or an abstract base class declaring the signature for the Bind method?

Unfortunately not, because C# cannot abstract over type constructors. If it could, the abstract monad class and the Identity monad would look something like:
public abstract class Monad<M,T> where M : Monad
{
public abstract M<M, R> Bind<R>(Fun<T, M<M, R>> f);
}
public sealed class Identity<T> : Monad<Identity,T>
{
...
public override Identity<R> Bind<R>(Fun<T, Identity<R>> f)
...
}

Note the two emphasized sections: the class constraint on Monad, and the type parameter Identity provides to Monad when inheriting from it. They are both used without a type argument. This is illegal in C#/.NET, but it's perfectly legal in languages with more powerful type systems, such as those with "kinds". Identity<int> has kind *, while Identity without a type argument has type *⇒*, ie. a function that constructs a type of kind * when given a type of kind *. This is why monads are not so easily translated into languages like C#.

Coming soon, a real example of using monads in C#?

Wednesday, January 16, 2008

Towards the best collection API... in C#. And some partial applications too.

The venerable Oleg Kiselyov once posted about the "best" collection traversal API. Let's call this ideal iterator a "SuperFold". LTU also covered his article. Essentially, a SuperFold is a left fold with early termination support. Any cursor can then be automatically derived from the SuperFold. The converse is not true.

Additional arguments are made in the above paper and in the LTU thread, so I won't repeat them here. Without further ado, I present the SuperFold for my purely functional list in C#:
//OCaml signature: ((T → B → bool * B) → B → B) → (T → B → bool * B) → B → B
B SuperFold<B>(
Fun<Fun<T, B, Pair<bool, B>>, B, B> self,
Fun<T, B, Pair<bool, B>> proc,
B seed)
{
bool cont;
proc(head, seed).Bind(out cont, out seed);
return cont ? self(proc, seed) : seed;
}

While quite simple, it's not as efficient as it should be since C#/.NET doesn't support proper tail calls. You can see in that source file that I derived FoldLeftDerived from SuperFold. Deriving FoldRight is a bit trickier, so I have to think about it. The simple, inefficient, answer is to simply reverse the list.

I've also enhanced the FP# library with:
  1. A number of tuple types (up to 10),
  2. Tuple inference,just call: Tuple._(a,b,c,...)
  3. streamed versions of Map/Filter/FoldRight over IEnumerable which don't build intermediate lists,
  4. FoldLeft over IEnumerable including support for early termination,
  5. FoldLeft over my purely functional list including support for early termination,
  6. Option type now looks more like System.Nullable, with an overload for the | operator to choose between an empty option, or a default value [1],
  7. Partial application for almost all defined function types
I'll probably remove SuperFold from my purely functional list, since I don't think it will end up being useful in C#. The iterated FoldLeft I defined is more efficient due to the lack of tail recursion, and the iterated version also supports early termination.

[1] It's very frustrating to see how close MS gets to a truly general and useful abstraction, only to lock it down for no apparent reason. What good reason is there for Nullable to be restricted to struct types? If you give it more than a second's thought, I think you'll realize that there is no good reason. Nullable is the option type if it weren't for this restriction!

Tuesday, January 8, 2008

ML Modules in C# - Sorely Missing Polymorphic Type Constructors

As Chung-chieh Shan pointed out, my encoding of modules in C# is somewhat limited. In particular, I cannot abstract over type constructors, which is to say, C# is missing generics over generics. Consider the Orc.NET interpreter:
class Orc {
class Exp<T> { ... }

public Exp<U> Seq<T,U>(Exp<T> e1, <U>) { ... }
public Exp<T> Par<T>(Exp<T> e1, Exp<T>) { ... }
public Exp<T> Where<T>(Exp<T> e1, Exp<Promise<T>>) { ... }
}

This is the result of my translation, which was necessitated by the "Where" method. Where introduces a dependency which currently cannot be expressed with ordinary C# constraints, so the module encoding is necessary.

The above interface is a direct, faithful implementation of the Orc semantics. The implementation I currently have is an interpreter for those semantics. What if I want to provide a compiler instead? The interface should remain the same, but the implementation should differ. This is a classic abstraction problem.

It's clear that the implementations can be changed at compile-time, but providing a different Orc implementation at runtime, something very natural for OO languages, does not seem possible. The reason is that C# lacks the requisite polymorphism.

A generic type is a type constructor, which means that from an abstract type definition, Exp<T>, you can construct a concrete type by providing a concrete type T, such as Exp<int>. But, if Exp<T> is not a concrete type until you provide it a concrete type T, what is the type of the abstract definition Exp<T>?

Reasoning about type constructors requires the introduction of something called "kinds". As types classify values, so kinds classify types. The set of values, or concrete types, is of kind *. The set of type constructors is of kind *⇒*, which is to say they are "compile-time functions" accepting a concrete type and producing a concrete type.

Now consider that we have multiple Exp<T> implementations, say Interpreter.Orc.Exp and Compiler.Orc.Exp, all with different innards, and all define the same operations and thus are theoretically interchangeable. We would somehow like to abstract over the different implementations of Exp<T> so that we can use whichever one is most appropriate. In other words, we want to make our code polymorphic over a set of generic types Exp<T>.

This necessitates type constructor constructors, the kind *⇒*⇒*, which accepts a type such as Orc.Compiler.Exp, and produces a type constructor Exp<T>.

At the moment, I can't think of a way to encode such polymorphism in C#. Functional languages provide this level of polymorphism, and some even provide higher-order kinds, meaning type constructors can be arguments to other type constructors, thus achieving entirely new levels of expressiveness.