Saturday, November 12, 2011

Type Unification Forbidden - More C#/CLR Irritations

I've written about quite a few irritations of C#/CLR, including asymmetries in CIL, oddly unverifiable CIL instructions, certain type constraints are forbidden for no reason, delegate creation bugs, the lack of higher-kinded types, equality asymmetries between events and IObservable, generics/type parameter problems, lack of usable control over object layouts, and just overall limitations of the CLR VM.

I've just smack into yet another annoying problem: type parameter unification is forbidden. The Microsoft Connect bug filed in 2004 was closed as By Design.

Here's a simple code fragment demonstrating the problem:

class ObserveTwo<T0, T1> : IObservable<T0>, IObservable<T1>
{
}
This will fail with the error:
'ObserveTwo<T0,T1>' cannot implement both 'System.IObservable<T0>' and 'System.IObservable<T1>' because they may unify for some type parameter substitutions
This is frankly nonsense. What's the problem if T0 and T1 unify? If T0=T1, ObserveTwo just implements IObservable<T0>, and the methods are all unified as well [2]. If T0!=T1, then the usual semantics apply.

The MS Connect bug implies that a type safety issue, but there's no problem that I can see [1].

This is incredibly frustrating, because interfaces are the only way to design certain extension methods (mixin-style). For instance, suppose we have a set of simple interfaces:

// marks a class as containing a parseable parameter of type T
public interface IParseable<T>
{
   HttpRequest Request { get; }
}
// continuation parameters are parseable
public interface IContinuation<T> : IParseable<T> { }
public interface IContinuation<T0, T1> : IParseable<T0>, IParseable<T1> { }
...
This is a perfectly sensible set of type definitions, and there is no ambiguity or type safety problem here, even if a class were to implement IContinuation<int, int>.

If anyone has any suggestions for workarounds, I'm all ears. Options that don't work:

  1. Can't turn IParseable into a struct/class with an implicit conversion, because implicit conversions don't work on interfaces.
  2. Can't define MxN overloads of the extension methods defined on IParseable, where N=number of IContinuation type definitions, and M=number of type parameters (although this actually has a shot of working, the code duplication is just ludicrous).

A Partial Solution

The best solution I have involves N overloads of the extension methods by implementing interfaces that designate type parameter positions:

// type param at index 0
public interface IParam0<T> { }
// type param at index 1
public interface IParam1<T> { }
// type param at index 2
public interface IParam2<T> { }
...
// continuations  have indexed type parameters
public interface IContinuation<T> : IParam0<T> { }
public interface IContinuation<T0, T1> : IParam0<T0>, IParam1<T1> { }
...
Now instead of defining extension methods on IParseable, I define an overload per IParamX interface (N overloads, 1 per type parameter).

This necessarily causes an ambiguity when two type parameters unify, since the compiler won't know whether you wanted to call the extension on IParam0<int> or IParam2<int>, but this ambiguity happens at the call site/at type instantiation, instead of type definition. This is a much better default [1], because you can actually do something about it by disambiguating manually. I've eliminated ambiguity entirely in my library by appending the type parameter index to the extension method name.

Of course, none of this boilerplate would have been necessary if type unification were supported.

[1] If the CLR supported type inequalities, instead of just type equalities, then we could just forbid this at the point ObserveTwo is created.

[2] EDIT 2011-11-13T11:45 AM: there's been some confusion about this statement. Doug McClean rightly points out that there's no guarantee that the two implementations are confluent, so unifying is not an "automatic" thing, so please don't take this to mean that I'm saying the compiler should be able to automatically "merge" methods somehow, or magically know which implementation to dispatch to based on context. This is undecidable in general.

Also, even if the interfaces require no implementation, the error still occurs, despite reduction being Church-Rosser. Finally, plenty of nice properties are violated on the CLR, so I'm not sure why also sacrificing confluence is such a big deal here. A sensible dynamic semantics, like always dispatching to the implementation for T0 in case of type parameter unification, restores most of the desirable properties without sacrificing expressiveness, and without going whole-hog and ensuring confluence via type inequalities [1].

Tuesday, November 1, 2011

Ad-hoc Extensions in .NET

A recent discussion on LtU brought up a common limitation of modern languages and runtimes. Consider some set of abstractions A, B, ..., Z that we wish supported some custom operation Foo(). Languages like C# give only two straightforward possibilities: inheritance and runtime type tests and casts.

Inheritance is straightforward: we simply inherit from each A, B, ..., Z to make A_Foo, B_Foo, ..., Z_Foo, with a custom Foo() method.

Unfortunately, inheritance is sometimes forbidden in C#, and furthermore, it sometimes prevents us from integrating with existing code. For instance, say we want to integrate with callback-style code, where the existing code hands our program an object of type A. We can't call Foo() on this A, because it's not an A_Foo, which is what we really wanted.

This leaves us with the undesirable option of using a large set of runtime type tests. Now type tests and casts are actually pretty fast, faster than virtual dispatch in fact, but the danger is in missing a case and thus triggering a runtime exception, a danger inheritance does not have.

Furthermore, we'd have to repeat the large set of runtime type tests for each operation we want to add to A, B, ... Z, which means every time we want to add an operation we have to do a large copy-paste, and every time we add a new class we have to update every place we test types, and the compiler doesn't warn us when we're missing a case.

A Solution

Fortunately, there's a straightforward way to address at least part of this problem. We can collect all the type tests we have to do in one place, and we can ensure that we have properly handled all cases that we are aware of. To do this, we combine the standard visitor pattern with the CLR's open instance delegates and its unique design for generics.

Here are the types:

// the interface encapsulating the operation to implement, ie. Foo()
public interface IOperation
{
  void A(A obj);
  void B(B obj);
  ...
  void Z(Z obj);
  void Unknown(object obj);
}
// the static dispatcher where runtime tests occur
public static class Dispatcher<T>
{
  static Action<IOperation, T> cached;
  public static Action<IOperation, T> Dispatch
  {
    get { return cached ?? (cached = Load()); }
  }
  static Action<IOperation, T> Load()
  {
    var type = typeof(T);
    var mName = type == typeof(A) ? "A":
                type == typeof(B) ? "B":
                        ...
                                  ? "Unknown";
    var method = typeof(IOperation).GetMethod(method);
    return Delegate.CreateDelegate(typeof(Action<IOperation, T>), null, method)
             as Action<IOperation, T>;
  }
}

If you want to extend A, B, ... Z, with any operation, you simply create a class that implements IOperation and the compiler will ensure you handle all known cases. If you want to add a class to the set of handled cases, you simply extend IOperation and add a case to the runtime type tests (or if you use a naming convention, you can just use the class name as the method name).

The dynamic type test runs once, and then a delegate that directly calls into the IOperation is cached, so the type tests are not run again.

If you try to dispatch on an unknown type, IOperation.Unknown is invoked. I could have made this a generic method, but the CLR currently has a bug creating open instance delegates to generic interface methods, so that would require some code generation to do properly.

There is a caveat though: if any of the cases are subtypes of each other, and you dispatch on the static base type, it will dispatch to the base type and not the super type handler. For instance, if A:B, and you call Dispatcher<B>.Dispatch(new A()), IOperation.B is called, not IOperation.A. This can be handled in various ways, but it's beyond the scope of this article. I may post a follow-up discussing the various options.

Extensions

Type Constraints on T

If all types to handle are subtypes of type X, then it's simple to constrain the type parameter T on Dispatcher<T>:

 public static class Dispatcher<T>
  where T : X
{
  ...
}

Operation Return Type

You could extend IOperation with a return type, but this requires propagating the return type parameter into Dispatcher and the cached delegate type, and thus requires more runtime state for more static fields. This is because the CLR doesn't support GADTs, so following this paper, I prefer to keep the return type encapsulated in IOperation and expose it as a public field for the caller to extract the value:

public sealed class Foo : IOperation
{
  // the return value
  public Bar ReturnValue { get; set; }
  public void A(A obj) { ... }
  ...
}

This requires fewer type parameters, and keeps the implementation simple.

Per-Class Overrides

Suppose some of the classes you are handling are your own, or are already aware of your code such that they have already implemented your custom operations. In that case, you can specify a companion interface to indicate this, and modify the dispatch code to invoke the interface instead:

public interface ICompanion<T>
{
  // your custom operations that are implemented via subclassing
  void Foo(IOperation operation, T self);
  ...
}
The dispatch code is then modified like so:
static Action<IOperation, T> Load()
{
  var type = typeof(T);
  // if companion interface is implemented, then dispatch directly into it
  if (typeof(ICompanion<T>).IsAssignableFrom(type))
  {
    var method = type.GetMethod("Foo");
    return Delegate.CreateDelegate(typeof(Action<IOperation, T>), null, method)
             as Action<IOperation, T>;
  }
  // the usual dispatch code
  ...

Useful Applications

This pattern simulates the usefulness of type classes in Haskell, because these are essentially ad-hoc extensions added after the fact. I use this exact pattern in Sasa.Dynamics to implement safe type case and reflection patterns over the CLR primitive types, ie. ints, strings, delegates, etc., with the per-class override extension described above via the IReflected interface.