Monday, November 18, 2013

First-Class Slots for .NET

Yesterday, I posted about the new extension of IRef<T> to arbitrary reference semantics for the CLR, including referencing inner fields, properties, and array slots. First-class references make it simple to operate on specific mutable data without caring about the underlying type of that data.

I just pushed another abstraction that handles a related, but different case: first-class slots.

ISlot<T>

An object "slot" is a value that designates a mutable location of a specific class of values, not a mutable location of a specific instance like first-class references. Where first-class references hide the underlying object type, slots expose the object type and allow you to mutate the slots of multiple objects at once, as long as they are subtypes of the slot's object type. Here's the declaration of ISlot<T>:

public interface ISlot<TObj, T>
{
    T Get(TObj obj);
    T Set(TObj obj, T value);
}

As you can see, the object we're manipulating is passed in while operating on it, so an ISlot is actually a direct descriptor that can access members of any instance of a subtype of TObj. If you're wondering when you would ever need this, rest assured that there are applications that need to write algorithms of this type. For instance, consider an object-relational mapper (ORM), which uses reflection to extract the members that need to be get/set on object flush/load from a database. Essentially, the ORM is reflecting over all slots of an object being flushed or loaded from a database, but it does so in a manner that isn't very reusable, and the object hydration code is coupled tightly to the database access code as a result.

Reifying slots as a distinct, first-class abstraction makes them independently testable, and the reflection code and database access code is now very loosely coupled. An ORM is but one example of an application that makes use of generic object slots.

Similar to first-class references, the Slot class exposes some static constructors for creating slots:

public abstract class Slot
{
  public static Member<TObj, T> Create<TObj, T>(Func<TObj, T> get, Action<TObj, T> set);
  public static Array<T> Create<T>(int index);
  public static Member<TObj, T> Create<TObj, T>(MemberInfo member)
  public static Member<TObj, T> Create<TObj, T>(Expression<Func<TObj, T>> member)

  public sealed class Array<T> : Slot, ISlot<T[], T> { ... }
  public sealed class Member<TObj, T> : Slot, ISlot<TObj, T> { ... }
}

These operations are exactly what you saw in the last article, where you can create slots from object members and array indices.

It might seem at first that slots are strict generalizations of first-class references, but this is deceptive. It's true that any algorithm you'd could write using references could be rewritten to use slots, but the number of type parameters and value parameters could increase non-linearly to the point where it's unwieldy and could easily obfuscate the underlying algorithm.

Limitations

There is one limitation at the moment. Specifically, value types need to be passed by reference during update in order for assignment to be visible to callers of ISlot.Set, but this isn't currently possible given the interface. As a result, there's currently a type constraint on TObj to restrict it to reference types.

A simple solution would simply be to return TObj from ISlot.Set, so the calling context can simply overwrite its own local value with the one modified by the slot. Another possibility is to make the TObj parameter to ISlot.Set a by-ref parameter. I'm considering these and a few other options, and Sasa's v0.12.0 release will probably contain the final solution.

Friday, November 15, 2013

First-Class References for .NET

References in C# are second-class citizens, which is to say, that you cannot pass them around as values. They can appear in function parameter position, but that's it:

public static void DoFoo(ref int value)
{
    Action captureRef = () => value = 3; // ERROR: ref can't escape scope
}

This makes it easy to verify their safety statically, and makes them very efficient, but it somewhat limits their expressiveness.

First-class citizens can be passed around as values, and otherwise used in any way that you'd use any other object. The above capture would work, for instance. To make references first-class citizens means constructing an object that exposes get and set operations, and that can reference the internals of any .NET type.

IRef<T>

Sasa has had the IRef<T> type for quite some time, but its full potential as a first-class reference hasn't been realized. There was only a single implementation, and that was a simple mutable slot as found in ML. I've just pushed a series of changes that effectively expands the usefulness of IRef.

You can now create a reference to an object property, a field, and a reference into an array:

public class Ref
{
    /// <summary>
    /// First-class reference indexing into an array.
    /// </summary>
    /// <typeparam name="T"></typeparam>
    public class Array<T> : Ref, IRef<T>
    ...

    /// <summary>
    /// First-class reference into an object property.
    /// </summary>
    /// <typeparam name="T"></typeparam>
    public class Member<T> : Ref, IRef<T>
    ...
}

The static Ref class contains a series of static, overloaded Create methods to construct instances of the appropriate IRef type. The signatures look like this:

public class Ref
{
    public static Ref<T> Create<T>(T value);
    public static Ref<T> Create<T>();
    public static Array<T> Create<T>(T[] array, int index);
    public static Member<T> Create<T>(Func<T> get, Action<T> set);
    public static IRef<T> Create<TObject, T>(TObject obj, MemberInfo member);
    public static IRef<T> Create<TObject, T>(TObject obj, FieldInfo field);
    public static Member<T> Create<T>(object obj, PropertyInfo property);
    public static IRef<T> Create<TObject, T>(TObject obj, Expression<Func<TObject, T>> member);
}

The first two overloads were the pre-existing constructors for slots as found in MLs, like OCaml. The third overload creates a reference that indexes into the given array. The fourth overload taking two delegates creates a ref to object members, like properties and fields. For properties these delegates are precisely the get/set methods of the property, so there's no additional overhead to using them.

The overloads taking the reflection members should be obvious as creating references to those members. The last overload is simply a convenient means of specifying the member you wish to access in a type-safe manner using LINQ expressions. For instance, here's an example from the test suite:

class Foo
{
    public int Bar { get; set; }
}

[Fact]
static void TestClassRef()
{
    var f = new Foo { Bar = 3 };
    var fref = Ref.Create(f, x => x.Bar);
    Assert.Equal(f.Bar, fref.Value);
    fref.Value = 99;
    Assert.Equal(99, f.Bar);
    Assert.Equal(f.Bar, fref.Value);
}

The reference variable fref is constructed via a simple LINQ expression at compile-time, instead of via error-prone runtime reflection. If Foo.Bar were a field instead of a property, the above code would be unchanged. The above code is for instance members. A static member looks a little different because the reference to provide is null:

class FieldObj
{
    public static int Foo = 3;
}

[Fact]
static void TestStaticFieldRef()
{
    var fref = Ref.Create(default(FieldObj), x => FieldObj.Foo);
    Assert.Equal(3, fref.Value);
    fref.Value = 99;
    Assert.Equal(99, fref.Value);
}

I use default(FieldObj) so type inference resolves the type parameters to Ref.Create, but you can use null and specify the type parameter manually if you like.

First-class references enable you to decouple programs using simple get/set semantics from the underlying representation being modified. The same algorithm exploiting IRef can transparently manipulate arrays, instance fields and properties, or static fields and properties.

These new ref types will be in the upcoming Sasa v0.12.0 release, or you can grab the source from Sourceforge and play with them now!

Sunday, November 10, 2013

Degenerate Implicits in C#

Scala and Haskell both have very useful type-directed concepts for resolving implicit function parameters, called "implicits" and "type classes" respectively. To illustrate the impact of this, consider a generic sorting function like Enumerable.OrderBy. It must define two overloads, one that takes a custom comparer, and one that does not and so should use a default comparison for type TKey:

public static IOrderedEnumerable<TSource> OrderBy<TSource, TKey>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    IComparer<TKey> comparer
)
public static IOrderedEnumerable<TSource> OrderBy<TSource, TKey>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector
)

While it's simple to define the second overload as calling the first with Comparer<TKey>.Default for the custom comparer, in general, for N overridable parameters, you would need to define something like factorial(N) overloads for all possible permutations of defaults and overridable parameters. This can quickly become unwieldy. Ideally, we could define only a single function with optional parameters and sane default values.

C# 4.0 introduced a much more limited form of implicits called "default values" to address part of this problem. These implicit values are limited to only primitive types like Int32 and String, so they aren't useful for general resolution of implicit values, but they were a good start.

Fortunately, C# has a few other features that, when combined with default values, allow us to add something like the more flexible implicit values as found in Scala: reified generics and value types. Basically, we define the implicit parameter to be some sort of value type, whose default value is always defined, and this value type resolves to either the default instance, or to a custom instance if one is provided. For instance, we can define a single OrderBy overload like so:

public static IOrderedEnumerable<TSource> OrderBy<TSource, TKey>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    CompareImplicit<IComparer<TKey>> comparer =
      default(CompareImplicit<IComparer<TKey>>)
)

Where CompareImplicit can be defined something like this:

public struct CompareImplicit<T>
{
    IComparer<T> cmp;
    static IComparer<T> def = Comparer<T>.Default;
    public CompareImplicit(IComparer<T> cmp) { this.cmp = cmp; }
    public IComparer<T> Resolve()
    {
        return cmp ?? def;
    }
    public int Compare(T left, T right)
    {
        return Resolve().Compare(left, right);
    }
}

You can then write any sort of instance resolution for the IComparer<T> value you like, and still allow callers to specify optional overrides. For instance, here's an example of using the built-in comparison and overriding it with a "reverse" comparer:

public static class Example
{
    sealed class Reverse : IComparer<int>
    {
        public int Compare(int left, int right)
        {
            return right.CompareTo(left);
        }
    }
    public static IEnumerable<T> OrderBy<T>(
      IEnumerable<T> source,
      CompareImplicit<T> cmp = default(CompareImplicit<T>))
    {
        return source.OrderBy(x => x, cmp.Resolve());
    }
    public static void Main(string[] args)
    {
        var orig = new[] { 1, 66, 4, 32 };
        var inorder = OrderBy(orig);
        var revorder = OrderBy(orig, new CompareImplicit<int>(new Reverse()));
        Console.WriteLine("in order: {0}", inorder.Aggregate("", (seed, x) => seed + x + ", "));
        Console.WriteLine("rev order: {0}", revorder.Aggregate("", (seed, x) => seed + x + ", "));
        Console.ReadLine();
        // prints:
        // in order: 1, 4, 32, 66,
        // rev order: 66, 32, 4, 1
    }
}

I reused the Enumerable.OrderBy overload for clarity, but in principle, the OrderBy function would normally just call Resolve() to access the IComparer instance to use locally. We can define similar implicit arguments to use for resolving instances of IEqualityComparer, and so on, but there's perhaps a more uniform, reusable approach using only a single struct for all such implicits:

public struct Implicit<T>
{
    public readonly T Instance;
    public Implicit(T instance)
    {
        this.Instance = instance;
    }
    public T Resolve(T defaultInstance)
    {
        return Instance != null ? Instance : defaultInstance;
    }
    public static implicit operator Implicit<T>(T instance)
    {
        return new Implicit<T>(instance);
    }
}

With a little more work for the library writer, this allows us to use a single struct for all implicit parameters, and the receiving function will simply need to provide the default instance if no override is specified. For instance, here's OrderBy rewritten using this type:

public static class Example
{
    public static IEnumerable<T> OrderBy<T>(
      IEnumerable<T> source,
      Implicit<IComparer<T>> cmp = default(Implicit<IComparer<T>>))
    {
        return source.OrderBy(x => x, cmp.Instance ?? Comparer<T>.Default);
    }
}

In order to avoid hard-coding the source of the default implicit in every function, it's better to use dependency injection. This will will decouple defining the default value for implicits from the uses, thus allowing more flexibility:

public static class Example
{
    public static IEnumerable<T> OrderBy<T>(
      IEnumerable<T> source,
      Implicit<IComparer<T>> cmp = default(Implicit<IComparer<T>>))
    {
        return source.OrderBy(x => x, cmp.Instance ?? IoC.Resolve<IComparer<T>>());
    }
}

Technically, you don't even need the Implicit type in parameter list, as all reference types can be null be default. However, the presence of Implicit is good documentation implying that some sort of implicit instance resolution is taking place, and that this process can be overridden.

Finally, there are still some unfortunate C# warts in this approach. For instance, C# does not allow implicit conversions to or from interface types, so you will always have to explicitly construct the CompareImplicit or Implicit wrapper around a custom instance, even though Implicit defines an implicit conversion. This is annoying, but hardly the only time C# has restricted expressiveness for flimsy aesthetic reasons. I would also define an extension method to simplify lifting a custom value into an Implicit wrapper:

public static class Implicits
{
    public static Implicit<T> AsImplicit<T>(this T instance)
    {
        return instance;
    }
}

The previous code example will now look like this:

public static class Example
{
    ...

    public static void Main(string[] args)
    {
        var orig = new[] { 1, 66, 4, 32 };
        var inorder = OrderBy(orig);
        var revorder = OrderBy(orig, new Reverse().AsImplicit());
        Console.WriteLine("in order: {0}", inorder.Aggregate("", (seed, x) => seed + x + ", "));
        Console.WriteLine("rev order: {0}", revorder.Aggregate("", (seed, x) => seed + x + ", "));
        Console.ReadLine();
        // prints:
        // in order: 1, 4, 32, 66,
        // rev order: 66, 32, 4, 1
    }
}

In my opinion, not too bad given C#'s inherent limitations.