Tuesday, February 21, 2012

Reusable Ad-Hoc Extensions for .NET

I posted a while ago about a pattern for ad-hoc extensions in .NET using generics. Unfortunately, as with every "design pattern", you had to manually ensure that your abstraction properly implemented the pattern. There was no way to have the compiler enforce it, the way it enforces conformance to an interface.

It's common wisdom that "design patterns" are simply a crutch for languages with insufficient abstractive power. Fortunately, .NET's multicast delegates provide the abstractive power we need to eliminate the design pattern for ad-hoc extensions:

/// <summary>
/// Dispatch cases to handlers.
/// </summary>
/// <typeparam name="T">The type of the handler.</typeparam>
public static class Pattern<T>
{
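    static Dispatcher dispatch;
    static Action<T, object> any;

    // non-generic: the enclosing class's T is already in scope here,
    // so redeclaring it would only shadow the outer type parameter
    delegate void Dispatcher(T func, object value,
                             Type type, ref bool found);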

    /// <summary>
    /// Register a case handler.
    /// </summary>
    /// <typeparam name="T0">The argument type.</typeparam>
    /// <param name="match">Expression dispatching to handler.</param>
    public static void Case<T0>(Expression<Action<T, T0>> match)
    {
        var call = match.Body as MethodCallExpression;
        if (call == null) throw new ArgumentException(
                              "Expected a method call expression.", "match");
        var handler = Delegate.CreateDelegate(typeof(Action<T, T0>),
                                              null, call.Method)
                   as Action<T, T0>;
        dispatch += (T x, object o, Type type, ref bool found) =>
        {
            // if type matches exactly, then dispatch to handler
            if (typeof(T0) == type)
            {
                found = true;   
                handler(x, (T0)o);
            }
        };
    }
    /// <summary>
    /// Catch-all case.
    /// </summary>
    /// <param name="match">Expression dispatching to handler.</param>
    public static void Any(Expression<Action<T, object>> match)
    {
        var call = match.Body as MethodCallExpression;
        if (call == null) throw new ArgumentException(
                              "Expected a method call expression.", "match");
        var handler = Delegate.CreateDelegate(typeof(Action<T,object>),
                                              null, call.Method)
                   as Action<T, object>;
        any += handler;
    }

    /// <summary>
    /// Dispatch to a handler for <typeparamref name="T0"/>.
    /// </summary>
    /// <typeparam name="T0">The value type.</typeparam>
    /// <param name="value">The value to dispatch.</param>
    /// <param name="func">The dispatcher.</param>
    public static void Match<T0>(T0 value, T func)
    {
        bool found = false;
        // dispatch is null until the first Case is registered
        if (dispatch != null)
            dispatch(func, value, value.GetType(), ref found);
        if (!found)
        {
            if (any == null) throw new KeyNotFoundException(
                                       "Unknown type.");
            else any(func, value);
        }
    }
}

The abstraction would be used like this:

interface IFoo
{
    void Bar(int i);
    void Foo(char c);
    void Any(object o);
}
class xFoo : IFoo
{
    public void Bar(int i)
    {
        Console.WriteLine("Int: {0}", i);
    }
    public void Foo(char c)
    {
        Console.WriteLine("Char: {0}", c);
    }
    public void Any(object o)
    {
        Console.WriteLine("Any: {0}", o);
    }
}
static void Main(string[] args)
{
    Pattern<IFoo>.Case<int>((x, i) => x.Bar(i));
    Pattern<IFoo>.Case<char>((x, i) => x.Foo(i));

    Pattern<IFoo>.Match(9, new xFoo());
    Pattern<IFoo>.Match('v', new xFoo());
    try
    {
        Pattern<IFoo>.Match(3.4, new xFoo());
    }
    catch (KeyNotFoundException)
    {
        Console.WriteLine("Not found.");
    }
    Pattern<IFoo>.Any((x, o) => x.Any(o));
    Pattern<IFoo>.Match(3.4, new xFoo());
    // prints:
    // Int: 9
    // Char: v
    // Not found.
    // Any: 3.4
}

Unlike the previous pattern for ad-hoc extensions, dispatching is always precise: a value is dispatched to the handler registered for its dynamic type, whereas the previous solution dispatched only on the static type. This precision can also be a downside, since a handler registered for a base type never sees values of a subtype, but you could easily extend the type test to accept subtypes as well.
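The subtype extension mentioned above might look roughly like the following. This is only a sketch: SubtypePattern<T> is a hypothetical name, and to keep it short the cases take plain delegates instead of expression trees and the catch-all is omitted. The one meaningful change is that the exact type comparison becomes Type.IsAssignableFrom. Note that because the dispatcher is multicast, a value now fires every case whose type it is assignable to, not just one.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical variant of Pattern<T>: a case for T0 also matches subtypes of T0.
public static class SubtypePattern<T>
{
    delegate void Dispatcher(T func, object value, Type type, ref bool found);
    static Dispatcher dispatch;

    public static void Case<T0>(Action<T, T0> handler)
    {
        dispatch += (T x, object o, Type type, ref bool found) =>
        {
            // matches T0 itself and any type derived from or implementing T0
            if (typeof(T0).IsAssignableFrom(type))
            {
                found = true;
                handler(x, (T0)o);
            }
        };
    }

    public static void Match<T0>(T0 value, T func)
    {
        bool found = false;
        if (dispatch != null)
            dispatch(func, value, value.GetType(), ref found);
        if (!found) throw new KeyNotFoundException("Unknown type.");
    }
}

public static class SubtypeDemo
{
    public static void Main()
    {
        // register a case for the base type only
        SubtypePattern<List<string>>.Case<Exception>(
            (sink, e) => sink.Add("Exception: " + e.Message));

        var output = new List<string>();
        // the dynamic type is ArgumentException, a subtype of Exception
        SubtypePattern<List<string>>.Match(new ArgumentException("bad"), output);
        Console.WriteLine(output[0]); // prints: Exception: bad
    }
}
```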

The other downside of this solution is that it's not quite as fast, since every type test runs on each dispatch, whereas the previous solution cached the specific delegate in a static generic field. This caching can be added to the above class as well. Then you can have the best of both worlds whenever you happen to know that the static type is the same as the dynamic type.
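One way the caching could be layered on is sketched below. The names CachedPattern<T> and Cache<T0> are hypothetical, the cases again take plain delegates, and the value is assumed to be non-null. The fast path is taken only when a handler was registered for the static type and the dynamic type turns out to be the same; everything else falls back to running the type tests.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical variant of Pattern<T> with a per-type handler cache.
public static class CachedPattern<T>
{
    delegate void Dispatcher(T func, object value, Type type, ref bool found);
    static Dispatcher dispatch;

    // The runtime instantiates Cache<T0> separately for each T0,
    // so every case type gets its own static field.
    static class Cache<T0>
    {
        public static Action<T, T0> Handler;
    }

    public static void Case<T0>(Action<T, T0> handler)
    {
        Cache<T0>.Handler += handler;
        dispatch += (T x, object o, Type type, ref bool found) =>
        {
            if (typeof(T0) == type)
            {
                found = true;
                handler(x, (T0)o);
            }
        };
    }

    public static void Match<T0>(T0 value, T func)
    {
        // fast path: static type equals dynamic type, so skip the type tests
        var cached = Cache<T0>.Handler;
        if (cached != null && value.GetType() == typeof(T0))
        {
            cached(func, value);
            return;
        }
        // slow path: run every registered type test
        bool found = false;
        if (dispatch != null)
            dispatch(func, value, value.GetType(), ref found);
        if (!found) throw new KeyNotFoundException("Unknown type.");
    }
}

public static class CacheDemo
{
    public static void Main()
    {
        CachedPattern<List<string>>.Case<int>((sink, i) => sink.Add("Int: " + i));

        var output = new List<string>();
        CachedPattern<List<string>>.Match(9, output);          // static type int: fast path
        CachedPattern<List<string>>.Match<object>(9, output);  // static type object: slow path
        Console.WriteLine(string.Join(", ", output));          // prints: Int: 9, Int: 9
    }
}
```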

Saturday, February 4, 2012

Why Sealed Classes Should Be Allowed In Type Constraints

One of my older posts on Stack Overflow listed some of what I consider to be flaws of C# and/or the .NET runtime. A recent reply to my post posed a good question about one of those flaws, namely that sealed classes are not allowed as type constraints. Disallowing them seems like a sensible restriction for C# at first, but there are legitimate programs that it rules out.

I figured others would have run into this problem at some point, but a quick Google search didn't turn up much, so I will document the actual problem with this rule. Consider the following interface:

interface IFoo<T>
{
    void Bar<U>(U bar) where U : T;
}

The important part to notice here is the type constraint on the method, U : T. This means whatever T we specify for IFoo<T>, we should be able to list as a type constraint on the method Bar. Of course, if T is a sealed class, we cannot do this:

class Foo : IFoo<string>
{
    public void Bar<U>(U bar)
      where U : string //ERROR: string is sealed!
    {
    }
}

In this case, there's a workaround: implement the method explicitly, so that the compiler infers the constraint from the interface. The method is then private to the class and visible only when the object is coerced to the interface:

class Foo : IFoo<string>
{
    void IFoo<string>.Bar<U>(U bar)
    {
    }
}

But this means that you cannot call Bar on a Foo directly; you first need to cast it to an IFoo<string>, a completely unnecessary step.
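To make the cost of the workaround concrete, here is a small self-contained sketch of the call site. The method body and the Demo class are illustrative additions, not part of the original example.

```csharp
using System;

interface IFoo<T>
{
    void Bar<U>(U bar) where U : T;
}

class Foo : IFoo<string>
{
    // explicit implementation: the constraint U : string is inherited
    // from the interface rather than written out
    void IFoo<string>.Bar<U>(U bar)
    {
        Console.WriteLine("Bar: {0}", bar);
    }
}

static class Demo
{
    static void Main()
    {
        var foo = new Foo();
        // foo.Bar("hello");              // does not compile: Foo has no accessible Bar
        ((IFoo<string>)foo).Bar("hello"); // prints: Bar: hello
    }
}
```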

In principle, we should be able to specify explicitly every type constraint that the compiler can infer implicitly. This is clearly not the case here, and there is no reason for it. It's purely an aesthetic restriction, not a correctness restriction, and one that the C# compiler devs took extra effort to implement.

And that is why we should allow sealed classes as type constraints.

Thursday, February 2, 2012

Diff for IEnumerable<T>

I've just added a simple diff algorithm under Sasa.Linq. The signature is as follows:

/// <summary>
/// Compute the set of differences between two sequences.
/// </summary>
/// <typeparam name="T">The type of sequence items.</typeparam>
/// <param name="original">The original sequence.</param>
/// <param name="updated">The updated sequence to compare to.</param>
/// <returns>
/// The smallest sequence of changes to transform
/// <paramref name="original"/> into <paramref name="updated"/>.
/// </returns>
public static IEnumerable<Change<T>> Difference<T>(
    this IEnumerable<T> original,
    IEnumerable<T> updated);
/// <summary>
/// Compute the set of differences between two sequences.
/// </summary>
/// <typeparam name="T">The type of sequence items.</typeparam>
/// <param name="original">The original sequence.</param>
/// <param name="updated">The updated sequence to compare to.</param>
/// <param name="eq">The equality comparer to use.</param>
/// <returns>The smallest sequence of changes to transform
/// <paramref name="original"/> into <paramref name="updated"/>.
/// </returns>
public static IEnumerable<Change<T>> Difference<T>(
    this IEnumerable<T> original,
    IEnumerable<T> updated,
    IEqualityComparer<T> eq);

The extension methods depend only on the following enum and struct:

/// <summary>
/// Describes the type of change that was made.
/// </summary>
public enum ChangeType
{
    /// <summary>
    /// An item was added at the given position.
    /// </summary>
    Add,
    /// <summary>
    /// An item was removed at the given position.
    /// </summary>
    Remove,
}
/// <summary>
/// Describes a change to a collection.
/// </summary>
/// <typeparam name="T">The collection item type.</typeparam>
public struct Change<T>
{
    /// <summary>
    /// The change made at the given position. 
    /// </summary> 
    public ChangeType ChangeType { get; internal set; } 
    /// <summary> 
    /// The set of values added or removed from the given position. 
    /// </summary> 
    public IEnumerable<T> Values { get; internal set; } 
    /// <summary> 
    /// The position in the sequence where the change took place. 
    /// </summary> 
    public int Position { get; internal set; }
} 

This is a simple and general interface with which you can perform all sorts of computations on the differences between two sequences. The code as provided will work out of the box for any type T that implements equality. Some simple examples:

Console.WriteLine( "miller".Difference("myers").Format("\r\n") );
// prints out: 
// +1:y 
// -1:i,l,l 
// +6:s 

var original = new int[] { 2, 5, 99 }; 
var updated = new int[] { 2, 4, 4, 8 }; 
Console.WriteLine( original.Difference(updated).Format("\r\n") ); 
// prints out: 
// +1:4,4,8 
// -1:5,99

"Format" is a simple extension method, also under Sasa.Linq, which generates a formatted string when given an IEnumerable.

At the moment, I have simply implemented the naive algorithm, which takes O(N*M) space and time for sequences of lengths N and M. I plan to eventually implement some linear-space optimizations, as described in "An O(ND) Difference Algorithm and Its Variations".
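For readers unfamiliar with the naive approach, the following is a minimal sketch of a table-based diff built on the longest common subsequence. It is not the Sasa implementation: NaiveDiff and its string-based edit format are hypothetical, it works on IList<T> rather than IEnumerable<T>, and it reports one item per edit instead of grouping runs into a Change<T>. The N*M cost comes from the table, which holds one cell per pair of positions.

```csharp
using System;
using System.Collections.Generic;

public static class NaiveDiff
{
    // Returns one edit per item: "+j:x" inserts x at index j of the updated
    // sequence, "-i:x" removes x from index i of the original sequence.
    public static List<string> Diff<T>(IList<T> a, IList<T> b,
                                       IEqualityComparer<T> eq = null)
    {
        eq = eq ?? EqualityComparer<T>.Default;
        int n = a.Count, m = b.Count;

        // lcs[i, j] = length of the longest common subsequence of a[i..] and b[j..]
        var lcs = new int[n + 1, m + 1];
        for (int i = n - 1; i >= 0; --i)
            for (int j = m - 1; j >= 0; --j)
                lcs[i, j] = eq.Equals(a[i], b[j])
                          ? lcs[i + 1, j + 1] + 1
                          : Math.Max(lcs[i + 1, j], lcs[i, j + 1]);

        // walk the table from the start, emitting an edit wherever the sequences diverge
        var edits = new List<string>();
        int x = 0, y = 0;
        while (x < n || y < m)
        {
            if (x < n && y < m && eq.Equals(a[x], b[y]))
            {
                ++x; ++y;                          // common item, no edit
            }
            else if (y < m && (x == n || lcs[x, y + 1] >= lcs[x + 1, y]))
            {
                edits.Add("+" + y + ":" + b[y]);   // item only in the updated sequence
                ++y;
            }
            else
            {
                edits.Add("-" + x + ":" + a[x]);   // item only in the original sequence
                ++x;
            }
        }
        return edits;
    }
}

public static class DiffDemo
{
    public static void Main()
    {
        var edits = NaiveDiff.Diff("miller".ToCharArray(), "myers".ToCharArray());
        Console.WriteLine(string.Join(" ", edits));
        // prints: +1:y -1:i -2:l -3:l +4:s
    }
}
```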

There are many applications for a general difference algorithm like this. Consider a reactive property of type IEnumerable, such as the one backing a drop-down in a user interface. If the UI is remote, as with X11 or a web browser, sending the entire list over and over again is bandwidth-intensive and hurts the latency of the UI. It's much more efficient to send just the changes, which can be computed by taking the diff of the original and the new list.