Saturday, April 7, 2007

General Pattern Matching with Visitors

My previous definition of IVisitor hard-coded the return value as a type of Val, but one can generalize a visitor to any kind pattern-matching function by adding a generic type parameter to the interface:
//the pattern matching visitor
public interface IVisitor<T>
{
//return type is now parameterized
T App(Exp e0, Exp e1);
}

This of course requires a modification to the class variants as well:
//the type representing an expression application
public sealed class App : Exp
{
Exp e0;
Exp e1;

public App(Exp e0, Exp e1)
{
this.e0 = e0;
this.e1 = e1;
}

//add a generic constraint to the Visit method, so the client can
//specify the return type

public override T Visit<T>(IVisitor<T> v)
{
return v.App(e0, e1);
}
}

So instead of IVisitor being hard-coded as an evaluator, I can now write an IVisitor that performs transformations on the expression tree (such as rewriting optimizations), or a visitor that implements some arbitrary predicate over the expression tree (such as type-checking, type inference, etc.).

Like pattern matching functions over ordinary parameterized algebraic data types, the return value of the visitor must be T or a subtype of T.

In fact, this visitor pattern, minus the decomposition as in pattern matching, is featured in Generalized Algebraic Data Types and Object-Oriented Programming; see also the LTU discussion.

According to the paper, generics as in C# are almost capable of encoding GADTs. Unfortunately, they lack some flexibility in declaring type dependencies between class type parameters, and method type parameters. The paper provides some very interesting C# examples of type-safe type checking, evaluation, phantom types, and representation types. This generics black magic warrants further study for the advanced C# student.

Tuesday, April 3, 2007

Visitor pattern considered harmful (unless functionalized)

To kick off my first post, I thought I'd cover a post on another blog that I found interesting.

Basically, the gist of the post is that object oriented visitor pattern does not lend itself to the problem of walking trees, and general symbolic manipulation, despite that being its claim to fame. In general, algebraic data types and pattern matching are much simpler, and more concise.

I agree, however I think that post goes a bit too far in saying that visitors simply can't do this elegantly, because in fact there's a straightforward translation of a pattern matching function into a visitor, as long as you keep your objects "functional", ie. treat them as variants.

For instance, let's say we are constructing an evaluator:

//the Expression type
public abstract class Exp
{
public abstract Val Visit(IVisitor v);
}

//the type of Values
public abstract class Val : Exp
{
}

//the type representing an application
public sealed class App : Exp
{
Exp e0;
Exp e1;

public App(Exp e0, Exp e1)
{
this.e0 = e0;
this.e1 = e1;
}

public override Val Visit(IVisitor v)
{
return v.App(e0, e1);
}
}

//the pattern matching visitor
public interface IVisitor
{
Val App(Exp e0, Exp e1);
//...
}


This is slightly different from the typical visitor pattern you see in books, where the object itself would be passed in to IVisitor.App(). In fact, this looks almost exactly like a pattern matching function in a language like OCaml:

type Exp = App of Exp * Exp | Val   (* | ... *)

let rec eval exp =
match exp with
App e0 e1 -> (* ... evaluate the application *);;


Thus, properly designed, a visitor is simply a pattern matching function. By convention, we use the class name as the visitor's method, so it matches with the convention in functional language's (ie. the constructor name). The visitor method's parameters are the object's private fields, just like a variant. I'm using this approach for an interpreter I'm implementing and it works quite well (unfortunately no source is publicly available at the moment).

In terms of efficiency, the pattern matching function is probably faster. After all, it's simply a tag check plus a branch, whereas the visitor must perform a double dispatch (two method lookups incurring two indirections and two indirect branches).

Clearly, the visitor is also far more verbose, as you must create the "variant classes", manually add a "Visit" method to each, and since there is no "default" case as in pattern matching, like the "default" case in a switch statement, you have to exhaustively handle each variant explicitly in each visitor, even if the case isn't relevant. You can mitigate the latter problem by implementing a "Default" abstract base class with dummy implementations on all cases, and then override only the cases you are interested in:

public abstract class Default : IVisitor
{
//overridden in my Interpreter visitor
public virtual Val App(Exp e0, Exp e1)
{
throw new NotSupportedException();
}
}

As a potentially interesting extension, the variant classes may not even have to be in the same inheritance hierarchy, they only have to implement a marker interface with the appropriate Visit() method. This may increase extensibility along similar lines to polymorphic variants; only the visitor has to be kept up to date with all the relevant cases, and it can evolve separately from the core.

Summary:

So if you're doing symbolic manipulation, and are forced to operate in a OO language like C# above, you can still approximate the elegance of algebraic data types and pattern matching, it just takes a little more work on your part. Pretty much par for the course in the OO world. :-)