Friday, November 7, 2008

Embedded Stack Language for .NET

Parametric polymorphism is a powerful tool for constructing and composing safe programs. To demonstrate this, I've constructed a tiny embedded stack language in C#, and I exploited C# generics, aka parametric polymorphism, to ensure that any embedded programs are type-safe, and thus, "can't go wrong".
Moreover, this language is jitted, since I use ILGenerator of System.Reflection.Emit, and the complexity of doing this was no higher than creating an interpreter. This is mainly because the CLR is itself a stack-based VM, and so jitting a stack-based program to a stack-based IL is fairly natural.

Here is the type-safe representation of the stack:
public struct Stack<R, T> : IDisposable
{
internal ILGenerator gen;
public Stack(ILGenerator gen)
{
this.gen = gen;
}
public void Dispose()
{
this.Return();
}
}

This type encapsulates the IL output stream and uses two phantom type variables to encode the values at the top of the execution stack. The T parameter represents the top of the stack, and the R parameter represents the rest of the stack.

A few more types are needed to safely represent other reflected objects used during code generation:
// Represents the 'void' type used in some functions
public struct Unit
{
}
// Represents a named function with a known type structure T→R.
// This could be a void→void, an int→string, etc.
public struct Function<T, R>
{
internal MethodInfo method;
public Function(Action<T> f)
{
this.method = f.Method;
}
}

Here's a test program demonstrating the use:

class Program
{
static void Main(string[] args)
{
var d = new DynamicMethod("test", typeof(void), null);
using (var s = new Stack<Unit, Unit>(d.GetILGenerator()))
{ // equivalent to: () => Console.WriteLine(1 + 2);
s.Int(1)
.Int(2)
.Add()
.Apply(new Function<int, Unit>(Console.WriteLine));
}
d.Invoke(null, null);
Console.ReadLine();
}
}

Next I define a set of extension methods operating on typed stack objects, all parameterized by the underlying types. You can see how the top elements of the stack are specified, and how each operation consumes these elements or otherwise modifies the stack:
public static class CodeGen
{
// Apply a function to its arguments; note that function args and arity
// is checked at compile-time
public static Stack<R, R0>
Apply<R, T, R0>(this Stack<R, T> stack, Function<T, R0> target)
{
stack.gen.EmitCall(OpCodes.Call, target.method, null);
return new Stack<R, R0>(stack.gen);
}
// Embed an literal integer in the IL stream
public static Stack<Stack<R, T>, int>
Int<R, T>(this Stack<R, T> stack, int i)
{
stack.gen.Emit(OpCodes.Ldc_I4, i);
return new Stack<Stack<R, T>, int>(stack.gen);
}
// Add the two integers at the top of the execution stack
public static Stack<R, T>
Add<R, T>(this Stack<Stack<R, T>, T> stack)
{
stack.gen.Emit(OpCodes.Add);
return new Stack<R, T>(stack.gen);
}
// Return from the current function
public static Stack<R, T>
Return<R, T>(this Stack<R, T> stack)
{
stack.gen.Emit(OpCodes.Ret);
return new Stack<R, T>(stack.gen);
}
}

This is obviously a fairly limited set of functionality, and it would require a great deal more work and thought to properly abstract the full execution of CIL (especially exceptions!), but it just might be possible. A large subset is certainly possible.

It might serve as an interesting alternative to System.Linq.Expressions for some code generation applications, since Linq expressions are largely untyped. Here are a few more opcode examples for reference purposes:

// Duplicate the top element
public static Stack<Stack<R, T>, T>
Dup<R, T>(this Stack<R, T> stack)
{
stack.gen.Emit(OpCodes.Dup);
return new Stack<Stack<R, T>, T>(stack.gen);
} // index into an array
public static Stack<R, T>
LoadArrayIndex<R, T>(this Stack<R, Stack<T[], int>> stack)
{
stack.gen.Emit(OpCodes.Ldelem, typeof(T));
return new Stack<R, T>(stack.gen);
}
// runtime type test
public static Stack<R, T>
IsOfType<R, T, T>(this Stack<R, T> stack)
where T : class
{
stack.gen.Emit(OpCodes.Isinst, typeof(T));
return new Stack<R, T>(stack.gen);
}

Pasting all of the code in this post into a file and compiling it should just work.

Wrapping all CIL opcodes this way could be quite useful, since you're guaranteed to have a working program if your code type checks. If you have a language, embedded or otherwise, using this sort of technique might be a good way to achieve a type-safe JIT for .NET, assuming you can assign a stack-based execution semantics. Since you're compiling to .NET, it will need exactly such a stack semantics anyway!

No comments: