
Embedded Stack Language for .NET

Parametric polymorphism is a powerful tool for constructing and composing safe programs. To demonstrate this, I've constructed a tiny embedded stack language in C#, and I exploited C# generics, aka parametric polymorphism, to ensure that any embedded programs are type-safe, and thus, "can't go wrong".
Moreover, this language is jitted, since it emits CIL through the ILGenerator from System.Reflection.Emit, and doing so was no more complex than writing an interpreter. This is mainly because the CLR is itself a stack-based VM, and so jitting a stack-based program to a stack-based IL is fairly natural.

Here is the type-safe representation of the stack:
// using directives needed for the full listing in this post
using System;
using System.Reflection;
using System.Reflection.Emit;

public struct Stack<R, T> : IDisposable
{
    internal ILGenerator gen;

    public Stack(ILGenerator gen)
    {
        this.gen = gen;
    }

    // Disposing the stack emits the final 'ret', so a using block
    // delimits the body of the generated method
    public void Dispose()
    {
        this.Return();
    }
}

This type encapsulates the IL output stream and uses two phantom type variables to encode the values at the top of the execution stack. The T parameter represents the top of the stack, and the R parameter represents the rest of the stack.
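For example, tracing the test program further below, the stack type evolves like this as each operation is applied (the operations themselves are defined in the CodeGen class later in the post):

// empty stack at the start of the method:
//   Stack<Unit, Unit>
// after pushing an int literal:
//   Stack<Stack<Unit, Unit>, int>
// after pushing a second int literal:
//   Stack<Stack<Stack<Unit, Unit>, int>, int>
// after Add, which consumes both ints and pushes their sum:
//   Stack<Stack<Unit, Unit>, int>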

A few more types are needed to safely represent other reflected objects used during code generation:
// Represents the 'void' type used in some functions
public struct Unit
{
}

// Represents a named function with a known type structure T→R.
// This could be a void→void, an int→string, etc.
public struct Function<T, R>
{
    internal MethodInfo method;

    // wrap a method that returns no value (R = Unit)
    public Function(Action<T> f)
    {
        this.method = f.Method;
    }

    // wrap a method that returns a value of type R
    public Function(Func<T, R> f)
    {
        this.method = f.Method;
    }
}
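For instance, Console.WriteLine(int) wraps as a Function<int, Unit>, with Unit standing in for void, and with the Func<T, R> overload a value-returning method like int.Parse wraps as a Function<string, int>:

var print = new Function<int, Unit>(Console.WriteLine); // int → void
var parse = new Function<string, int>(int.Parse);       // string → int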

Here's a test program demonstrating its use:

class Program
{
    static void Main(string[] args)
    {
        var d = new DynamicMethod("test", typeof(void), null);
        using (var s = new Stack<Unit, Unit>(d.GetILGenerator()))
        {   // equivalent to: () => Console.WriteLine(1 + 2);
            s.Int(1)
             .Int(2)
             .Add()
             .Apply(new Function<int, Unit>(Console.WriteLine));
        }
        d.Invoke(null, null);
        Console.ReadLine();
    }
}
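A small aside: rather than the reflective d.Invoke call, the generated method can also be turned into a strongly typed delegate, which avoids reflection overhead when calling a DynamicMethod repeatedly; a minimal sketch:

// void → void, matching the signature given to the DynamicMethod above
var run = (Action)d.CreateDelegate(typeof(Action));
run(); // prints 3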

Next I define a set of extension methods operating on typed stack objects, all parameterized by the underlying types. You can see how the top elements of the stack are specified, and how each operation consumes these elements or otherwise modifies the stack:
public static class CodeGen
{
    // Apply a function to its arguments; note that the function's argument
    // types and arity are checked at compile-time
    public static Stack<R, R0>
        Apply<R, T, R0>(this Stack<R, T> stack, Function<T, R0> target)
    {
        stack.gen.EmitCall(OpCodes.Call, target.method, null);
        return new Stack<R, R0>(stack.gen);
    }

    // Embed a literal integer in the IL stream
    public static Stack<Stack<R, T>, int>
        Int<R, T>(this Stack<R, T> stack, int i)
    {
        stack.gen.Emit(OpCodes.Ldc_I4, i);
        return new Stack<Stack<R, T>, int>(stack.gen);
    }

    // Add the two values at the top of the execution stack
    // (they must have the same type)
    public static Stack<R, T>
        Add<R, T>(this Stack<Stack<R, T>, T> stack)
    {
        stack.gen.Emit(OpCodes.Add);
        return new Stack<R, T>(stack.gen);
    }

    // Return from the current function
    public static Stack<R, T>
        Return<R, T>(this Stack<R, T> stack)
    {
        stack.gen.Emit(OpCodes.Ret);
        return new Stack<R, T>(stack.gen);
    }
}
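To see the compile-time checking at work, here are two hypothetical lines that the C# type checker rejects, shown as comments so the full listing still compiles when pasted; both assume the s from the test program above:

// s.Int(1).Add();
//   rejected: Add needs two stack elements of the same type on top,
//   but the element under the int is Unit
// s.Int(1).Apply(new Function<string, Unit>(Console.WriteLine));
//   rejected: the top of the stack is an int, but the function expects a string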

This is obviously a fairly limited set of functionality, and it would require a great deal more work and thought to properly abstract the full execution semantics of CIL (especially exceptions!), but it just might be possible. A large subset is certainly possible.

It might serve as an interesting alternative to System.Linq.Expressions for some code generation applications, since Linq expressions are largely untyped. Here are a few more opcode examples for reference purposes:

// (extension methods, so these also belong inside the static CodeGen class above)

// Duplicate the top element
public static Stack<Stack<R, T>, T>
    Dup<R, T>(this Stack<R, T> stack)
{
    stack.gen.Emit(OpCodes.Dup);
    return new Stack<Stack<R, T>, T>(stack.gen);
}

// Index into an array: consumes an array and an int index,
// leaving the element at that index
public static Stack<R, T>
    LoadArrayIndex<R, T>(this Stack<Stack<R, T[]>, int> stack)
{
    stack.gen.Emit(OpCodes.Ldelem, typeof(T));
    return new Stack<R, T>(stack.gen);
}

// Runtime type test: casts the top element to T, leaving null if the cast fails
public static Stack<R, T>
    IsOfType<R, T, U>(this Stack<R, U> stack)
    where T : class
{
    stack.gen.Emit(OpCodes.Isinst, typeof(T));
    return new Stack<R, T>(stack.gen);
}
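As a quick illustration of how these compose with the earlier operations, Dup can stand in for the second literal in the test program; the comments sketch the stack type after each step (this assumes the same s as in the test program above):

// equivalent to: () => Console.WriteLine(5 + 5);
s.Int(5)                                          // Stack<Stack<Unit, Unit>, int>
 .Dup()                                           // Stack<Stack<Stack<Unit, Unit>, int>, int>
 .Add()                                           // Stack<Stack<Unit, Unit>, int>
 .Apply(new Function<int, Unit>(Console.WriteLine));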

Pasting all of the code in this post into a file and compiling it should just work.

Wrapping all CIL opcodes this way could be quite useful, since you're guaranteed to have a working program if your code type checks. If you have a language, embedded or otherwise, using this sort of technique might be a good way to achieve a type-safe JIT for .NET, assuming you can assign it a stack-based execution semantics. Since you're compiling to .NET, it needs exactly such a stack semantics anyway!
