Skip to main content

Embedded Stack Language for .NET

Parametric polymorphism is a powerful tool for constructing and composing safe programs. To demonstrate this, I've constructed a tiny embedded stack language in C#, and I exploited C# generics, aka parametric polymorphism, to ensure that any embedded programs are type-safe, and thus, "can't go wrong".
Moreover, this language is jitted, since I use ILGenerator of System.Reflection.Emit, and the complexity of doing this was no higher than creating an interpreter. This is mainly because the CLR is itself a stack-based VM, and so jitting a stack-based program to a stack-based IL is fairly natural.

Here is the type-safe representation of the stack:
public struct Stack<R, T> : IDisposable
{
internal ILGenerator gen;
public Stack(ILGenerator gen)
{
this.gen = gen;
}
public void Dispose()
{
this.Return();
}
}

This type encapsulates the IL output stream and uses two phantom type variables to encode the values at the top of the execution stack. The T parameter represents the top of the stack, and the R parameter represents the rest of the stack.

A few more types are needed to safely represent other reflected objects used during code generation:
// Represents the 'void' type used in some functions
public struct Unit
{
}
// Represents a named function with a known type structure T→R.
// This could be a void→void, an int→string, etc.
public struct Function<T, R>
{
internal MethodInfo method;
public Function(Action<T> f)
{
this.method = f.Method;
}
}

Here's a test program demonstrating the use:

class Program
{
static void Main(string[] args)
{
var d = new DynamicMethod("test", typeof(void), null);
using (var s = new Stack<Unit, Unit>(d.GetILGenerator()))
{ // equivalent to: () => Console.WriteLine(1 + 2);
s.Int(1)
.Int(2)
.Add()
.Apply(new Function<int, Unit>(Console.WriteLine));
}
d.Invoke(null, null);
Console.ReadLine();
}
}

Next I define a set of extension methods operating on typed stack objects, all parameterized by the underlying types. You can see how the top elements of the stack are specified, and how each operation consumes these elements or otherwise modifies the stack:
public static class CodeGen
{
// Apply a function to its arguments; note that function args and arity
// is checked at compile-time
public static Stack<R, R0>
Apply<R, T, R0>(this Stack<R, T> stack, Function<T, R0> target)
{
stack.gen.EmitCall(OpCodes.Call, target.method, null);
return new Stack<R, R0>(stack.gen);
}
// Embed an literal integer in the IL stream
public static Stack<Stack<R, T>, int>
Int<R, T>(this Stack<R, T> stack, int i)
{
stack.gen.Emit(OpCodes.Ldc_I4, i);
return new Stack<Stack<R, T>, int>(stack.gen);
}
// Add the two integers at the top of the execution stack
public static Stack<R, T>
Add<R, T>(this Stack<Stack<R, T>, T> stack)
{
stack.gen.Emit(OpCodes.Add);
return new Stack<R, T>(stack.gen);
}
// Return from the current function
public static Stack<R, T>
Return<R, T>(this Stack<R, T> stack)
{
stack.gen.Emit(OpCodes.Ret);
return new Stack<R, T>(stack.gen);
}
}

This is obviously a fairly limited set of functionality, and it would require a great deal more work and thought to properly abstract the full execution of CIL (especially exceptions!), but it just might be possible. A large subset is certainly possible.

It might serve as an interesting alternative to System.Linq.Expressions for some code generation applications, since Linq expressions are largely untyped. Here are a few more opcode examples for reference purposes:

// Duplicate the top element
public static Stack<Stack<R, T>, T>
Dup<R, T>(this Stack<R, T> stack)
{
stack.gen.Emit(OpCodes.Dup);
return new Stack<Stack<R, T>, T>(stack.gen);
} // index into an array
public static Stack<R, T>
LoadArrayIndex<R, T>(this Stack<R, Stack<T[], int>> stack)
{
stack.gen.Emit(OpCodes.Ldelem, typeof(T));
return new Stack<R, T>(stack.gen);
}
// runtime type test
public static Stack<R, T>
IsOfType<R, T, T>(this Stack<R, T> stack)
where T : class
{
stack.gen.Emit(OpCodes.Isinst, typeof(T));
return new Stack<R, T>(stack.gen);
}

Pasting all of the code in this post into a file and compiling it should just work.

Wrapping all CIL opcodes this way could be quite useful, since you're guaranteed to have a working program if your code type checks. If you have a language, embedded or otherwise, using this sort of technique might be a good way to achieve a type-safe JIT for .NET, assuming you can assign a stack-based execution semantics. Since you're compiling to .NET, it will need exactly such a stack semantics anyway!

Comments

Popular posts from this blog

async.h - asynchronous, stackless subroutines in C

The async/await idiom is becoming increasingly popular. The first widely used language to include it was C#, and it has now spread into JavaScript and Rust. Now C/C++ programmers don't have to feel left out, because async.h is a header-only library that brings async/await to C! Features: It's 100% portable C. It requires very little state (2 bytes). It's not dependent on an OS. It's a bit simpler to understand than protothreads because the async state is caller-saved rather than callee-saved. #include "async.h" struct async pt; struct timer timer; async example(struct async *pt) { async_begin(pt); while(1) { if(initiate_io()) { timer_start(&timer); await(io_completed() || timer_expired(&timer)); read_data(); } } async_end; } This library is basically a modified version of the idioms found in the Protothreads library by Adam Dunkels, so it's not truly ground bre...

Easy Automatic Differentiation in C#

I've recently been researching optimization and automatic differentiation (AD) , and decided to take a crack at distilling its essence in C#. Note that automatic differentiation (AD) is different than numerical differentiation . Math.NET already provides excellent support for numerical differentiation . C# doesn't seem to have many options for automatic differentiation, consisting mainly of an F# library with an interop layer, or paid libraries . Neither of these are suitable for learning how AD works. So here's a simple C# implementation of AD that relies on only two things: C#'s operator overloading, and arrays to represent the derivatives, which I think makes it pretty easy to understand. It's not particularly efficient, but it's simple! See the "Optimizations" section at the end if you want a very efficient specialization of this technique. What is Automatic Differentiation? Simply put, automatic differentiation is a technique for calcu...

Building a Query DSL in C#

I recently built a REST API prototype where one of the endpoints accepted a string representing a filter to apply to a set of results. For instance, for entities with named properties "Foo" and "Bar", a string like "(Foo = 'some string') or (Bar > 99)" would filter out the results where either Bar is less than or equal to 99, or Foo is not "some string". This would translate pretty straightforwardly into a SQL query, but as a masochist I was set on using Google Datastore as the backend, which unfortunately has a limited filtering API : It does not support disjunctions, ie. "OR" clauses. It does not support filtering using inequalities on more than one property. It does not support a not-equal operation. So in this post, I will describe the design which achieves the following goals: A backend-agnostic querying API supporting arbitrary clauses, conjunctions ("AND"), and disjunctions ("OR"). Implemen...