

Showing posts from 2008

Compact, Declarative Serialization

A few posts back, I hinted at using parameterization as an alternative to metadata. I've just written a serialization interface using this technique, along with a binary serializer/deserializer pair to demonstrate the inherent tradeoffs. ICompactSerializable is implemented by each serializable object. It's essentially a declarative method describing the sequential, internal structure of the object. It's simple and fast, since it provides native-speed access to an object's fields without reflection, and with no need for metadata. Of course, the obvious downside is that clients must describe the internal structure themselves via ICompactSerializer, and refactoring must be careful not to reorder the sequence of calls. The upshot is that serialization and deserialization are insanely fast compared to ordinary reflection-driven serialization, the binary is f

The Chicken and the Egg - Redux

There is still considerable skepticism regarding my conclusion that the chicken comes first. Many of the objections are simply due to different interpretations of the question, interpretations which I consider unfaithful to the original purpose of the chicken-egg dilemma. Fundamentally, this question is supposed to represent a causality dilemma. When there is a causal dependency between any two objects, A and B, we can ask ourselves, "which came first, A or B?" An object C could have caused B, and thus triggered the recursive relation C→B→A→B→A... Of course, it could just as easily have been A that was caused first. To properly answer this question, we must reduce the abstract objects A and B to concrete objects and apply our scientific knowledge to ground the recursion. Using chickens and eggs as our objects, we have to precisely define what chickens and what eggs to consider. The question "which came first, chickens or any type of egg?" is not a causal dilemma a

Embedded Stack Language for .NET

Parametric polymorphism is a powerful tool for constructing and composing safe programs. To demonstrate this, I've constructed a tiny embedded stack language in C#, and I exploited C# generics, aka parametric polymorphism, to ensure that any embedded programs are type-safe, and thus, "can't go wrong". Moreover, this language is jitted, since I use the ILGenerator of System.Reflection.Emit, and the complexity of doing so was no greater than writing an interpreter. This is mainly because the CLR is itself a stack-based VM, so jitting a stack-based program to stack-based IL is fairly natural. Here is the type-safe representation of the stack: public struct Stack<R, T> : IDisposable { internal ILGenerator gen; public Stack(ILGenerator gen) { this.gen = gen; } public void Dispose() { this.Return(); } } This type encapsulates the IL output stream and uses two phantom type variables to encode the values at the top of the

Reflection, Attributes and Parameterization

I used to be a big fan of reflection, and C#'s attributes also looked like a significant enhancement. Attributes provide a declarative way to attach metadata to fields, methods, and classes, and this metadata is often used during reflection. The more I learned about functional programming, type systems, and so on, the more I came to realize that reflection isn't all it's cracked up to be. Consider .NET serialization. You can annotate fields you don't want serialized with the attribute [field:NonSerialized]. However, metadata is just data, and every usage of attributes can be replaced with a pair of interfaces. Using [field:NonSerialized] as an example, we can translate this class: class Foo { [field:NonSerialized] object bar; } Into one like this: // these two interfaces take the place of a NonSerializableAttribute declaration interface INonSerialized { void Field<T>(ref T field); } interface IUnserializableMembers { void Unserializable(INonSerialized s);

Garbage Collection Representations Continued

In a previous post, I discussed representations used for polymorphic parameters. These special data representations are a contract between the language and the runtime which enables the runtime to traverse data structures to collect garbage. 64-Bit Polymorphic Representation I neglected to mention an interesting variant of tagged representations that I came across at one point, but which I believe is not in widespread use. This representation uses full 64-bit floating point representations for all polymorphic values, and encodes non-floating point data in the unused bits of NaN values. IEEE 754 floating point numbers reserve 12 of the 64 bits for flagging NaN values, so the remaining bits are free to use for encoding integers, pointers, and so on. I haven't found very much empirical data on this approach, but it looks promising. The polymorphic representation is doubled in size, but the structure is unboxed. Thus, the polymorphic function call overhead may be slightly h

SysCache build for NHibernate 2.0.1GA

NHibernate 2.0.1GA is the latest binary download available, but it seems the NHibernate.Caches.SysCache binary release is lagging behind, as the download at Sourceforge was built against NHibernate 2.0.0. Here's a version of NHibernate.Caches.SysCache built against 2.0.1GA.

Dispatching: VTables vs. Runtime Tests and Casts

Object-oriented languages like C# and Java use method dispatching techniques to select a concrete method to invoke when given a polymorphic type. By far the most common technique is the virtual method table, or vtable for short. Long ago, I updated the above Wikipedia page with information regarding alternative dispatch techniques. Research published a few years back indicated that dispatching through a vtable incurred astonishing overheads, as high as 50% of total execution time. Alternative dispatch techniques based on runtime tests, such as a linear sequence of if statements checking for the various concrete class types, or a sequence of nested if statements forming a binary search, were often more efficient on a variety of hardware. Vtables are rather flexible, however, and the composition of a number of vtables can encode a variety of advanced dispatching behaviour. The .NET CLR utilizes vtables, and I've sketched out rough encodings of first-class functions, algebraic data

Tagless Interpreters in C#

Creating DSLs is all the rage these days, and for good reason. Most abstractions are actually a little language struggling to get out. Design consists of creating abstractions with maximum power and minimum restrictions, and reusing those abstractions as much as possible. Small domain-specific languages are the ticket. However, language implementations are often written in fairly ad hoc ways, and most interpreters are difficult to extend to compilers and partial evaluators. Languages are usually described by an "initial algebra", which basically just means that language terms are described by data types. Here's a simple definition of a language with integers, addition and subtraction: (* * Expressions := [0-9]* | e1 + e2 | e1 - e2 *) type expression = Int of int | Add of expression * expression | Sub of expression * expression So a program in this language can consist of integers, subtraction or addition expressions. The interpreter for this language unpacks the expression

Mobile Continuations for the Web

A continuation is basically the state of a program at the time the continuation was captured. For a common example, consider a blocking system call for any operating system. If the system call blocks the process, the continuation of that process is placed on a blocked queue until the system call completes. When the operation completes, the continuation is removed from the queue and resumed with the result of the system call. There has been plenty of well-deserved hoopla around continuations for web interaction. Navigating a website is a great deal like operating on a program's continuations. A page generated from a server-side program contains references to a number of program continuations in its links and forms. Clicking a link or submitting a form invokes the captured continuation, which the server then executes. The result is yet another set of continuations on the resulting page, ad infinitum. Unfortunately, most continuation frameworks are designed around server-side continu

Garbage Collection Representations

I have yet to find a comprehensive description of the various data representations used for garbage collection. This isn't an overview of garbage collection algorithms, but an overview of how to distinguish pointers from other values and how to traverse data structures, a topic which has seemingly gotten short shrift in the literature. I aim to give an in-depth overview of the techniques I've come across or managed to puzzle out on my own. Boxing: all values are allocated on the heap, so basically everything is a pointer; thus, there is no ambiguity between pointers and values. Tagged pointers: pointers are distinguished from ordinary values by some sort of tag. This information is often encoded in the pointer or value itself for a very compact representation. Type-passing: each parameter of a function is associated with a set of operations, usually referred to as a "dictionary", which is similar to an interface in object-oriented languages. This dictionary defin

Coroutines in C Redux

My last post generated a bit of discussion here and on Reddit, where after further testing, I discovered that stack management didn't work properly on Windows. It would seem that embedded frame pointers on the stack are used to ensure that we are on the same stack, at least in DEBUG mode, and so growing or shrinking the stack generated cryptic errors. These problems motivated me to explore alternate methods of implementing coroutines. Fundamentally, the current state of a computation is encapsulated in the stack. So switching between two concurrent computations means modifying the stack, either by swapping out the old stack for an entirely new one, or by copying the relevant portion of the stack to a save area and restoring it when switching back. Coroutines and threads typically use stack switching, and continuations have traditionally used some sort of copying. The initial coroutine implementation used stack switching, which is fairly efficient, incurring only a register save,

Coroutines in C

I've just uploaded a functional coroutine library for C, called libconcurrency. It's available under the LGPL. I think it's the most complete, flexible, and simple coroutine implementation I've seen, so hopefully it will find some use. Next on the todo list are some more rigorous tests of the corner cases, and then extending libconcurrency to scale across CPUs. This will make it the C equivalent of Manticore for ML. There is a rich opportunity for scalable concurrency in C. Of course, I only built this library to serve as a core component of a virtual machine I'm building, and that's all I'm going to say about that. ;-)

The Chicken and the Egg - An Inductive Analysis

In a slight departure from my usual computer science focused musings, I'm going to analyze another logical conundrum that has raged for centuries. Which came first, the chicken or the egg? One of my many online debates clued me into the fact that there is a widespread belief that the egg came first. I even found a "paper" providing an in-depth analysis concluding the same. Unfortunately, the analysis appears flawed. An e-mail to the author bounced, so I figured I might as well post my brief analysis here. A simple inductive analysis suffices to conclusively determine the causal chain. Base case: a single-celled organism reproducing asexually via mitosis (given our current knowledge). n implies n+1: species n produces, by its own reproduction mechanism R_n, an offspring n+1 with a slightly different reproduction mechanism, R_{n+1}. Conclusion: the "chicken" came first. In this case, our "chickens" were not hatched from "chicken eggs", but wer

Object/Relational Mapping and Factories

My day job has stuck me with C#, but I try to make the best of it, as evidenced by my FP# and Sasa C# libraries. One thing that still gets in my way more than it should is O/R mapping. No other mapper I've come across encourages a true object-oriented application structure. Granted, I've only really used NHibernate, and I had built my own mapper before that was even available, but I've read up quite a bit on the other mappers. By true OO structure, I mean that all application objects are constructed only from other application objects, without dependencies on environment-specific code (i.e. whether you're running under ASP.NET, Windows Forms, Swing, etc.). A pure structure encourages a proper separation between core application code and display and controller code, which allows more flexible application evolution. Instead, controller logic often manually constructs application objects, passing in default arguments to properly initialize the required

Permutations with Duplicates in C

Calculating permutations has some fairly nifty algorithms out there. I recently ran into a permutation problem for which I couldn't find an existing algorithm. I admit that I didn't look too hard though. Basically, I needed the permutations of the elements of a set of size N over K slots. However, the permutations should include duplicate elements from the set, as K > N is a valid configuration. This corresponds to N^K permutations. Most algorithms I found did not permit duplicate elements. As an example of an application for such a permutation algorithm, imagine the set of all function signatures of arity K-1 over N types. This corresponds to K slots with N possible choices for each slot. I devised a fairly simple implementation of such a permutation algorithm. Essentially, N forms the base of an arbitrary-precision integer of size K. In other words, we have an array of K digits, each less than N, which index our set. To permute, we simply increment the first element and pr

Blue-eyed Islander Puzzle - an analysis

Many people find themselves stumped by the so-called Blue-Eyed Islanders puzzle . There is also much controversy over its supposed solution. I'm going to analyze the problem and the solution, and in the process, explain why the solution works. To begin, let's modify the problem slightly and say that there's only 1 blue-eyed islander. When the foreigner makes his pronouncement, the blue-eyed islander looks around and sees no other blue eyes, and being logical, correctly deduces that his own eyes must be blue in order for the foreigner's statement to make sense. The lone blue-eyed islander thus commits suicide the following day at noon. Now comes the tricky part, and the source of much confusion. Let's say there are 2 blue-eyed islanders, Mort and Bob. When the foreigner makes his pronouncement, Mort and Bob look around and see only each other. Mort and Bob thus both temporarily assume that the other will commit suicide the following day at noon. Imagine their chagrin

An Almost Type-Safe General Monad in C#, aka how to Abstract over Type Constructors using Dynamics

Extending the work in my last post, I've developed a way to express an almost type-safe, general monad in C#. Similar to my module translation, the single monad object becomes a pair of co-operating objects, only one of which the monad implementor must define. Since C# cannot abstract over type constructors, I had to exploit the only feature that could accommodate the flexibility I needed: C#'s dynamic typing. // The Monad object, indexed by a singleton type that implements the // monad operations. public sealed class Monad<M, T> where M : struct, IMonadOps<M> { ... } // An object that implements operations on the monad's encapsulated // state. public interface IMonadOps<M> where M : struct, IMonadOps<M> { /// Return the encapsulated state for the monad's zero value. object Zero<T>(); // Return the encapsulated state for the 'unit' operation. object Unit<T>(T t); // Perform a bind operation given th

The Worst Monad Tutorial... Except For All Those Others.

I've found other monad tutorials very frustrating. They are typically written in expressive languages with type inference, which permits concise descriptions but obscures the underlying type structure. I've been struggling with writing something close to a monad in C# for quite some time, simply because none of these tutorials gives a sufficiently complete description of a monad's structure. Surprisingly, the Wikipedia page on monads helped clarify what I was missing. Here is the general structure of a monad all these tutorials use: -- the type of monad m type m a = ... -- return is a type constructor that creates monad instances return :: a → m a -- bind is a function that combines a monad instance m a with a -- computation that produces another monad instance m b from a's -- to produce a new monad instance m b bind :: m a → (a → m b) → m b So the monad type 'm' has a function 'return' that constructs instances of that type, and 'bind' which c

Towards the best collection API... in C#. And some partial applications too.

The venerable Oleg Kiselyov once posted about the "best" collection traversal API. Let's call this ideal iterator a "SuperFold". LTU also covered his article. Essentially, a SuperFold is a left fold with early termination support. Any cursor can then be automatically derived from the SuperFold; the converse is not true. Additional arguments are made in the above paper and in the LTU thread, so I won't repeat them here. Without further ado, I present the SuperFold for my purely functional list in C#: //OCaml signature: ((T → B → bool * B) → B → B) → (T → B → bool * B) → B → B B SuperFold<B>( Fun<Fun<T, B, Pair<bool, B>>, B, B> self, Fun<T, B, Pair<bool, B>> proc, B seed) { bool cont; proc(head, seed).Bind(out cont, out seed); return cont ? self(proc, seed) : seed; } While quite simple, it's not as efficient as it should be, since C#/.NET doesn't support proper tail calls. You can see in that source fil

ML Modules in C# - Sorely Missing Polymorphic Type Constructors

As Chung-chieh Shan pointed out, my encoding of modules in C# is somewhat limited. In particular, I cannot abstract over type constructors, which is to say, C# is missing generics over generics. Consider the Orc.NET interpreter: class Orc { class Exp<T> { ... } public Exp<U> Seq<T,U>(Exp<T> e1, Exp<U> e2) { ... } public Exp<T> Par<T>(Exp<T> e1, Exp<T> e2) { ... } public Exp<T> Where<T>(Exp<T> e1, Exp<Promise<T>> e2) { ... } } This is the result of my translation, which was necessitated by the "Where" method. Where introduces a dependency which currently cannot be expressed with ordinary C# constraints, so the module encoding is necessary. The above interface is a direct, faithful implementation of the Orc semantics. The implementation I currently have is an interpreter for those semantics. What if I want to provide a compiler instead? The interface should remain the same, but the implementation