### Garbage Collection Representations Continued

In a previous post, I discussed representations used for polymorphic parameters. These special data representations are a contract between the language and the runtime which enables the runtime to traverse data structures to collect garbage.

#### 64-Bit Polymorphic Representation

I had neglected to mention this interesting variant of tagged representations that I had come across at one point, but which I believe is not in widespread use. This representation uses full 64-bit floating point representations for all polymorphic values, and encodes non-floating point data in the unused bits of NaN values. IEEE 754 floating point numbers reserve 12 of the 64 bits for flagging NaN values, so the remaining bits are free to use for encoding integers, pointers, and so on.

I haven't found very much empirical data on this approach, but it looks promising. The polymorphic representation is doubled in size, but the structure is unboxed. Thus, the polymorphic function call overhead may be slightly higher, but garbage collector pressure is reduced since less boxing is needed. Numerical code is likely to see the biggest speedups.

Also, this representation can be used to ensure that all values are properly 8-byte aligned, which is often an efficiency gain in modern memory systems. This also has the advantage of permitting unboxed floating point values, which is a significant slowdown for the other tagged representations.

A new representation I recently came across is being used in the SML# language. Consider a typical tagged representation where 1-bit is co-opted to distinguish integers from pointers. Each polymorphic parameter must carry this 1 bit.

But there is no reason for this 1 bit to be contained within the parameter itself. Consider lifting this 1 bit to an extra function parameter, so:
`val f: 'a → 'b → 'c`

becomes:
`val f: 'a * Bool → 'b * Bool → 'c * Bool`

The Bool is the flag indicating whether the parameter is a pointer or an integer. But using a full function parameter to hold a single bit is a waste, so let's coalesce all the Bool parameters into a single Bitmask:
`val f: Bitmask → 'a → 'b → 'c`

Bitmask can be an arbitrarily long sequence of bits, and each bit in the mask corresponds to the bit that would normally be in the pointer/integer representation of the corresponding parameter in a typical tagged representation. By lifting the tag bit to an external parameter, we can now use full natural representations for all parameters.

Calling a polymorphic function now consists of masking and shifting to extract the bit for the given parameter, or assigning one if calling from a monomorphic function.

Bitmasks must also accompany polymorphic data structures. The presence of any polymorphic type variable implies the requirement for a bitmask. It's also not clear how expensive this shifting is in relation to inline tag bits. SML# does not yet have a native code compiler, so any comparisons to other SML compilers aren't representative.

However, the bitmask representation does not unbox floating point numbers, but a hybrid scheme with the above 64-bit representation is possible.

Consider the lifted 1-bit to indicate whether the parameter is word-sized (1), or larger (0). Larger parameters require a full 64-bit representation, while word-sized parameters do not. The garbage collector thus skips any parameters of bitmask tag value 1 (since they are unboxed integers), and analyzes any values of bitmask tag value 0 as a 64-bit value. If the 64-bit is a NaN, the value is further analyzed to extract a single distinguished bit indicating whether it is a pointer. If it is not a pointer, it is skipped (since it is an unboxed floating point value). If it is a pointer, the structure pointed to is traversed.

This last scheme provides natural representations for integers and floating point numbers, and expands the representation of pointers in polymorphic functions by one word. Other tradeoffs are certainly possible.

Ben L. Titzer said…
In my new Virgil compiler, I use a whole program analysis which computes all possible polymorphic instantiations (without actually instantiating them) and detects polymorphic recursion. The results of the analysis can then be used to pick polymorphic representations for different parts of the program depending on size/speed tradeoffs. That might be a bit orthogonal to your idea, but could complement it. Virgil doesn't have wildcards, so this makes it a bit simpler.
Ben L. Titzer said…
I'd also like to point out that some polymorphic functions need more information than a single bit to distinguish between pointers and non-pointers. E.g. a polymorphic function that allocates an array may need to know the size of the elements.
Sandro Magi said…
I use a whole program analysis which computes all possible polymorphic instantiations

Of course, these representations are only strictly necessary in the presence of separate compilation where the sizes cannot be known in advance.

E.g. a polymorphic function that allocates an array may need to know the size of the elements.

I was considering the array case over lunch. Two preliminary solutions:

1. All polymorphic types, including arrays, must use the uniform polymorphic representation (64-bits in the first case, 32-bits in the second); anything larger is boxed.

2. All packed types, like arrays, tuples, vectors, must be handled specially by the language (so it can insert and manage these special headers for them); in this case, it's probably best to try and unify all such packed types.

Of course, there are other options as well, like record polymorphism as used in SML#, but I think I covered those in my last post (though you're right that it has bearing on the second encoding of this post).

There are other designs as well, such as using 2 bits to provide more precision, such as 00=int, 01=pointer, 10=structure of ints, 11=structure containing pointers.
Ben L. Titzer said…
I think you're eventually going to end up needing to pass a representation of the polymorphic element type in the worst case anyway. E.g. the array allocated by this polymorphic function might need to carry its complete type information, which may be used for type casts if the language supports variance over array types. Virgil's mutable arrays are invariantly typed, but it will soon have immutable array types that will be covariant.

### async.h - asynchronous, stackless subroutines in C

The async/await idiom is becoming increasingly popular. The first widely used language to include it was C#, and it has now spread into JavaScript and Rust. Now C/C++ programmers don't have to feel left out, because async.h is a header-only library that brings async/await to C!Features:It's 100% portable C.It requires very little state (2 bytes).It's not dependent on an OS.It's a bit simpler to understand than protothreads because the async state is caller-saved rather than callee-saved.#include "async.h" struct async pt; struct timer timer; async example(struct async *pt) { async_begin(pt); while(1) { if(initiate_io()) { timer_start(&timer); await(io_completed() || timer_expired(&timer)); read_data(); } } async_end; } This library is basically a modified version of the idioms found in the Protothreads library by Adam Dunkels, so it's not truly ground breaking. I've mad…

### Building a Query DSL in C#

I recently built a REST API prototype where one of the endpoints accepted a string representing a filter to apply to a set of results. For instance, for entities with named properties "Foo" and "Bar", a string like "(Foo = 'some string') or (Bar > 99)" would filter out the results where either Bar is less than or equal to 99, or Foo is not "some string".This would translate pretty straightforwardly into a SQL query, but as a masochist I was set on using Google Datastore as the backend, which unfortunately has a limited filtering API:It does not support disjunctions, ie. "OR" clauses.It does not support filtering using inequalities on more than one property.It does not support a not-equal operation.So in this post, I will describe the design which achieves the following goals: A backend-agnostic querying API supporting arbitrary clauses, conjunctions ("AND"), and disjunctions ("OR").Implementations of this…

### Easy Reverse Mode Automatic Differentiation in C#

Continuing from my last post on implementing forward-mode automatic differentiation (AD) using C# operator overloading, this is just a quick follow-up showing how easy reverse mode is to achieve, and why it's important.Why Reverse Mode Automatic Differentiation?As explained in the last post, the vector representation of forward-mode AD can compute the derivatives of all parameter simultaneously, but it does so with considerable space cost: each operation creates a vector computing the derivative of each parameter. So N parameters with M operations would allocation O(N*M) space. It turns out, this is unnecessary!Reverse mode AD allocates only O(N+M) space to compute the derivatives of N parameters across M operations. In general, forward mode AD is best suited to differentiating functions of type:RRNThat is, functions of 1 parameter that compute multiple outputs. Reverse mode AD is suited to the dual scenario:RN → RThat is, functions of many parameters that return a single real …