Skip to main content

ThreadScoped<T> The Next Generation

Jeroen Frijters helpfully pointed out that the CLR implements some hard limits on nesting generics, which is 99 for .NET 4 based on my tests. My previous implementation of ThreadScoped<T> was thus limited to 99 instances. Not very useful!

The solution is actually quite simple, which I briefly outlined on Jeroen's blog: add more type parameters and use a simple base-99 counting scheme to generate new instances. Each additional type parameters thus increases the permutations 99 fold. One type index parameter yields 991 instances, two type index parameters yields 992 instances, three type index parameters yields 993, and so on.

No one in the foreseeable future will require more than 993, which is almost a million thread-local variables, so I've added two more type index parameters to make Ref<T0, T1, T2>. The instance allocation function is now:

internal override ThreadScoped<T> Allocate()
{
    // If 'next' is null, we are at the end of the list of free refs,
    // so allocate a new one and enqueue it, then return 'this'
    var x = next;
    if (x != null) return this;
    // The CLR has some fundamental limits on generic nesting depths, so we circumvent
    // this by using two generic parameters, and nesting them via counting.
    x = Interlocked.CompareExchange(ref next, CreateNext(), null);
    // atomic swap failure doesn't matter, since the caller of Acquire()
    // accesses whatever instance is at this.next
    return this;
}
and CreateNext is:
ThreadScoped<T> CreateNext()
{
    var x = allocCount + 1;
    if (x % (99 * 99) == 0) return new Ref<T, T, Ref<T2>> { allocCount = x };
    if (x % 99 == 0)        return new Ref<T, Ref<T1>, T2> { allocCount = x };
    return new Ref<Ref<T0>, T1, T2> { allocCount = x };
}
This is simple base-99 arithmetic. Anyone familiar with arithmetic should recognize the pattern here: when we get to certain multiples of 99, we reset the previous digits and carry the 1 to the next slot. Normally, humans deal in base-10, so a carry happens at 101, 102, 103, and so on.

In this case, we are dealing with base-99, so carries happen at 991, 992 and 993, and the "carry" operation consists of nesting a generic type parameter. Simple!

This scheme is also trivially extensible to as many additional parameters as is needed, so if someone somewhere really does need more than a million fast thread-local variables, I have you covered.

These changes don't seem to have impacted the performance of ThreadScoped<T>, so I'm still over 250% faster than the ThreadLocal<T> provided in .NET 4's base class libraries.

Comments

Popular posts from this blog

async.h - asynchronous, stackless subroutines in C

The async/await idiom is becoming increasingly popular. The first widely used language to include it was C#, and it has now spread into JavaScript and Rust. Now C/C++ programmers don't have to feel left out, because async.h is a header-only library that brings async/await to C! Features: It's 100% portable C. It requires very little state (2 bytes). It's not dependent on an OS. It's a bit simpler to understand than protothreads because the async state is caller-saved rather than callee-saved. #include "async.h" struct async pt; struct timer timer; async example(struct async *pt) { async_begin(pt); while(1) { if(initiate_io()) { timer_start(&timer); await(io_completed() || timer_expired(&timer)); read_data(); } } async_end; } This library is basically a modified version of the idioms found in the Protothreads library by Adam Dunkels, so it's not truly ground bre...

Easy Automatic Differentiation in C#

I've recently been researching optimization and automatic differentiation (AD) , and decided to take a crack at distilling its essence in C#. Note that automatic differentiation (AD) is different than numerical differentiation . Math.NET already provides excellent support for numerical differentiation . C# doesn't seem to have many options for automatic differentiation, consisting mainly of an F# library with an interop layer, or paid libraries . Neither of these are suitable for learning how AD works. So here's a simple C# implementation of AD that relies on only two things: C#'s operator overloading, and arrays to represent the derivatives, which I think makes it pretty easy to understand. It's not particularly efficient, but it's simple! See the "Optimizations" section at the end if you want a very efficient specialization of this technique. What is Automatic Differentiation? Simply put, automatic differentiation is a technique for calcu...

Easy Reverse Mode Automatic Differentiation in C#

Continuing from my last post on implementing forward-mode automatic differentiation (AD) using C# operator overloading , this is just a quick follow-up showing how easy reverse mode is to achieve, and why it's important. Why Reverse Mode Automatic Differentiation? As explained in the last post, the vector representation of forward-mode AD can compute the derivatives of all parameter simultaneously, but it does so with considerable space cost: each operation creates a vector computing the derivative of each parameter. So N parameters with M operations would allocation O(N*M) space. It turns out, this is unnecessary! Reverse mode AD allocates only O(N+M) space to compute the derivatives of N parameters across M operations. In general, forward mode AD is best suited to differentiating functions of type: R → R N That is, functions of 1 parameter that compute multiple outputs. Reverse mode AD is suited to the dual scenario: R N → R That is, functions of many parameters t...