ThreadScoped<T> The Next Generation

Jeroen Frijters helpfully pointed out that the CLR implements some hard limits on nesting generics, which is 99 for .NET 4 based on my tests. My previous implementation of ThreadScoped<T> was thus limited to 99 instances. Not very useful!

The solution is actually quite simple, which I briefly outlined on Jeroen's blog: add more type parameters and use a simple base-99 counting scheme to generate new instances. Each additional type parameters thus increases the permutations 99 fold. One type index parameter yields 99¹ instances, two type index parameters yields 99² instances, three type index parameters yields 99³, and so on.

No one in the foreseeable future will require more than 99³, which is almost a million thread-local variables, so I've added two more type index parameters to make Ref<T0, T1, T2>. The instance allocation function is now:

internal override ThreadScoped<T> Allocate()
{
    // If 'next' is null, we are at the end of the list of free refs,
    // so allocate a new one and enqueue it, then return 'this'
    var x = next;
    if (x != null) return this;
    // The CLR has some fundamental limits on generic nesting depths, so we circumvent
    // this by using two generic parameters, and nesting them via counting.
    x = Interlocked.CompareExchange(ref next, CreateNext(), null);
    // atomic swap failure doesn't matter, since the caller of Acquire()
    // accesses whatever instance is at this.next
    return this;
}

and CreateNext is:

ThreadScoped<T> CreateNext()
{
    var x = allocCount + 1;
    if (x % (99 * 99) == 0) return new Ref<T, T, Ref<T2>> { allocCount = x };
    if (x % 99 == 0)        return new Ref<T, Ref<T1>, T2> { allocCount = x };
    return new Ref<Ref<T0>, T1, T2> { allocCount = x };
}

This is simple base-99 arithmetic. Anyone familiar with arithmetic should recognize the pattern here: when we get to certain multiples of 99, we reset the previous digits and carry the 1 to the next slot. Normally, humans deal in base-10, so a carry happens at 10¹, 10², 10³, and so on.

In this case, we are dealing with base-99, so carries happen at 99¹, 99² and 99³, and the "carry" operation consists of nesting a generic type parameter. Simple!

This scheme is also trivially extensible to as many additional parameters as is needed, so if someone somewhere really does need more than a million fast thread-local variables, I have you covered.

These changes don't seem to have impacted the performance of ThreadScoped<T>, so I'm still over 250% faster than the ThreadLocal<T> provided in .NET 4's base class libraries.

Higher Logics

Search This Blog

ThreadScoped<T> The Next Generation

Labels

Comments

Popular posts from this blog

async.h - asynchronous, stackless subroutines in C

Easy Automatic Differentiation in C#

Easy Reverse Mode Automatic Differentiation in C#