Jeroen Frijters helpfully pointed out that the CLR imposes a hard limit on generic nesting depth, which my tests put at 99 for .NET 4. My previous implementation of ThreadScoped<T> was thus limited to 99 instances. Not very useful!

The solution is actually quite simple, and I briefly outlined it on Jeroen's blog: add more type parameters and use a simple base-99 counting scheme to generate new instances. Each additional type parameter thus multiplies the number of distinct instances by 99. One type index parameter yields 99¹ instances, two yield 99², three yield 99³, and so on.
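To make the counting scheme concrete, here is a minimal sketch (not part of ThreadScoped<T>) that splits an allocation index into three base-99 digits, one per type index parameter. Each digit is the nesting depth of the corresponding parameter. The class and method names are hypothetical, and the sketch assumes the first instance has all three parameters un-nested.

```csharp
using System;

// Hypothetical helper: maps an allocation index onto three base-99 "digits".
// Digit i is the nesting depth of type index parameter Ti.
public static class Base99Digits
{
    public const int Radix = 99;

    // Returns the nesting depths of T0, T1 and T2 for allocation index n.
    public static (int d0, int d1, int d2) Decompose(int n)
    {
        if (n < 0 || n >= Radix * Radix * Radix)
            throw new ArgumentOutOfRangeException(nameof(n), "three digits hold only 99^3 values");
        int d0 = n % Radix;                 // lowest digit: depth of T0
        int d1 = (n / Radix) % Radix;       // middle digit: depth of T1
        int d2 = n / (Radix * Radix);       // highest digit: depth of T2
        return (d0, d1, d2);
    }

    // Number of distinct instances available with k type index parameters: 99^k.
    public static long Capacity(int k)
    {
        long c = 1;
        for (int i = 0; i < k; i++) c *= Radix;
        return c;
    }
}
```

For example, index 99 decomposes to depths (0, 1, 0), and Capacity(3) is 970,299.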

No one in the foreseeable future will need more than 99³ (970,299, almost a million) thread-local variables, so I've added two more type index parameters to make Ref<T0, T1, T2>. The instance allocation function is now:

```csharp
internal override ThreadScoped<T> Allocate()
{
    // If 'next' is null, we are at the end of the list of free refs,
    // so allocate a new one and enqueue it, then return 'this'
    var x = next;
    if (x != null) return this;
    // The CLR has some fundamental limits on generic nesting depths, so we circumvent
    // this by using three generic parameters, and nesting them via counting.
    x = Interlocked.CompareExchange(ref next, CreateNext(), null);
    // atomic swap failure doesn't matter, since the caller of Acquire()
    // accesses whatever instance is at this.next
    return this;
}
```

And CreateNext is:

```csharp
ThreadScoped<T> CreateNext()
{
    var x = allocCount + 1;
    // Multiple of 99*99: reset T0 and T1 to T and carry into T2.
    if (x % (99 * 99) == 0) return new Ref<T, T, Ref<T2>> { allocCount = x };
    // Multiple of 99: reset T0 to T and carry into T1.
    if (x % 99 == 0) return new Ref<T, Ref<T1>, T2> { allocCount = x };
    // Otherwise increment the lowest "digit" by nesting T0 one level deeper.
    return new Ref<Ref<T0>, T1, T2> { allocCount = x };
}
```

This is simple base-99 arithmetic. Anyone familiar with arithmetic should recognize the pattern: when we reach certain multiples of 99, we reset the lower digits and carry the 1 into the next slot. Humans normally work in base 10, so a carry happens at multiples of 10¹, 10², 10³, and so on.

In this case we are dealing with base 99, so carries happen at multiples of 99¹ and 99² (a carry at 99³ would need a fourth parameter), and the "carry" operation consists of nesting a generic type parameter. Simple!
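The carry behaviour can be checked without instantiating any real generic types. The sketch below is a hypothetical simulation of CreateNext(): each instance is represented only by the nesting depths of its three parameters, and Render prints the corresponding type name using the single-parameter Ref wrapper that CreateNext nests. It assumes the first instance is Ref<T, T, T> with an allocCount of 0; the names CarrySimulation, Next, Nest, Render and StateAt are illustrative only.

```csharp
// Hypothetical simulation of CreateNext(); no generic types are created.
public static class CarrySimulation
{
    const int Radix = 99;

    // Mirrors the three branches of CreateNext() for allocation count x.
    public static (int d0, int d1, int d2) Next((int d0, int d1, int d2) cur, int x)
    {
        if (x % (Radix * Radix) == 0) return (0, 0, cur.d2 + 1); // carry into T2, reset T0 and T1
        if (x % Radix == 0) return (0, cur.d1 + 1, cur.d2);      // carry into T1, reset T0
        return (cur.d0 + 1, cur.d1, cur.d2);                      // ordinary increment: nest T0
    }

    // Renders a nesting depth as T, Ref<T>, Ref<Ref<T>>, ...
    public static string Nest(int depth)
    {
        string s = "T";
        for (int i = 0; i < depth; i++) s = "Ref<" + s + ">";
        return s;
    }

    // Renders the three-parameter type name for a given state.
    public static string Render((int d0, int d1, int d2) t) =>
        $"Ref<{Nest(t.d0)}, {Nest(t.d1)}, {Nest(t.d2)}>";

    // Walks from allocation 0 (all parameters un-nested) up to allocation n.
    public static (int d0, int d1, int d2) StateAt(int n)
    {
        var cur = (0, 0, 0);
        for (int x = 1; x <= n; x++) cur = Next(cur, x);
        return cur;
    }
}
```

For instance, Render(StateAt(99)) produces "Ref<T, Ref<T>, T>": the 99th allocation resets T0 and carries into T1, exactly like a base-10 odometer rolling over from 9 to 10.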

This scheme also extends trivially to as many additional parameters as are needed, so if someone somewhere really does need more than a million fast thread-local variables, I've got you covered.
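Generalizing CreateNext() to k type index parameters comes down to deciding which parameter to nest for a given allocation count. A minimal sketch, assuming the same base-99 scheme: the slot to nest is the number of trailing zero base-99 digits of the count, and every lower slot is reset to T. CarryGeneral and CarrySlot are hypothetical names, and throwing once 99^k instances are used up is my own choice; the three-parameter code above would instead just nest T2 past the CLR's limit.

```csharp
using System;

// Hypothetical generalization of CreateNext()'s branch logic to k parameters.
public static class CarryGeneral
{
    const long Radix = 99;

    // Returns the index of the type parameter to nest for allocation count x;
    // all parameters with a lower index are reset to T.
    public static int CarrySlot(long x, int k)
    {
        if (x <= 0) throw new ArgumentOutOfRangeException(nameof(x));
        int slot = 0;
        // Each trailing zero base-99 digit pushes the carry one slot higher.
        while (x % Radix == 0)
        {
            x /= Radix;
            slot++;
        }
        if (slot >= k)
            throw new InvalidOperationException("all 99^k instances used; add another parameter");
        return slot;
    }
}
```

With k = 3 this reproduces the three branches of CreateNext(): CarrySlot returns 2 for multiples of 99 * 99, 1 for other multiples of 99, and 0 otherwise. Adding a parameter only means raising k and adding one more branch.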

These changes don't appear to have affected the performance of ThreadScoped<T>, which is still over 250% faster than the ThreadLocal<T> provided in .NET 4's base class libraries.
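One detail of Allocate() worth spelling out is why a failed Interlocked.CompareExchange is harmless. The sketch below isolates that "append if missing" pattern on a hypothetical Node class (not ThreadScoped<T>'s real Ref type). When several threads race, exactly one exchange succeeds; the losers' freshly created nodes are simply discarded, and every caller then reads whichever node won. It uses Volatile.Read, which assumes .NET 4.5 or later.

```csharp
using System.Threading;

// Hypothetical node illustrating the lock-free append used by Allocate().
public sealed class Node
{
    public Node next;
    public readonly int index;

    public Node(int index) { this.index = index; }

    // Ensures 'next' is non-null and returns it.
    public Node EnsureNext()
    {
        if (Volatile.Read(ref next) == null)
        {
            // Only succeeds if 'next' is still null; on failure the new node is
            // garbage and we fall through to read the winner's node instead.
            Interlocked.CompareExchange(ref next, new Node(index + 1), null);
        }
        return Volatile.Read(ref next);
    }
}
```

The cost of losing the race is one wasted allocation, which is why the original code does not bother checking the result of the exchange.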