CLR: The Cost of Dynamic Type Tests

I recently came across Vance Morrison's blog post on the relative costs of dynamic type tests on the CLR, and I was struck by how much my experience differed from the results he received. In past tests, I had concluded that operations and comparisons using System.Type were just as fast as operations on System.RuntimeTypeHandle. This struck me as a little odd at the time, but numbers don't lie.

Vance helpfully provided the code he used for his benchmarks, so I decided to see if perhaps I was mistaken. Lo and behold, the numbers I got from running his code matched my past results, i.e. RuntimeTypeHandle provided no advantage. This seemed extremely odd, and after digging a little deeper, it turns out that Vance and I are both right. I've been doing most of my development on x64 machines, and I suspect Vance was running x86 at the time. It turns out that the x64 runtime for the CLR is woefully underperforming when compared to x86, at least for this type of code. I knew there was a performance difference between the two, but the x86 backend really shines in dynamic type tests, demonstrating the CLR team's heavy investment in x86 optimizations.

The Original Tests

I'll first present the results of the original benchmarks, this time compiled for x64, x86, and Any CPU:

x64

Data units of msec resolution = 0.329204 usec
typeof(string)                       : count: 500000   15.962 +- 4%    msec
typeof(string).TypeHandle            : count: 500000   17.326 +- 3%    msec
anObj.GetType() == type              : count: 500000   17.960 +- 3%    msec
Type.GetTypeHandle(obj).Equals(tHnd) : count: 500000   16.329 +- 3%    msec
anObj.GetType() == typeof(string)    : count: 500000    1.858 +- 17%   msec
(anObj is string)                    : count: 500000    4.436 +- 8%    msec

These results differ significantly from Vance's original data. The type handle test, Type.GetTypeHandle(obj).Equals(tHnd), was very nearly the slowest way to perform a dynamic type test, not the fastest as Vance concluded. The fastest were direct comparisons against an inline typeof(string), and subtype tests via the 'is' operator.
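For readers without Vance's code at hand, the six row labels can be read as C# expressions roughly like the following. This is only a sketch: the class and method names are hypothetical, and the field names type and tHnd are simply borrowed from the row labels, so the real benchmark harness may be structured differently.

```csharp
using System;

// Hypothetical wrapper: one method per row of the result tables.
static class TypeTests
{
    // Cached once, as the "type" and "tHnd" names in the row labels suggest.
    static readonly Type type = typeof(string);
    static readonly RuntimeTypeHandle tHnd = typeof(string).TypeHandle;

    // typeof(string)
    public static Type TypeOf() { return typeof(string); }

    // typeof(string).TypeHandle
    public static RuntimeTypeHandle TypeOfHandle() { return typeof(string).TypeHandle; }

    // anObj.GetType() == type
    public static bool CachedTypeCompare(object anObj) { return anObj.GetType() == type; }

    // Type.GetTypeHandle(obj).Equals(tHnd)
    public static bool HandleCompare(object obj) { return Type.GetTypeHandle(obj).Equals(tHnd); }

    // anObj.GetType() == typeof(string)
    public static bool InlineTypeCompare(object anObj) { return anObj.GetType() == typeof(string); }

    // (anObj is string)
    public static bool IsTest(object anObj) { return anObj is string; }
}
```

All four comparison methods return true for a string argument and false for, say, a boxed int; only their cost differs.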

Any CPU

Data units of msec resolution = 0.329204 usec
typeof(string)                       : count: 500000   16.269 +- 5%    msec
typeof(string).TypeHandle            : count: 500000   17.353 +- 4%    msec
anObj.GetType() == type              : count: 500000   18.111 +- 3%    msec
Type.GetTypeHandle(obj).Equals(tHnd) : count: 500000   17.266 +- 2%    msec
anObj.GetType() == typeof(string)    : count: 500000    1.804 +- 3%    msec
(anObj is string)                    : count: 500000    4.487 +- 4%    msec

As expected, the results for the "Any CPU" build match the x64 build. I'm running a 64-bit OS, so the x64 runtime is launched by default when running a platform-agnostic program.
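If you want to confirm which runtime an "Any CPU" build actually landed on, a quick check of the pointer size is enough. The helper below is a hypothetical sketch, not part of the benchmark suite; it relies only on IntPtr.Size, which is available on .NET 2.0.

```csharp
using System;

// Hypothetical helper: reports the bitness of the current process.
static class RuntimeInfo
{
    public static string ProcessBitness()
    {
        // IntPtr.Size is 8 bytes in a 64-bit process and 4 bytes in a 32-bit one.
        return IntPtr.Size == 8 ? "64-bit" : "32-bit";
    }
}
```

Printing RuntimeInfo.ProcessBitness() at the start of a benchmark run makes it obvious which set of numbers you are looking at.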

x86

Data units of msec resolution = 0.329204 usec
typeof(string)                       : count: 500000   15.417 +- 2%    msec
typeof(string).TypeHandle            : count: 500000    0.830 +- 14%   msec
anObj.GetType() == type              : count: 500000   20.976 +- 3%    msec
Type.GetTypeHandle(obj).Equals(tHnd) : count: 500000   36.082 +- 2%    msec
anObj.GetType() == typeof(string)    : count: 500000    2.007 +- 12%   msec
(anObj is string)                    : count: 500000    4.893 +- 6%    msec

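Here we see results much closer to Vance's original data. Fetching a type handle via typeof(string).TypeHandle is blazingly fast, and the JIT seems to recognize certain comparison patterns and optimize them heavily. Type.GetTypeHandle(obj).Equals(tHnd), however, is the slowest row in this table.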

The Generic Tests

Vance's benchmarks only compared the costs of hard-coded type tests, where the JIT has better visibility into the static types involved and can perhaps optimize more readily. This doesn't necessarily translate to code that performs dynamic type tests on generic variables, so I duplicated Vance's test suite to operate on an abstract type T. The numbers were once again quite surprising. The operations that were fastest when the static types are hard-coded became the slowest when operating on an abstract type parameter. The slowest tests became just a little slower when operating on type parameters, but not dramatically so.
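The generic variants of the tests look roughly like this. Again, this is a sketch with hypothetical class and method names rather than the actual modified suite; the point is only that string is replaced by a type parameter T that the JIT cannot see statically.

```csharp
using System;

// Hypothetical generic wrapper: the same six expressions, but over a type parameter T.
static class GenericTypeTests<T>
{
    // Cached once per closed generic type, e.g. GenericTypeTests<string>.
    static readonly Type type = typeof(T);
    static readonly RuntimeTypeHandle tHnd = typeof(T).TypeHandle;

    // typeof(T)
    public static Type TypeOf() { return typeof(T); }

    // typeof(T).TypeHandle
    public static RuntimeTypeHandle TypeOfHandle() { return typeof(T).TypeHandle; }

    // anObj.GetType() == type
    public static bool CachedTypeCompare(object anObj) { return anObj.GetType() == type; }

    // Type.GetTypeHandle(obj).Equals(tHnd)
    public static bool HandleCompare(object obj) { return Type.GetTypeHandle(obj).Equals(tHnd); }

    // anObj.GetType() == typeof(T)
    public static bool InlineTypeCompare(object anObj) { return anObj.GetType() == typeof(T); }

    // (anObj is T)
    public static bool IsTest(object anObj) { return anObj is T; }
}
```

Calling GenericTypeTests<string>.IsTest("abc") exercises the same check as the hard-coded (anObj is string) row, just routed through the type parameter.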

x64

typeof(T)                            : count: 500000   25.621 +- 2%    msec
typeof(T).TypeHandle                 : count: 500000   28.043 +- 2%    msec
anObj.GetType() == type              : count: 500000   18.727 +- 1%    msec
Type.GetTypeHandle(obj).Equals(tHnd) : count: 500000   18.950 +- 4%    msec
anObj.GetType() == typeof(T)         : count: 500000   39.888 +- 2%    msec
(anObj is T)                         : count: 500000   33.884 +- 2%    msec

As I mentioned, the fastest x64 type tests became the slowest tests when operating on generic type parameters. The other type tests slowed down slightly as well, though nowhere near as dramatically. Still, my past results hold: comparing System.Type instances (anObj.GetType() == type) seems to be the most efficient and stable way to compare dynamic types, at least on x64.

Any CPU

typeof(T)                            : count: 500000   25.234 +- 3%    msec
typeof(T).TypeHandle                 : count: 500000   29.123 +- 3%    msec
anObj.GetType() == type              : count: 500000   18.699 +- 2%    msec
Type.GetTypeHandle(obj).Equals(tHnd) : count: 500000   22.600 +- 5%    msec
anObj.GetType() == typeof(T)         : count: 500000   40.124 +- 2%    msec
(anObj is T)                         : count: 500000   32.301 +- 1%    msec

As expected, the "Any CPU" results match x64. Nothing surprising here.

x86

typeof(T)                            : count: 500000   22.395 +- 3%    msec
typeof(T).TypeHandle                 : count: 500000    4.478 +- 2%    msec
anObj.GetType() == type              : count: 500000   21.170 +- 3%    msec
Type.GetTypeHandle(obj).Equals(tHnd) : count: 500000   36.064 +- 2%    msec
anObj.GetType() == typeof(T)         : count: 500000   36.878 +- 1%    msec
(anObj is T)                         : count: 500000   45.831 +- 4%    msec

Here we partially confirm Vance's data again: typeof(T).TypeHandle is incredibly fast on x86, but still roughly 5.4 times slower than the non-generic test (4.478 vs. 0.830 msec). That slowdown seems excessive, considering the operation should take just one or two memory fetches.

However, the second and third fastest ways to perform dynamic type tests on x86 when static types are known became the slowest when operating on generic parameters. This performance degradation matches the trend seen on x64, so at least that's consistent. The standard operations on System.Type took roughly the same time in both suites, so their performance was much more stable.

Conclusions

I'm not sure we can draw any reliable conclusions from this data as to the fastest way to perform dynamic type tests, at least across platforms. Using RuntimeTypeHandles should be the fastest, but their poor showing on x64 makes me question whether it's worth it. Hopefully the CLR team will put some more effort into optimizing the x64 code generator to improve the performance of RuntimeTypeHandle. As it currently stands though, operating on System.Type seems like the most stable choice across all platforms, and more importantly, it doesn't degrade in the presence of generics.

You can download the code for the modified benchmark suite here. All credit to Vance for the code; I just copied and pasted the tests and modified them slightly to operate on type parameters. The x86 and x64 builds are selected via different build configurations. Release mode builds for Any CPU.

Edit: The results above target .NET 2.0, which largely shares the same runtime as .NET 3.0 and 3.5, so the results hold for every version before .NET 4.0. The .NET 4.0 VM allegedly underwent many changes, and after performing a few preliminary tests, I can say that the above numbers are all completely different. RuntimeTypeHandle is now the slowest test on x86, and x64 is way faster than x86 on pretty much all the tests. System.Type is stable and fastest even for generics. So the conclusion seems pretty obvious now: stick to System.Type and avoid RuntimeTypeHandle at all costs.

x64 - .NET 4.0

Data units of msec resolution = 0.329204 usec
typeof(string)                       : count: 500000    8.525 +- 6%    msec
typeof(string).TypeHandle            : count: 500000   19.892 +- 4%    msec
anObj.GetType() == type              : count: 500000    4.792 +- 4%    msec
Type.GetTypeHandle(obj).Equals(tHnd) : count: 500000   47.924 +- 3%    msec
anObj.GetType() == typeof(string)    : count: 500000    2.563 +- 8%    msec
(anObj is string)                    : count: 500000    2.439 +- 8%    msec

typeof(T)                            : count: 500000   25.557 +- 2%    msec
typeof(T).TypeHandle                 : count: 500000   44.622 +- 3%    msec
anObj.GetType() == type              : count: 500000    4.310 +- 4%    msec
Type.GetTypeHandle(obj).Equals(tHnd) : count: 500000   47.437 +- 1%    msec
anObj.GetType() == typeof(T)         : count: 500000   23.605 +- 4%    msec
(anObj is T)                         : count: 500000   39.682 +- 3%    msec

x86 - .NET 4.0

Data units of msec resolution = 0.329204 usec
typeof(string)                       : count: 500000   14.871 +- 3%    msec
typeof(string).TypeHandle            : count: 500000   36.175 +- 4%    msec
anObj.GetType() == type              : count: 500000   25.566 +- 1%    msec
Type.GetTypeHandle(obj).Equals(tHnd) : count: 500000   49.741 +- 4%    msec
anObj.GetType() == typeof(string)    : count: 500000    2.620 +- 10%   msec
(anObj is string)                    : count: 500000    5.649 +- 9%    msec

typeof(T)                            : count: 500000   17.096 +- 11%   msec
typeof(T).TypeHandle                 : count: 500000   44.402 +- 2%    msec
anObj.GetType() == type              : count: 500000   25.086 +- 2%    msec
Type.GetTypeHandle(obj).Equals(tHnd) : count: 500000   48.827 +- 3%    msec
anObj.GetType() == typeof(T)         : count: 500000   41.598 +- 3%    msec
(anObj is T)                         : count: 500000   45.505 +- 1%    msec

As you can see from these updated results for .NET 4.0, the x64 VM is now comparable to the x86 VM, largely because the x86 VM appears to be slower than in .NET 2.0. The advantage of RuntimeTypeHandle in .NET 2.0 is completely gone, and it's now (surprisingly) the slowest means of comparing runtime types. Comparing instances of System.Type appears to be the fastest all around, and doesn't degrade if you're comparing generic type parameters.
