Sunday, September 22, 2013

CLR: The Cost of Dynamic Type Tests

I recently came across Vance Morrison's blog post on the relative costs of dynamic type tests on the CLR, and I was struck by how much my experience differed from the results he reported. In past tests, I had concluded that operations and comparisons using System.Type were just as fast as operations on System.RuntimeTypeHandle. This struck me as a little odd at the time, but numbers don't lie.

Vance helpfully provided the code he used for his benchmarks, so I decided to see if perhaps I was mistaken. Lo and behold, the numbers I received from running his code matched my past results, i.e. RuntimeTypeHandle provided no advantage. This seemed extremely odd, and after digging a little deeper, it turns out that Vance and I are both right. I've been doing most of my development on x64 machines, and I suspect Vance was running x86 at the time. It turns out that the x64 runtime for the CLR woefully underperforms compared to x86, at least for this type of code. I knew there was a performance difference between the two, but the x86 backend really shines in dynamic type tests, demonstrating the CLR team's heavy investment in x86 optimizations.

The Original Tests

I'll first present the results of the original benchmarks, this time compiled for x64, x86, and Any CPU:

x64

Data units of msec resolution = 0.329204 usec
typeof(string)                       : count: 500000   15.962 +- 4%    msec
typeof(string).TypeHandle            : count: 500000   17.326 +- 3%    msec
anObj.GetType() == type              : count: 500000   17.960 +- 3%    msec
Type.GetTypeHandle(obj).Equals(tHnd) : count: 500000   16.329 +- 3%    msec
anObj.GetType() == typeof(string)    : count: 500000    1.858 +- 17%   msec
(anObj is string)                    : count: 500000    4.436 +- 8%    msec

These results differ significantly from Vance's original data. The type handle tests were very nearly the slowest way to perform dynamic type tests, not the fastest as Vance concluded. The fastest were direct comparisons against typeof(string), and subtype tests via the 'is' operator.

Any CPU

Data units of msec resolution = 0.329204 usec
typeof(string)                       : count: 500000   16.269 +- 5%    msec
typeof(string).TypeHandle            : count: 500000   17.353 +- 4%    msec
anObj.GetType() == type              : count: 500000   18.111 +- 3%    msec
Type.GetTypeHandle(obj).Equals(tHnd) : count: 500000   17.266 +- 2%    msec
anObj.GetType() == typeof(string)    : count: 500000    1.804 +- 3%    msec
(anObj is string)                    : count: 500000    4.487 +- 4%    msec

As expected, the results for the "Any CPU" build match the x64 results. I'm running a 64-bit OS, so the x64 runtime is the one launched by default when running a platform-agnostic program.
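
If you want to confirm which runtime an "Any CPU" build actually landed on, a minimal sketch like the following should do. It is not part of Vance's suite; the class and method names are made up for illustration, and it relies only on IntPtr.Size, which is available on .NET 2.0.

```csharp
using System;

static class Bitness
{
    // Hypothetical helper: pointer size is 8 bytes in a 64-bit process, 4 in a 32-bit one.
    public static bool Is64BitProcess()
    {
        return IntPtr.Size == 8;
    }

    static void Main()
    {
        Console.WriteLine(Is64BitProcess() ? "64-bit process" : "32-bit process");
    }
}
```

Running this from the same build configuration as the benchmark tells you whether the numbers you are looking at came from the x64 or the x86 JIT.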

x86

Data units of msec resolution = 0.329204 usec
typeof(string)                       : count: 500000   15.417 +- 2%    msec
typeof(string).TypeHandle            : count: 500000    0.830 +- 14%   msec
anObj.GetType() == type              : count: 500000   20.976 +- 3%    msec
Type.GetTypeHandle(obj).Equals(tHnd) : count: 500000   36.082 +- 2%    msec
anObj.GetType() == typeof(string)    : count: 500000    2.007 +- 12%   msec
(anObj is string)                    : count: 500000    4.893 +- 6%    msec

Here we see results much closer to Vance's original data. Fetching a type handle is blazingly fast, and the JIT seems to recognize certain comparison patterns and optimize them heavily.
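
For reference, the six rows in the tables above correspond roughly to the expressions below. This is only a sketch: the wrapper class and method names are mine, and the real benchmark runs these expressions inside timing loops rather than behind method calls.

```csharp
using System;

static class TypeTests
{
    // Row 1: fetch the System.Type object for a statically known type.
    public static Type TypeOfString() { return typeof(string); }

    // Row 2: fetch the RuntimeTypeHandle for a statically known type.
    public static RuntimeTypeHandle HandleOfString() { return typeof(string).TypeHandle; }

    // Row 3: compare an object's runtime type against a Type held in a variable.
    public static bool EqualsCachedType(object anObj, Type type) { return anObj.GetType() == type; }

    // Row 4: compare an object's type handle against a handle held in a variable.
    public static bool EqualsCachedHandle(object obj, RuntimeTypeHandle tHnd)
    {
        return Type.GetTypeHandle(obj).Equals(tHnd);
    }

    // Row 5: compare against typeof(...) written inline, a pattern the JIT can recognize.
    public static bool EqualsTypeofString(object anObj) { return anObj.GetType() == typeof(string); }

    // Row 6: subtype test; unlike the exact comparisons above, this also accepts derived types.
    public static bool IsString(object anObj) { return anObj is string; }

    static void Main()
    {
        object anObj = "hello";
        Console.WriteLine(EqualsCachedType(anObj, TypeOfString()));
        Console.WriteLine(EqualsCachedHandle(anObj, HandleOfString()));
        Console.WriteLine(EqualsTypeofString(anObj));
        Console.WriteLine(IsString(anObj));
    }
}
```

Note that rows 3 through 5 are exact type comparisons while row 6 is a subtype test, so they are not interchangeable in general, even though they agree for a sealed type like string.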

The Generic Tests

Vance's benchmarks only compared the costs of hard-coded type tests, where the JIT has better visibility into the static types involved and can perhaps optimize more readily. This doesn't necessarily translate to code that performs dynamic type tests on generic variables, so I duplicated Vance's test suite to operate on an abstract type T. The numbers were once again quite surprising. The operations that were fastest when the static types are hard-coded became the slowest when operating on an abstract type parameter. The slowest tests became just a little slower when operating on type parameters, but not dramatically so.
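
To make the difference concrete, here is a sketch of what the generic variants look like. The class and method names are hypothetical, not taken from the benchmark suite. One plausible reason for the slowdown (an assumption on my part, not something I measured) is that the CLR shares JIT-compiled code across reference-type instantiations, so typeof(T) can no longer be baked in as a constant and may need a runtime lookup.

```csharp
using System;

static class GenericTypeTests
{
    // Generic counterpart of typeof(string).TypeHandle.
    public static RuntimeTypeHandle HandleOf<T>() { return typeof(T).TypeHandle; }

    // Generic counterpart of anObj.GetType() == typeof(string).
    public static bool EqualsTypeof<T>(object anObj) { return anObj.GetType() == typeof(T); }

    // Generic counterpart of (anObj is string).
    public static bool Is<T>(object anObj) { return anObj is T; }

    static void Main()
    {
        object anObj = "hello";
        Console.WriteLine(EqualsTypeof<string>(anObj));
        Console.WriteLine(Is<string>(anObj));
        Console.WriteLine(Type.GetTypeHandle(anObj).Equals(HandleOf<string>()));
    }
}
```

The tests that compare against a Type or RuntimeTypeHandle already stored in a variable look the same in both suites, which fits with those rows barely moving.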

x64

typeof(T)                            : count: 500000   25.621 +- 2%    msec
typeof(T).TypeHandle                 : count: 500000   28.043 +- 2%    msec
anObj.GetType() == type              : count: 500000   18.727 +- 1%    msec
Type.GetTypeHandle(obj).Equals(tHnd) : count: 500000   18.950 +- 4%    msec
anObj.GetType() == typeof(T)         : count: 500000   39.888 +- 2%    msec
(anObj is T)                         : count: 500000   33.884 +- 2%    msec

As I mentioned, the fastest x64 type tests became the slowest tests when operating on generic type parameters. The other type tests slowed down slightly as well, though nowhere near as dramatically. Still, my past results remain: comparing System.Type instances seems to be the most efficient and stable way to compare dynamic types, at least for x64.

Any CPU

typeof(T)                            : count: 500000   25.234 +- 3%    msec
typeof(T).TypeHandle                 : count: 500000   29.123 +- 3%    msec
anObj.GetType() == type              : count: 500000   18.699 +- 2%    msec
Type.GetTypeHandle(obj).Equals(tHnd) : count: 500000   22.600 +- 5%    msec
anObj.GetType() == typeof(T)         : count: 500000   40.124 +- 2%    msec
(anObj is T)                         : count: 500000   32.301 +- 1%    msec

As expected, the "Any CPU" results match x64. Nothing surprising here.

x86

typeof(T)                            : count: 500000   22.395 +- 3%    msec
typeof(T).TypeHandle                 : count: 500000    4.478 +- 2%    msec
anObj.GetType() == type              : count: 500000   21.170 +- 3%    msec
Type.GetTypeHandle(obj).Equals(tHnd) : count: 500000   36.064 +- 2%    msec
anObj.GetType() == typeof(T)         : count: 500000   36.878 +- 1%    msec
(anObj is T)                         : count: 500000   45.831 +- 4%    msec

Here we partially confirm Vance's data again: typeof(T).TypeHandle is incredibly fast on x86, but it is still about 5.4 times slower than the non-generic test (4.478 msec versus 0.830 msec). That slowdown seems excessive, considering it should amount to just one or two memory fetches.

However, the second and third fastest ways to perform dynamic type tests on x86 when static types are known became the slowest when operating on generic parameters. This performance degradation matches the trend seen on x64, so at least that's consistent. The tests against a System.Type held in a variable cost roughly the same in both suites, so their performance was much more stable.

Conclusions

I'm not sure we can draw any reliable conclusions from this data as to the fastest way to perform dynamic type tests, at least across platforms. Using RuntimeTypeHandles should be the fastest, but their poor showing on x64 makes me question whether it's worth it. Hopefully the CLR team will put some more effort into optimizing the x64 code generator to improve the performance of RuntimeTypeHandle. As it currently stands though, operating on System.Type seems like the most stable option across all the platforms, and more importantly, it doesn't degrade in the presence of generics.

You can download the code for the modified benchmark suite here. All credit goes to Vance for the code; I just copied and pasted the tests and modified them slightly to operate on type parameters. The x86 and x64 builds are selected via different build configurations. Release mode builds for Any CPU.
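
If you just want a quick sanity check without the full suite, a single row can be approximated with a plain Stopwatch loop. To be clear, this is not Vance's harness: it does no statistical analysis, the names are made up, and the iteration count is an arbitrary choice, so treat its numbers as rough indications only.

```csharp
using System;
using System.Diagnostics;

static class MiniBench
{
    // Hypothetical helper: times the generic exact-type test inline, with no delegate
    // call in the loop. Counting hits keeps the JIT from discarding the comparison.
    public static double TimeEqualsTypeof<T>(object anObj, int iterations, out int hits)
    {
        hits = 0;
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            if (anObj.GetType() == typeof(T)) hits++;
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }

    static void Main()
    {
        object anObj = "hello";
        int hits;
        TimeEqualsTypeof<string>(anObj, 1000, out hits);  // warm-up so JIT time is excluded
        double msec = TimeEqualsTypeof<string>(anObj, 10000000, out hits);
        Console.WriteLine("anObj.GetType() == typeof(T): {0:F3} msec ({1} hits)", msec, hits);
    }
}
```

Build it in Release mode and run it outside the debugger, once per platform target, to see the x86 and x64 difference for yourself.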

Edit: The results above target .NET 2.0, which largely shares the same runtime as .NET 3.0 and 3.5, so the results hold for every version before .NET 4.0. The .NET 4.0 VM allegedly underwent many changes, and after performing a few preliminary tests, I can say that the above numbers are all completely different. RuntimeTypeHandle is now the slowest test on x86, and x64 is faster than x86 on most of the tests, often by a wide margin. System.Type is stable and fastest even for generics. So the conclusion seems pretty obvious now: stick to System.Type and avoid RuntimeTypeHandle at all costs.

x64 - .NET 4.0

Data units of msec resolution = 0.329204 usec
typeof(string)                       : count: 500000    8.525 +- 6%    msec
typeof(string).TypeHandle            : count: 500000   19.892 +- 4%    msec
anObj.GetType() == type              : count: 500000    4.792 +- 4%    msec
Type.GetTypeHandle(obj).Equals(tHnd) : count: 500000   47.924 +- 3%    msec
anObj.GetType() == typeof(string)    : count: 500000    2.563 +- 8%    msec
(anObj is string)                    : count: 500000    2.439 +- 8%    msec

typeof(T)                            : count: 500000   25.557 +- 2%    msec
typeof(T).TypeHandle                 : count: 500000   44.622 +- 3%    msec
anObj.GetType() == type              : count: 500000    4.310 +- 4%    msec
Type.GetTypeHandle(obj).Equals(tHnd) : count: 500000   47.437 +- 1%    msec
anObj.GetType() == typeof(T)         : count: 500000   23.605 +- 4%    msec
(anObj is T)                         : count: 500000   39.682 +- 3%    msec

x86 - .NET 4.0

Data units of msec resolution = 0.329204 usec
typeof(string)                       : count: 500000   14.871 +- 3%    msec
typeof(string).TypeHandle            : count: 500000   36.175 +- 4%    msec
anObj.GetType() == type              : count: 500000   25.566 +- 1%    msec
Type.GetTypeHandle(obj).Equals(tHnd) : count: 500000   49.741 +- 4%    msec
anObj.GetType() == typeof(string)    : count: 500000    2.620 +- 10%   msec
(anObj is string)                    : count: 500000    5.649 +- 9%    msec

typeof(T)                            : count: 500000   17.096 +- 11%   msec
typeof(T).TypeHandle                 : count: 500000   44.402 +- 2%    msec
anObj.GetType() == type              : count: 500000   25.086 +- 2%    msec
Type.GetTypeHandle(obj).Equals(tHnd) : count: 500000   48.827 +- 3%    msec
anObj.GetType() == typeof(T)         : count: 500000   41.598 +- 3%    msec
(anObj is T)                         : count: 500000   45.505 +- 1%    msec

As you can see from these updated results for .NET 4.0, the x64 VM now matches or beats the x86 VM on nearly every test, in part because the x86 VM appears to be slower than in .NET 2.0. The advantage of RuntimeTypeHandle in .NET 2.0 is completely gone, and it's now (surprisingly) the slowest means of comparing runtime types. Comparing instances of System.Type appears to be the fastest all around, and doesn't degrade if you're comparing generic type parameters.
