Monday, October 20, 2008

Dispatching: VTables vs. Runtime Tests and Casts

Object oriented languages like C# and Java use method dispatching techniques to select a concrete method to invoke when given a polymorphic type. By far the most common technique is the virtual method table, or vtable for short.

Long ago I had updated the above Wikipedia page with information regarding alternative dispatch techniques. Research published a few years back indicated that dispatching through a vtable incurred astonishing overheads, as high as 50% of total execution time. Alternative dispatch techniques based on runtime tests, such as a linear sequence of if statements checking for the various concrete class types, or a sequence of nested if statements forming a binary search, were often more efficient on a variety of hardware.

Vtables are rather flexible however, and the composition of a number of vtables can encode a variety of advanced dispatching behaviour. The .NET CLR utilizes vtables, and I've sketched out rough encodings of first-class functions, algebraic data types and pattern matching, higher-ranked polymorphism, and even a form of first-class messages, all by encoding them as virtual method dispatches.

Unfortunately, research indicated that vtable dispatch is quite expensive, and many of these idioms require at least one and often two virtual dispatches. Runtime tests and casts might be usable here instead, but a call site compiled for a fixed hierarchy of classes cannot be extended as easily as vtable-based dispatching. The only exception is a JIT using a technique called "polymorphic inline caches", which neither the CLR nor JVM utilize.

So I set about to measure the dispatch overhead on the CLR. Here's the source for my test, which consists of 3 classes in a hierarchy, each of which provide a visitor for double-dispatching, and a single method for the single virtual dispatch test. The driver code performs the runtime tests and casts at the dispatch site for the last case.

The numbers are total CPU ticks for the entire test, so lower numbers are better. It's a purely CPU-bound test, and I ran it on a Core 2 Duo 2.13GHz. I used .NET's high resolution timer, and I threw away the highest and lowest numbers:

Double DispatchRuntime TestSingle Dispatch
165822287210924693281136358792
165947496010929173681136520160
165948547210936387521136745536
165986648010945465841136985448
166028776810946390881137321136
166048831210946890401137526400
166066264810948562161137994576
166211008810955032561138300032
166624829610957348481138461264
166737745610959895281138609112
167212774410971268801141472360
Avg:
1662395645.091094737353.451137844983.27

As you can see, runtime tests and casts are surprisingly efficient, beating out even single dispatch. Any functional language targeting the CLR will definitely want to use runtime tests to implement pattern matching. I might try a deeper hierarchy at some point, but research has shown that up to 95% of algebraic types have 3 cases or less.

Unfortunately, when the complete set of concrete types is not known, a vtable dispatch is unavoidable, unless one resorts to runtime code generation to recompile the dispatch site. This is essentially a polymorphic inline cache, and it requires more sophisticated runtime infrastructure than the CLR natively supports, though it is possible if the language abstracts program execution away from how the CLR natively works.

So in summary, vtables are nice and simple, but rather inefficient on current hardware.

No comments: