Dispatching: VTables vs. Runtime Tests and Casts

Object oriented languages like C# and Java use method dispatching techniques to select a concrete method to invoke when given a polymorphic type. By far the most common technique is the virtual method table, or vtable for short.

Long ago I had updated the above Wikipedia page with information regarding alternative dispatch techniques. Research published a few years back indicated that dispatching through a vtable incurred astonishing overheads, as high as 50% of total execution time. Alternative dispatch techniques based on runtime tests, such as a linear sequence of if statements checking for the various concrete class types, or a sequence of nested if statements forming a binary search, were often more efficient on a variety of hardware.

Vtables are rather flexible however, and the composition of a number of vtables can encode a variety of advanced dispatching behaviour. The .NET CLR utilizes vtables, and I've sketched out rough encodings of first-class functions, algebraic data types and pattern matching, higher-ranked polymorphism, and even a form of first-class messages, all by encoding them as virtual method dispatches.

Unfortunately, research indicated that vtable dispatch is quite expensive, and many of these idioms require at least one and often two virtual dispatches. Runtime tests and casts might be usable here instead, but a call site compiled for a fixed hierarchy of classes cannot be extended as easily as vtable-based dispatching. The only exception is a JIT using a technique called "polymorphic inline caches", which neither the CLR nor JVM utilize.

So I set about to measure the dispatch overhead on the CLR. Here's the source for my test, which consists of 3 classes in a hierarchy, each of which provide a visitor for double-dispatching, and a single method for the single virtual dispatch test. The driver code performs the runtime tests and casts at the dispatch site for the last case.

The numbers are total CPU ticks for the entire test, so lower numbers are better. It's a purely CPU-bound test, and I ran it on a Core 2 Duo 2.13GHz. I used .NET's high resolution timer, and I threw away the highest and lowest numbers:

Double Dispatch	Runtime Test	Single Dispatch
1658222872	1092469328	1136358792
1659474960	1092917368	1136520160
1659485472	1093638752	1136745536
1659866480	1094546584	1136985448
1660287768	1094639088	1137321136
1660488312	1094689040	1137526400
1660662648	1094856216	1137994576
1662110088	1095503256	1138300032
1666248296	1095734848	1138461264
1667377456	1095989528	1138609112
1672127744	1097126880	1141472360
Avg:
1662395645.09	1094737353.45	1137844983.27

As you can see, runtime tests and casts are surprisingly efficient, beating out even single dispatch. Any functional language targeting the CLR will definitely want to use runtime tests to implement pattern matching. I might try a deeper hierarchy at some point, but research has shown that up to 95% of algebraic types have 3 cases or less.

Unfortunately, when the complete set of concrete types is not known, a vtable dispatch is unavoidable, unless one resorts to runtime code generation to recompile the dispatch site. This is essentially a polymorphic inline cache, and it requires more sophisticated runtime infrastructure than the CLR natively supports, though it is possible if the language abstracts program execution away from how the CLR natively works.

So in summary, vtables are nice and simple, but rather inefficient on current hardware.

Comments

jam40jeff said…

This comment has been removed by the author.

April 1, 2015 at 10:17 AM

Higher Logics

Search This Blog

Dispatching: VTables vs. Runtime Tests and Casts

Labels

Comments

Popular posts from this blog

async.h - asynchronous, stackless subroutines in C

Easy Automatic Differentiation in C#

Building a Query DSL in C#