Skip to main content

Dispatching: VTables vs. Runtime Tests and Casts

Object oriented languages like C# and Java use method dispatching techniques to select a concrete method to invoke when given a polymorphic type. By far the most common technique is the virtual method table, or vtable for short.

Long ago I had updated the above Wikipedia page with information regarding alternative dispatch techniques. Research published a few years back indicated that dispatching through a vtable incurred astonishing overheads, as high as 50% of total execution time. Alternative dispatch techniques based on runtime tests, such as a linear sequence of if statements checking for the various concrete class types, or a sequence of nested if statements forming a binary search, were often more efficient on a variety of hardware.

Vtables are rather flexible however, and the composition of a number of vtables can encode a variety of advanced dispatching behaviour. The .NET CLR utilizes vtables, and I've sketched out rough encodings of first-class functions, algebraic data types and pattern matching, higher-ranked polymorphism, and even a form of first-class messages, all by encoding them as virtual method dispatches.

Unfortunately, research indicated that vtable dispatch is quite expensive, and many of these idioms require at least one and often two virtual dispatches. Runtime tests and casts might be usable here instead, but a call site compiled for a fixed hierarchy of classes cannot be extended as easily as vtable-based dispatching. The only exception is a JIT using a technique called "polymorphic inline caches", which neither the CLR nor JVM utilize.

So I set about to measure the dispatch overhead on the CLR. Here's the source for my test, which consists of 3 classes in a hierarchy, each of which provide a visitor for double-dispatching, and a single method for the single virtual dispatch test. The driver code performs the runtime tests and casts at the dispatch site for the last case.

The numbers are total CPU ticks for the entire test, so lower numbers are better. It's a purely CPU-bound test, and I ran it on a Core 2 Duo 2.13GHz. I used .NET's high resolution timer, and I threw away the highest and lowest numbers:

Double DispatchRuntime TestSingle Dispatch

As you can see, runtime tests and casts are surprisingly efficient, beating out even single dispatch. Any functional language targeting the CLR will definitely want to use runtime tests to implement pattern matching. I might try a deeper hierarchy at some point, but research has shown that up to 95% of algebraic types have 3 cases or less.

Unfortunately, when the complete set of concrete types is not known, a vtable dispatch is unavoidable, unless one resorts to runtime code generation to recompile the dispatch site. This is essentially a polymorphic inline cache, and it requires more sophisticated runtime infrastructure than the CLR natively supports, though it is possible if the language abstracts program execution away from how the CLR natively works.

So in summary, vtables are nice and simple, but rather inefficient on current hardware.


jam40jeff said…
This comment has been removed by the author.

Popular posts from this blog

async.h - asynchronous, stackless subroutines in C

The async/await idiom is becoming increasingly popular. The first widely used language to include it was C#, and it has now spread into JavaScript and Rust. Now C/C++ programmers don't have to feel left out, because async.h is a header-only library that brings async/await to C! Features: It's 100% portable C. It requires very little state (2 bytes). It's not dependent on an OS. It's a bit simpler to understand than protothreads because the async state is caller-saved rather than callee-saved. #include "async.h" struct async pt; struct timer timer; async example(struct async *pt) { async_begin(pt); while(1) { if(initiate_io()) { timer_start(&timer); await(io_completed() || timer_expired(&timer)); read_data(); } } async_end; } This library is basically a modified version of the idioms found in the Protothreads library by Adam Dunkels, so it's not truly ground bre

Building a Query DSL in C#

I recently built a REST API prototype where one of the endpoints accepted a string representing a filter to apply to a set of results. For instance, for entities with named properties "Foo" and "Bar", a string like "(Foo = 'some string') or (Bar > 99)" would filter out the results where either Bar is less than or equal to 99, or Foo is not "some string". This would translate pretty straightforwardly into a SQL query, but as a masochist I was set on using Google Datastore as the backend, which unfortunately has a limited filtering API : It does not support disjunctions, ie. "OR" clauses. It does not support filtering using inequalities on more than one property. It does not support a not-equal operation. So in this post, I will describe the design which achieves the following goals: A backend-agnostic querying API supporting arbitrary clauses, conjunctions ("AND"), and disjunctions ("OR"). Implemen

Simple, Extensible IoC in C#

I just committed the core of a simple dependency injection container to a standalone assembly, Sasa.IoC . The interface is pretty straightforward: public static class Dependency { // static, type-indexed operations public static T Resolve<T>(); public static void Register<T>(Func<T> create) public static void Register<TInterface, TRegistrant>() where TRegistrant : TInterface, new() // dynamic, runtime type operations public static object Resolve(Type registrant); public static void Register(Type publicInterface, Type registrant, params Type[] dependencies) } If you were ever curious about IoC, the Dependency class is only about 100 lines of code. You can even skip the dynamic operations and it's only ~50 lines of code. The dynamic operations then just use reflection to invoke the typed operations. Dependency uses static generic fields, so resolution is pretty much just a field access + invoking a