Skip to main content

Hash Array Mapped Trie Optimizations

It's been awhile since I posted about Sasa's new immutable hash-array mapped trie (HAMT), and I've only just gotten around to running some benchmarks and profiling the code. I just pushed a new project under the "Sasa/Bench" in the repo where I will place all Sasa benchmarks going forward.

Initially the benchmarks were disappointing, but 2 minutes with the profiler revealed the problem was the Tree.Add method, which was using a very simple but poor implementation. Basically, it was checking if the key was already in the tree before attempting to update, thus performing the traversal twice for every addition. I refactored this to share the same implementation as Tree.Update which performs only a single traversal, and the results are now more reasonable.

The benchmarks were run on an FX-8120 performing 200,000 individual inserts and 200,000 individual membership checks on a set of unique integers, ie. treating the dictionaries and trees as a set. The inserts were separately clocked from the membership tests.

Insertions into the HAMT appear to be about ~15x slower than insertions into the mutable dictionary when averaged over the bulk insert benchmark. There is a way to perform bulk inserts much more efficiently, but it wouldn't give a sense for incremental update costs.

Membership tests are ~2x slower for the HAMT, which seems like it's in the right ballpark for an initial implementation. The HAMT also uses a little less than twice the memory of the mutable collections according to the memory statistics after forcing a full GC.

According to profiling data, about 40% of the time in the insert benchmark is spent allocating new arrays, so there doesn't seem to be much room to improve the runtime of updates except perhaps by reducing allocations. I believe Clojure's persistent vectors implement some optimizations here, but I haven't had the need to dig into their implementation.

Lookup costs seem almost entirely related to virtual dispatch overhead while performing recursive lookups on sub-trees. About 45% of the time in the lookup benchmark is spent there. I'm not quite sure how to reduce this overhead, except perhaps to eliminate the class hierarchy that defines the tree structure and use faster type tests and casts. I'm not convinced it would make that much of a difference, but perhaps I'll give it a try if I'm bored some day.

If anyone has any suggestions or pointers to a simple explanation of Clojure's tricks or some other HAMT optimizations, please let me know!

Comments

Popular posts from this blog

async.h - asynchronous, stackless subroutines in C

The async/await idiom is becoming increasingly popular. The first widely used language to include it was C#, and it has now spread into JavaScript and Rust. Now C/C++ programmers don't have to feel left out, because async.h is a header-only library that brings async/await to C! Features: It's 100% portable C. It requires very little state (2 bytes). It's not dependent on an OS. It's a bit simpler to understand than protothreads because the async state is caller-saved rather than callee-saved. #include "async.h" struct async pt; struct timer timer; async example(struct async *pt) { async_begin(pt); while(1) { if(initiate_io()) { timer_start(&timer); await(io_completed() || timer_expired(&timer)); read_data(); } } async_end; } This library is basically a modified version of the idioms found in the Protothreads library by Adam Dunkels, so it's not truly ground bre

Simple, Extensible IoC in C#

I just committed the core of a simple dependency injection container to a standalone assembly, Sasa.IoC . The interface is pretty straightforward: public static class Dependency { // static, type-indexed operations public static T Resolve<T>(); public static void Register<T>(Func<T> create) public static void Register<TInterface, TRegistrant>() where TRegistrant : TInterface, new() // dynamic, runtime type operations public static object Resolve(Type registrant); public static void Register(Type publicInterface, Type registrant, params Type[] dependencies) } If you were ever curious about IoC, the Dependency class is only about 100 lines of code. You can even skip the dynamic operations and it's only ~50 lines of code. The dynamic operations then just use reflection to invoke the typed operations. Dependency uses static generic fields, so resolution is pretty much just a field access + invoking a

Building a Query DSL in C#

I recently built a REST API prototype where one of the endpoints accepted a string representing a filter to apply to a set of results. For instance, for entities with named properties "Foo" and "Bar", a string like "(Foo = 'some string') or (Bar > 99)" would filter out the results where either Bar is less than or equal to 99, or Foo is not "some string". This would translate pretty straightforwardly into a SQL query, but as a masochist I was set on using Google Datastore as the backend, which unfortunately has a limited filtering API : It does not support disjunctions, ie. "OR" clauses. It does not support filtering using inequalities on more than one property. It does not support a not-equal operation. So in this post, I will describe the design which achieves the following goals: A backend-agnostic querying API supporting arbitrary clauses, conjunctions ("AND"), and disjunctions ("OR"). Implemen