Hash Array Mapped Trie Optimizations

It's been a while since I posted about Sasa's new immutable hash array mapped trie (HAMT), and I've only just gotten around to running some benchmarks and profiling the code. I just pushed a new project under "Sasa/Bench" in the repo, where I will place all Sasa benchmarks going forward.

Initially the benchmarks were disappointing, but two minutes with the profiler revealed the problem: the Tree.Add method was using a very simple but inefficient implementation. It checked whether the key was already in the tree before attempting the update, thus performing the traversal twice for every addition. I refactored it to share the implementation of Tree.Update, which performs only a single traversal, and the results are now much more reasonable.
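To make the fix concrete, here's a minimal sketch using a persistent binary search tree as a stand-in (this is illustrative only, not Sasa's actual HAMT code): the old Add effectively did a Contains traversal followed by an Update traversal, while the new Add is just Update, which inserts or overwrites in one pass.

```csharp
using System;

// A minimal persistent-tree stand-in (not Sasa's actual HAMT) showing why
// sharing Update's traversal halves the work: one pass instead of a
// Contains pass followed by an Update pass.
sealed class Node
{
    public readonly int Key;
    public readonly string Value;
    public readonly Node Left, Right;
    public Node(int key, string value, Node left, Node right)
    {
        Key = key; Value = value; Left = left; Right = right;
    }

    // Single traversal: inserts or overwrites in one pass, copying only
    // the nodes along the search path (path copying).
    public static Node Update(Node n, int key, string value)
    {
        if (n == null) return new Node(key, value, null, null);
        if (key < n.Key) return new Node(n.Key, n.Value, Update(n.Left, key, value), n.Right);
        if (key > n.Key) return new Node(n.Key, n.Value, n.Left, Update(n.Right, key, value));
        return new Node(key, value, n.Left, n.Right); // key present: overwrite
    }

    // Add is now just Update: no separate membership check first.
    public static Node Add(Node n, int key, string value)
    {
        return Update(n, key, value);
    }

    public static string Lookup(Node n, int key)
    {
        while (n != null)
        {
            if (key < n.Key) n = n.Left;
            else if (key > n.Key) n = n.Right;
            else return n.Value;
        }
        return null;
    }
}
```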

The benchmarks were run on an FX-8120, performing 200,000 individual inserts and 200,000 individual membership checks on a set of unique integers, i.e. treating the dictionaries and trees as sets. The inserts were clocked separately from the membership tests.
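The overall shape of the benchmark looks something like the following sketch (names and structure are mine, not Sasa/Bench's), shown here against the mutable Dictionary baseline; the HAMT runs the same two phases:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Benchmark sketch: insert N unique ints, then check membership for all of
// them, timing the two phases separately. Swap the Dictionary for the HAMT
// to compare the immutable implementation.
class Bench
{
    // Returns the number of keys found, as a sanity check on the run.
    public static int Run(int n)
    {
        var set = new Dictionary<int, int>();

        var insert = Stopwatch.StartNew();
        for (int i = 0; i < n; ++i)
            set.Add(i, i);                    // individual inserts
        insert.Stop();

        var lookup = Stopwatch.StartNew();
        int found = 0;
        for (int i = 0; i < n; ++i)
            if (set.ContainsKey(i)) ++found;  // individual membership checks
        lookup.Stop();

        Console.WriteLine("insert: {0} ms, lookup: {1} ms",
            insert.ElapsedMilliseconds, lookup.ElapsedMilliseconds);
        return found;
    }

    static void Main() { Run(200000); }
}
```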

Insertions into the HAMT are roughly 15x slower than insertions into the mutable dictionary when averaged over the bulk insert benchmark. There is a way to perform bulk inserts much more efficiently, but it wouldn't give a sense of incremental update costs.
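The usual efficient bulk-insert trick is the transient/builder pattern popularized by Clojure: accumulate additions into a privately mutable structure, then freeze it once, rather than path-copying on every insert. A hypothetical sketch of that pattern (this is not Sasa's API, and a HashSet stands in for the real node-level mutation):

```csharp
using System;
using System.Collections.Generic;

// Transient-style builder sketch (hypothetical API): mutate privately while
// building, then freeze once into an immutable view. Real transient HAMTs
// mutate nodes in place and publish them as immutable when frozen.
sealed class SetBuilder
{
    HashSet<int> items = new HashSet<int>();

    public SetBuilder Add(int key)
    {
        if (items == null) throw new InvalidOperationException("already frozen");
        items.Add(key);   // in-place: no per-insert path copying
        return this;
    }

    // Freezing hands the data off and invalidates the builder, so the
    // "immutable" result can't be mutated through it afterwards.
    public FrozenSet Freeze()
    {
        var frozen = new FrozenSet(items);
        items = null;
        return frozen;
    }
}

sealed class FrozenSet
{
    readonly HashSet<int> items;
    internal FrozenSet(HashSet<int> items) { this.items = items; }
    public bool Contains(int key) { return items.Contains(key); }
    public int Count { get { return items.Count; } }
}
```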

Membership tests are ~2x slower on the HAMT, which is in the right ballpark for an initial implementation. The HAMT also uses a little less than twice the memory of the mutable collections, according to the memory statistics after forcing a full GC.

According to the profiling data, about 40% of the time in the insert benchmark is spent allocating new arrays, so there doesn't seem to be much room to improve the runtime of updates except by reducing allocations. I believe Clojure's persistent vectors implement some optimizations here, but I haven't had the need to dig into their implementation.
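The reason updates allocate so heavily: every insert path-copies the child arrays of the nodes it touches. This helper pair is roughly that hot path in miniature (a generic sketch, not Sasa's code):

```csharp
using System;

// The allocation hot path of a persistent HAMT update, in miniature:
// every modification clones an array rather than mutating it.
static class Arrays
{
    // Return a copy of 'src' with one slot replaced; the original is untouched.
    public static T[] With<T>(T[] src, int index, T value)
    {
        var copy = new T[src.Length];
        Array.Copy(src, copy, src.Length);
        copy[index] = value;
        return copy;
    }

    // Return a copy of 'src' with one slot inserted; HAMT node arrays grow
    // one entry at a time as bits are set in the node's bitmap.
    public static T[] Insert<T>(T[] src, int index, T value)
    {
        var copy = new T[src.Length + 1];
        Array.Copy(src, 0, copy, 0, index);
        copy[index] = value;
        Array.Copy(src, index, copy, index + 1, src.Length - index);
        return copy;
    }
}
```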

Lookup costs seem almost entirely due to virtual dispatch overhead from the recursive lookups on sub-trees: about 45% of the time in the lookup benchmark is spent there. I'm not quite sure how to reduce this overhead, except perhaps by eliminating the class hierarchy that defines the tree structure and using faster type tests and casts instead. I'm not convinced it would make that much of a difference, but perhaps I'll give it a try if I'm bored some day.
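For illustration, here is roughly what that alternative would look like (a simplified sketch, not Sasa's class hierarchy): instead of an abstract Node with a virtual Lookup called once per level, a single iterative loop dispatches on the concrete node type with casts.

```csharp
using System;

// Sketch of lookup without per-level virtual dispatch: one flat loop over
// sealed node types, using type tests and casts instead of a virtual call.
// Simplified to two node kinds; a real HAMT uses a bitmap to compress the
// child array.
abstract class Node { }

sealed class Branch : Node
{
    public readonly Node[] Children;   // indexed by a 5-bit slice of the hash
    public Branch(Node[] children) { Children = children; }
}

sealed class Leaf : Node
{
    public readonly int Key;
    public Leaf(int key) { Key = key; }
}

static class Hamt
{
    const int Bits = 5, Mask = (1 << Bits) - 1;

    public static bool Contains(Node root, int key)
    {
        var n = root;
        for (int shift = 0; n != null; shift += Bits)
        {
            var leaf = n as Leaf;                       // type test + cast,
            if (leaf != null) return leaf.Key == key;   // no virtual call
            n = ((Branch)n).Children[(key >> shift) & Mask];
        }
        return false;
    }
}
```

Whether this actually beats virtual dispatch depends on how the JIT handles the type checks, which is exactly the uncertainty above.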

If anyone has any suggestions or pointers to a simple explanation of Clojure's tricks or some other HAMT optimizations, please let me know!
