It's been a while since I posted about Sasa's new immutable hash-array mapped trie (HAMT), and I've only just gotten around to running some benchmarks and profiling the code. I just pushed a new project under "Sasa/Bench" in the repo, where I will place all Sasa benchmarks going forward.
Initially the benchmarks were disappointing, but 2 minutes with the profiler revealed the problem was the Tree.Add method, which was using a very simple but poor implementation. Basically, it was checking if the key was already in the tree before attempting to update, thus performing the traversal twice for every addition. I refactored this to share the same implementation as Tree.Update which performs only a single traversal, and the results are now more reasonable.
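To make the double-traversal problem concrete, here is a sketch in Python (Sasa itself is C#, and this uses a toy persistent binary search tree rather than the actual HAMT; the names `add_slow`, `add`, and `update` are mine, not Sasa's API). The old `Add` pattern walked the tree once for the membership test and a second time for the insertion; the refactored version just delegates to the single-pass `update`:

```python
# Toy persistent (immutable) binary search tree, standing in for the HAMT.
class Node:
    __slots__ = ("key", "value", "left", "right")
    def __init__(self, key, value, left=None, right=None):
        self.key, self.value = key, value
        self.left, self.right = left, right

def contains(node, key):
    # Iterative membership test: one full traversal.
    while node is not None:
        if key == node.key:
            return True
        node = node.left if key < node.key else node.right
    return False

def update(node, key, value):
    # Single traversal: copies only the path from the root to the
    # insertion point, inserting or overwriting along the way.
    if node is None:
        return Node(key, value)
    if key == node.key:
        return Node(key, value, node.left, node.right)
    if key < node.key:
        return Node(node.key, node.value, update(node.left, key, value), node.right)
    return Node(node.key, node.value, node.left, update(node.right, key, value))

def add_slow(node, key, value):
    # The old pattern: one traversal to check membership,
    # then a second traversal to actually insert.
    if contains(node, key):
        return node
    return update(node, key, value)

def add(node, key, value):
    # The refactored pattern: Add shares Update's single pass.
    return update(node, key, value)
```

The same shape applies to the real HAMT: any "check, then modify" API on a persistent structure pays for the traversal twice unless the check is folded into the modifying pass.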
The benchmarks were run on an AMD FX-8120, performing 200,000 individual inserts and 200,000 individual membership checks on a set of unique integers, i.e., treating the dictionaries and trees as sets. The inserts were clocked separately from the membership tests.
Insertions into the HAMT appear to be roughly 15x slower than insertions into the mutable dictionary when averaged over the bulk insert benchmark. There is a way to perform bulk inserts much more efficiently, but it wouldn't give a sense of incremental update costs.
Membership tests are ~2x slower for the HAMT, which is in the right ballpark for an initial implementation. The HAMT also uses a little less than twice the memory of the mutable collections, according to the memory statistics after forcing a full GC.
According to the profiling data, about 40% of the time in the insert benchmark is spent allocating new arrays, so the main opportunity for speeding up updates seems to be reducing those allocations. I believe Clojure's persistent vectors implement some optimizations here, but I haven't had the need to dig into their implementation.
Lookup costs seem almost entirely related to virtual dispatch overhead while performing recursive lookups on sub-trees. About 45% of the time in the lookup benchmark is spent there. I'm not quite sure how to reduce this overhead, except perhaps to eliminate the class hierarchy that defines the tree structure and use faster type tests and casts. I'm not convinced it would make that much of a difference, but perhaps I'll give it a try if I'm bored some day.
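The restructuring I have in mind looks roughly like the following (a Python sketch only; Python's cost model is nothing like the CLR's, so this just illustrates the shape of the change, with hand-built nodes and made-up names). The hierarchy version makes one dynamically dispatched `lookup` call per level; the flattened version is a single loop over explicit type tests:

```python
# Toy 32-way trie nodes, standing in for the HAMT's class hierarchy.
class Branch:
    __slots__ = ("children",)
    def __init__(self, children):
        self.children = children  # tuple of 32 slots
    def lookup(self, key, shift):
        # Virtual-dispatch style: one dynamic call per level.
        child = self.children[(key >> shift) & 31]
        return None if child is None else child.lookup(key, shift + 5)

class Leaf:
    __slots__ = ("key", "value")
    def __init__(self, key, value):
        self.key, self.value = key, value
    def lookup(self, key, shift):
        return self.value if key == self.key else None

def lookup_flat(node, key):
    # Flattened style: no per-node virtual methods, just a loop
    # with explicit type tests and casts.
    shift = 0
    while isinstance(node, Branch):
        node = node.children[(key >> shift) & 31]
        shift += 5
    if isinstance(node, Leaf) and node.key == key:
        return node.value
    return None

def _branch(idx, child):
    slots = [None] * 32
    slots[idx] = child
    return Branch(tuple(slots))

# Hand-built two-level trie holding the single key 33
# (33 = 0b100001: index 1 at the bottom level, index 1 above it).
root = _branch(1, _branch(1, Leaf(33, "x")))
```

Whether the flat loop actually wins on the CLR would depend on how well the JIT handles the virtual calls versus the type tests, which is exactly what I'd want the experiment to measure.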
If anyone has any suggestions or pointers to a simple explanation of Clojure's tricks or some other HAMT optimizations, please let me know!