Rethinking Sanakirja, a Rust database engine with fast clones

CommunityNews · 5 February 2021 16:48

My last post about Sanakirja sparked a few really constructive discussions, and made me realise that people still cared about the problem of on-disk key-value stores, as unfancy as that problem may sound. This post looks back on some design mistakes I’ve made when I wrote it, and includes benchmarks showing it’s now faster than the fastest equivalent C library.

Why?

A long time ago, Pijul was using LMDB as its backend, with a number of fundamental limitations, including:

Being restricted to two datatypes: an array of B+ trees where keys are byte strings and values are either (1) byte strings or (2) B+ trees where keys are bytestrings and values are zero-sized. In Rust terminology, this is equivalent to roughly 500 tables, where a table is either BTreeMap<&[u8], &[u8]> or BTreeMap<&[u8], BTreeMap<&[u8], ()>>.

Being written in C, meaning that it is potentially fast, but hard to extend in any nontrivial way. About the “fast” part, my benchmarks show that indeed, it is quite fast — just not as fast as a carefully-designed Rust version.

More importantly, I needed to fork tables efficiently, without copying anything. This was especially important back then, when Pijul tables for small repositories often weighed dozens of megabytes. It may be slightly less relevant now, but now that it’s there, there is no reason not to use it.This is implemented with an extra table storing reference counts of each page that is referenced at least twice (in order to avoid infinite recursions, the table itself isn’t clonable, and therefore all its pages are referenced once).

This thread was posted by one of our members via one of our news source trackers.