Jan. 10th, 2017

fanf: (dotat)

Firstly, I have to say that it's totally awesome that I am writing this at all, and it's entirely due to the cool stuff done by people other than me. Yes! News about other people doing cool stuff with my half-baked ideas, how cool is that?

CZ.NIC Knot DNS

OK, DNS is approximately the ideal application for tries. It needs a data structure with key/value lookup and lexically ordered traversal.

When qp tries were new, I got some very positive feedback from Marek Vavrusa who I think was at CZ.NIC at the time. As well as being the Czech DNS registry, they also develop their own very competitive DNS server software. Clearly the potential for a win there, but I didn't have time to push a side project to production quality, nor any expectation that anyone else would do the work.

But, in November I got email from Vladimír Čunát telling me he had reimplemented qp tries to fix the portability bugs and missing features (such as prefix searches) in my qp trie code, and added it to Knot DNS. Knot was previously using a HAT trie.

Vladimír said qp tries could reduce total server RSS by more than 50% in a mass hosting test case. The disadvantage is that they are slightly slower than HAT tries, e.g. for the .com zone they do about twice as many memory indirections per lookup due to checking a nybble per node rather than a byte per node.

On balance, qp tries were a pretty good improvement. Thanks, Vladimír, for making such effective use of my ideas!

(I've written some notes on more memory-efficient DNS name lookups in qp tries in case anyone wants to help close the speed gap...)

Rust

Shortly before Christmas I spotted that Frank Denis has a qp trie implementation in Rust!

Sadly I'm still only appreciating Rust from a distance, but when I find some time to try it out properly, this will be top of my list of things to hack around with!

I think qp tries are an interesting test case for Rust, because at the core of the data structure is a tightly packed two word union with type tags tucked into the low order bits of a pointer. It is dirty low-level C, but in principle it ought to work nicely as a Rust enum, provided Rust can be persuaded to make the same layout optimizations. In my head a qp trie is a parametric recursive algebraic data type, and I wish there were a programming language with which I could express that clearly.

So, thanks, Frank, for giving me an extra incentive to try out Rust! Also, Frank's Twitter feed is ace, you should totally follow him.

Time vs space

Today I had a conversation on Twitter with @tef who has some really interesting ideas about possible improvements to qp tries.

One of the weaknesses of qp-tries, at least in my proof-of-concept implementation, is the allocator is called for every insert or delete. C's allocator is relatively heavyweight (compared to languages with tightly-coupled GCs) so it's not great to call it so frequently.

(Bagwell's HAMT paper was a major inspiration for qp tries, and he goes into some detail describing his custom allocator. It makes me feel like I'm slacking!)

There's an important trade-off between small memory size and keeping some spare space to avoid realloc() calls. I have erred on the side of optimizing for simple allocator calls and small data structure size at the cost of greater allocator stress.

@tef suggested adding extra space to each node for use as a write buffer, in a similar way to "fractal tree" indexes.. As well as avoiding calls to realloc(), a write buffer could avoid malloc() calls for inserting new nodes. I was totally nerd sniped by his cool ideas!

After some intensive thinking I worked out a sketch of how write buffers might amortize allocation in qp tries. I don't think it quite matches what tef had in mind, but it's definitely intriguing. It's very tempting to steal some time to turn the sketch into code, but I fear I need to focus more on things that are directly helpful to my colleagues...

Anyway, thanks, tef, for the inspiring conversation! It also, tangentially, led me to write this item for my blog.

fanf: (dotat)

Here's an amusing trick that I was just discussing with Mark Wooding on IRC.

Did you know you can define functions with optional and/or named arguments in C99? It's not even completely horrible!

The main limitation is there must be at least one non-optional argument, and you need to compile with -std=c99 -Wno-override-init.

(I originally wrote that this code needs C11, but Miod Vallat pointed out it works fine in C99)

The pattern works like this, using a function called repeat() as an example.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    // Define the argument list as a structure. The dummy argument
    // at the start allows you to call the function with either
    // positional or named arguments.

    struct repeat_args {
        void *dummy;
        char *str;
        int n;
        char *sep;
    };

    // Define a wrapper macro that sets the default values and
    // hides away the rubric.

    #define repeat(...) \
            repeat((struct repeat_args){ \
                    .n = 1, .sep = " ", \
                    .dummy = NULL, __VA_ARGS__ })

    // Finally, define the function,
    // but remember to suppress macro expansion!

    char *(repeat)(struct repeat_args a) {
        if(a.n < 1)
            return(NULL);
        char *r = malloc((a.n - 1) * strlen(a.sep) +
                         a.n * strlen(a.str) + 1);
        if(r == NULL)
            return(NULL);
        strcpy(r, a.str);
        while(a.n-- > 1) { // accidentally quadratic
            strcat(r, a.sep);
            strcat(r, a.str);
        }
        return(r);
    }

    int main(void) {

        // Invoke it like this
        printf("%s\n", repeat(.str = "ho", .n = 3));

        // Or equivalently
        printf("%s\n", repeat("ho", 3, " "));

    }
Page generated Jun. 28th, 2017 07:13 pm
Powered by Dreamwidth Studios