The Cuckoo Hashtable

A Hashtable algorithm is a specific algorithm to implement key-value pair datastructure with efficient by-key look-ups using hashing of the keys. A hashtable contains a list of buckets. In a simple implementation, the i-th bucket, contains the key-value pairs in a list of which the key has a hash that is i modulo the amount of buckets. The hash-table would increase the amount of buckets if any bucket contains more than a fixed amount of pairs. This results in a constant-time look-up, but an insertion might invoke a expensive rebuild.

There are a few variations on Hash Tables. I’d like to share a really smart one: the Cuckoo Hashtable.

The Cuckoo hashtable expects two different hash-functions for the keys. Instead of storing a list of pairs in each bucket, the Cuckoo Hash table stores a single pair in each bucket. When inserting a pair, it is inserted in one of the two possible locations. If it happens to be occupied, the old pair is replaced. Then the replaced pair is inserted at its other possible location, potentially kicking out another pair. This step is repeated, until there is no displaced pair or a loop is detected. When a loop is detected, the hash-functions can be changed, if the buckets are for the most part empty -or- when the table is almost full, the amount of buckets can be increased. It can be shown that an insertion has a amortized constant time. (The buckets can be resized in-place, and each entry is repositioned as if it was displaced by another.)

If I’d own a botnet… (1)

…and didn’t want to loose it if my control servers got shut down, I’d let every orphaned zombie randomly connect to hosts in a given IP range, and challenge them to give a preimage of a hardcoded hash. [ detail: add a salt to prevent replay attacks ]. With a sufficiently safe casu quo large range, it also might be helpfull to allow zombies to forward still orphan zombies.

Long mySQL keys

Instead of limiting your long key (for instance a path) to approximately 800 characters (on mySQL with generic UTF8), you can hash your key and store the hash as index.

The drawback is that you need to use a quite good hash function to avoid duplicates if you want to use it for a unique or primary key. These good hash functions tend to require some computing time.