CouchDB document creation performance

CouchDB is a non-relational database which uses MapReduce-inspired views to query data. There are lots of cool things to tell about its design, but I'd rather talk about its performance.

Today I’ve been busy hacking together a little script to import all e-mails of a long e-mail thread into a couchdb database, so that I can write views to extract all kinds of statistics. I already imported these e-mails into a MySQL database a few months ago, but was quite disappointed by the (performance) limitations of SQL. The e-mail thread contains over 20,000 messages, which weren’t a real problem for MySQL. When importing, however, couchdb was adding them at a rate of only a few dozen per second, with a lot of seek noise from my HDD.

So I decided to do a simple benchmark. First off, a simple script (ser.py) that adds empty documents sequentially. It averages 16 per second. It occurred to me that couchdb waits for an fsync before sending a response, and that sending the requests asynchronously should perform way better. One simple modification to the script later (par.py), it still averaged 16 creations per second.
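For readers who want to repeat the measurement, here is a minimal sketch of what a sequential benchmark of this kind might look like. It is not the original ser.py: the function names, the database URL and the use of urllib are all assumptions, and it relies on CouchDB accepting a POST of a JSON body to the database URL to create a document.

```python
import time
import urllib.request


def create_empty_document(db_url):
    """POST an empty JSON object to the database URL (assumed CouchDB API)."""
    request = urllib.request.Request(
        db_url,
        data=b"{}",
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        response.read()


def creation_rate(create_document, count):
    """Call create_document() count times in sequence; return calls per second."""
    start = time.perf_counter()
    for _ in range(count):
        create_document()
    # Guard against a zero interval when create_document is trivially fast.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return count / elapsed


if __name__ == "__main__":
    # Hypothetical database URL; adjust to your own setup.
    url = "http://localhost:5984/benchmark"
    print(creation_rate(lambda: create_empty_document(url), 100))
```

Passing the document-creating function in as an argument keeps the timing loop independent of how the request is made, so the same loop can time other clients too.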

My guess (I haven’t yet figured out how to get strace to confirm it) is that it’s the fsync after each object creation which causes the mess. couchdb itself doesn’t write or seek a lot, but my journaling filesystem (XFS) does on an fsync.

Can anyone test it on a different filesystem?

Update: Around 17/sec with reiserfs.

Update: I had some trouble with the bulk update feature and switched from svn to the 0.7.2 release. With sequential bulk updates of 100 docs I got about 600/sec, which dropped to a steady-ish 350/sec. Two bulk updates in parallel yield about 950/sec initially, dropping to 550/sec after a while. Three parallel updates yield similar performance.
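A rough sketch of how sequential bulk updates in batches of 100 could be sent, assuming the `_bulk_docs` endpoint, which takes a JSON body of the form `{"docs": [...]}`. The helper names and the database URL are made up for illustration, and this is not the script used for the numbers above.

```python
import json
import urllib.request


def chunks(docs, size=100):
    """Yield consecutive slices of docs containing at most size documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def bulk_request(db_url, docs):
    """Build a POST request for the database's _bulk_docs endpoint."""
    body = json.dumps({"docs": docs}).encode("utf-8")
    return urllib.request.Request(
        db_url.rstrip("/") + "/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def bulk_insert(db_url, docs, size=100):
    """Send the documents in sequential batches, one request per batch."""
    for batch in chunks(docs, size):
        with urllib.request.urlopen(bulk_request(db_url, batch)) as response:
            response.read()


if __name__ == "__main__":
    # Hypothetical database URL; adjust to your own setup.
    bulk_insert("http://localhost:5984/benchmark", [{} for _ in range(1000)])
```

Running `bulk_insert` from two threads or processes at once would approximate the parallel case.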

Interrupting a select without a timeout

select is a POSIX syscall which allows you to wait on several different file descriptors (including sockets) until one of them won’t block on write, won’t block on read, or is in error. This syscall is very convenient when you’re writing a server.

When I want to shut down an instance of the server, I have to interrupt the select. I have yet to find a satisfying way of doing this. At the moment I create a pair of linked sockets with socketpair. I include one of them among the sockets that the select call watches for data to read. To interrupt, I simply write some data to the other socket, which makes data available on the watched socket, which in turn makes the select return.
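The socketpair trick described above can be sketched roughly as follows. The function names are invented for the example, and a real server would also pass its listening and client sockets in the list of watched sockets.

```python
import select
import socket


def make_waker():
    """Return (wake_recv, wake_send): a connected pair of sockets."""
    wake_recv, wake_send = socket.socketpair()
    # Non-blocking, so draining the wake-up bytes can never stall.
    wake_recv.setblocking(False)
    return wake_recv, wake_send


def wait_readable(sockets, wake_recv):
    """Block without a timeout until a socket is readable or a wake-up arrives.

    Returns (ready, interrupted): the readable sockets from the given list,
    and whether the wake-up socket fired.
    """
    readable, _, _ = select.select(list(sockets) + [wake_recv], [], [])
    interrupted = wake_recv in readable
    if interrupted:
        try:
            wake_recv.recv(4096)  # discard the wake-up bytes
        except BlockingIOError:
            pass
    ready = [s for s in readable if s is not wake_recv]
    return ready, interrupted
```

Another thread interrupts the wait with `wake_send.send(b"x")`; the server loop then sees `interrupted` set and can shut down cleanly.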

There must be a more elegant solution.

X61s (1)

I’ve been the lucky owner of a Thinkpad X61s for a bit more than a week now. It’s a light and small 12.1 inch notebook. It’s structurally very solid and has almost the same full-size keyboard as my 14.1 inch T60. (The enter, tab, shift and similar keys are narrower.)

The installation of Gentoo went quite smoothly. For those interested: the xorg.conf, make.conf and kernel .config I use.

Internal mic/speaker and external jacks; video; DRI and AIGLX; USB; PCMCIA; wireless; fingerprint reader and ethernet all seem to work just fine. I haven’t tested the firewire, ssd slot, bluetooth and n-capabilities of the wireless yet.

Unlike on my T60, the volume and backlight buttons aren’t hardware controlled. Gnome recognizes the volume buttons, but not the backlight ones. I’m still working on those.