Thread synchronization mechanisms (Locks)

To share them with you, and to make sure I won’t forget, I present you with (some of) the different thread synchronization mechanisms known to mankind. Most of these mechanisms are used to lock operations or data; some aren’t. You know, those nifty little things that ought to help you with thread hell.

Semaphore
The semaphore is the cornerstone of locking; other locks are often implemented using one or more semaphores. A semaphore is a simple integer. Its initial value is the number of threads that can attain (aka lock, down, enter, pass …) the semaphore before it will block incoming threads. A blocked thread will have to wait until a thread that has attained the semaphore releases (aka unlocks, ups, leaves …) it.

On most systems a semaphore is implemented as an atomic integer. When a thread wants to attain the semaphore it checks whether the value is greater than 0. If it is, it decrements the value and the semaphore has been attained. If the value is 0, as many threads have already attained the semaphore as it allows, and the thread will wait a while and try again. When a thread releases the semaphore it increments the value.
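In Python, a sketch of that description could look like this (the spinning with a small sleep is an assumption of this sketch; real implementations park the thread instead of polling):

import threading
import time

class SpinSemaphore:
    def __init__(self, value):
        self._value = value                # the simple integer
        self._lock = threading.Lock()      # makes check-and-decrement atomic

    def attain(self):                      # aka lock, down, enter, pass
        while True:
            with self._lock:
                if self._value > 0:        # greater than 0: attain it
                    self._value -= 1
                    return
            time.sleep(0.001)              # wait a while and try again

    def release(self):                     # aka unlock, up, leave
        with self._lock:
            self._value += 1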

You can compare a semaphore with a dining table in a mess hall at which only a certain number of people can dine. People either have to wait for someone to finish and stand up, or they’ll have to look for some other place.

Mutex
The mutual exclusion lock is the lock you’ll be most familiar with, and because of its popularity it is often just referred to as a lock. A mutex is either locked or unlocked. Only one thread can hold the lock; all others have to wait for that thread to release the mutex. The mutex is often used to ensure mutually exclusive access to a certain operation or piece of data.

A mutex is simply implemented as a semaphore with a starting value of 1. The first thread to attain the mutex decrements the value to 0 and blocks all other incoming threads, until it releases the lock.
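Using the SpinSemaphore sketch from above, that is all there is to it:

mutex = SpinSemaphore(1)   # a mutex: a semaphore starting at 1

mutex.attain()             # first thread decrements 1 to 0; others block
# ... only one thread at a time executes this critical section ...
mutex.release()            # back to 1; one waiting thread may now enter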

You can compare a mutex with the door lock on your restroom: it helps you make sure you’ve got some privacy.

Event
An event blocks all incoming threads until it is signalled to release one thread, or all of them at once. Events are quite useful in situations where you want to queue threads before they are allowed to pass. Events aren’t locks that you lock and afterwards have to unlock: a thread just waits for an event, and doesn’t have to ‘unlock’ it afterwards.

An event is usually also implemented using a semaphore, this time with a starting value of 0. If a thread wants to pass the event it tries to decrement the semaphore’s value, and waits if it is 0, just as you would normally attain a semaphore. When the event is signalled to release one thread, it increments the semaphore’s value; one of the waiting threads will notice that the value isn’t 0 anymore, decrement it to 0 again and pass. When the event is signalled to release all waiting threads, it keeps incrementing the value until all threads have passed. Usually there is a mutex alongside the semaphore to make sure that a thread can’t slip in while all threads are being released, and be released too soon.
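A sketch on top of the SpinSemaphore from above. The waiter count is an assumption of this sketch, and the mutex only narrows the slip-in window mentioned above rather than closing it completely:

class Event:
    def __init__(self):
        self._sem = SpinSemaphore(0)        # 0: block incoming threads
        self._waiters = 0
        self._mutex = threading.Lock()      # guards the waiter count

    def wait(self):
        with self._mutex:
            self._waiters += 1
        self._sem.attain()                  # blocks while the value is 0
        with self._mutex:
            self._waiters -= 1

    def release_one(self):                  # signal: release a single thread
        self._sem.release()

    def release_all(self):                  # signal: release all waiters
        with self._mutex:                   # keep newcomers from registering
            for _ in range(self._waiters):  # while everyone is released
                self._sem.release()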

You can compare an event with a (chaotic) queue in front of a fun park attraction. People have to wait until the operator of the attraction tells one of them to continue, or tells all of them to continue at once. (The attraction could be nearly empty, with enough room for all of them.)

Gate
A gate is a flavour of an event. It blocks incoming threads just like an event, and lets all of them through when signalled. The difference is that it won’t close again: once signalled (‘opened’) it allows all threads through until it is explicitly closed. It also can’t let threads through one by one.

A gate is usually implemented using a semaphore with an initial value of 0, so it blocks incoming threads. Each incoming thread tries to attain the semaphore, just as with the event, but once a thread has attained the semaphore it directly releases it again, so it is available for the next thread waiting to pass too.
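Sketched on the same SpinSemaphore (note that closing an already closed gate would block in this sketch; handling that properly is left out):

class Gate:
    def __init__(self):
        self._sem = SpinSemaphore(0)   # 0: the gate starts closed

    def wait(self):
        self._sem.attain()             # blocks while the gate is closed
        self._sem.release()            # directly release it again, so the
                                       # next waiting thread can pass too

    def open(self):
        self._sem.release()            # 1: threads pass until closed again

    def close(self):
        self._sem.attain()             # back to 0: incoming threads block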

You can compare a gate with a … gate. Or a drawbridge if you like. If it’s closed, no one can pass. If it’s open, everyone can pass. Very straightforward.

Barrier
A barrier blocks all incoming threads until a certain number of them has arrived. At that point the barrier releases all those threads, but keeps blocking new incoming threads (until they too have reached the specified count, and so on). A barrier is useful to make sure that a set of threads all start something at the same time and, maybe even more important, are all done with what they were doing before.

A barrier is usually implemented as a custom event that has a waiter count field (and a mutex to synchronize access to it), and that automatically releases all threads once the waiter count reaches the specified number.
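The text above describes it on top of an event; this sketch flattens that onto the SpinSemaphore directly, which conveniently avoids lost wakeups because semaphore releases persist:

class Barrier:
    def __init__(self, count):
        self._count = count
        self._waiting = 0
        self._mutex = threading.Lock()   # synchronizes the waiter count
        self._sem = SpinSemaphore(0)

    def wait(self):
        with self._mutex:
            self._waiting += 1
            last = self._waiting == self._count
            if last:
                self._waiting = 0        # reset for the next batch
        if last:
            for _ in range(self._count - 1):
                self._sem.release()      # auto-release all other waiters
        else:
            self._sem.attain()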

You can compare a barrier with water drops from a tap. A drop won’t fall until a specific amount of water has accumulated to make it fall.

That’s it for thread synchronization mechanisms today. Tomorrow I’ll maybe talk about the hassles of implementing them yourself.

Some credit goes to Kaja, who told me about all these locks in the first place :-).

Upgraded to WordPress 2.0.1

Just upgraded to WordPress 2.0.1, which was way too easy:

# in my blog.w-nz.com htdocs folder
wget http://wordpress.org/latest.tar.gz
tar -xvzpf latest.tar.gz
cp wordpress/* . -R

Then I visited the upgrade script, which consisted of one simple click, and I was done.

Great stuff.

PS. I re-uploaded my logo and Google Analytics code too, but they don’t count, really (even though they were more work than the rest of the upgrade).

Gentoo linux 2006.0 installer

Today Gentoo released a live CD with a Gentoo installer on it. They also posted some nice screenshots.

Gentoo Linux is a Linux distribution that features its home-made package manager, Portage. This package manager is what makes Gentoo special. There are a lot of other package managers, but Portage is one of the only ones that compiles each package on your computer, instead of installing precompiled binaries.

Compiling everything yourself has the advantage that you can tweak each package to your specific needs, which gives you a nicely customized system. But the key advantage is performance. Most binaries that package managers install are compiled for generic x86. That means they can even run on your old 486. The issue is that a lot has changed since the 486. A few new instruction sets (MMX, SSE, etc.), specifically designed to increase performance for common tasks, are present in almost every processor, and they aren’t used by generic x86 compiled binaries. And there are even more performance advantages1.

For this reason some people already compile everything themselves (see Linux From Scratch). The problem is that it takes a lot of time to configure, build, install and troubleshoot at least 200 packages. If you take into account that your average server and desktop see at least 15 updated packages per week, you are looking at a huge amount of dedication and time required to keep your own Linux From Scratch up to date.

This is where Gentoo and Portage come in. If you want apache, you simply type emerge apache. If you want to update everything, you simply type emerge -u world. Portage checks which packages are required, downloads them, patches them, compiles them and installs them for you.

But wait: maybe I don’t want ssl built into apache. How do I do that? Fairly simple, actually. Portage has a system called use flags. There is a use flag called ssl. You can put -ssl in the USE entry in /etc/make.conf to tell Portage that you don’t want ssl in any package. Or you can put net-www/apache -ssl in /etc/portage/package.use, which says that you want ssl to be disabled for apache only.
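Concretely, those two options look like this (net-www/apache being the category/package name used above):

# /etc/make.conf: disable the ssl use flag for every package
USE="-ssl"

# /etc/portage/package.use: disable ssl for apache only
net-www/apache -ssl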

Of course Portage isn’t perfect, and you will have some trouble once in a while, but it’s better than the headaches caused by LFS, and the reward is similar2.

— A happy Gentoo user.

1When compiling for generic x86 a lot of memory is aligned, which causes overhead for processors that don’t really care much about alignment. And besides the nice new instruction sets, each processor has its own quirks in how instructions are implemented: on one processor instruction X may be faster than instruction Y. Such processor-specific information can also be useful in tight loops: on one processor a loop may fit in the L1 cache, on another it won’t.

2You can of course do more with LFS. But usually that isn’t very important, and if it were, it would be better to add it to Portage yourself than to switch to LFS for that reason.

Xgl

After a long night wrestling with alpha source code, I’ve managed to install Xgl.

(Screenshots: Xgl ownage; Xgl transparency)

Xgl is an Xorg X11 layer that uses OpenGL to achieve some stunning effects.

One of these is to be able to switch desktops by pressing ‘Ctrl+Alt’ and dragging your desktop-cube.

There are a lot of other things that I can’t show with screenshots. Take a look at the Xgl release post. For instance, all windows behave flexibly: if I drag a window it’s like it’s made of rubber instead of concrete. Every window also pops up gently animated. There’s also a Mac OS X Exposé clone, which is really helpful.

If you want to install it yourself, the Gentoo wiki article and Hanno’s blog post should be very helpful.

Virtualization isn’t hardware efficient

Virtualization is the thing right now, especially since VMware released VMware Server for free.

I’ve read quite a lot of posts about the pros of virtualization, and I’m annoyed by some.

Some of them claim that virtualization is hardware efficient. Quite frankly, it isn’t. It is quite inefficient.

The thing, though, is that you usually don’t use your server optimally, and that with virtualization you can more easily get the most out of it. You could get even more from your server without virtualization, but that is a lot trickier; virtualization is quite flexible.

Virtualization doesn’t make hardware more efficient; it makes you more efficient at allocating your server’s resources.

This server, w-nz.com, runs on a virtual server. I can’t afford, nor would I fully use, a full dedicated server, but I can afford and use a virtual server. Efficient. Then again, running all the sites directly on one real server would be more efficient still, but harder to do, especially where security is concerned.

Virtual locality of reference

Quite a lot of programmers consider the advantages of locality of reference: if they put all the data they work with in roughly the same space, the processor will be able to retrieve it more easily, for the processor caches regions of memory in the Lx caches.

Quite a lot of memory managers even move objects that seem to depend on each other into the same place. This doesn’t only decrease fragmentation, for they will be freed in the same timespan, but also increases performance due to locality of reference.

In some situations, though, the advantage of locality of reference isn’t as big as most people presume, due to virtual memory.

The address space a process sees is a virtual address space; it is mapped into place by the OS. This means that 0x1234 can contain an image for one process and a plain integer for another. This enables two processes to be loaded, virtually, in the same place.

One implication this has for locality of reference is that a seemingly contiguous piece of memory can physically be distributed over quite a big span of real physical memory. One part of your list could be at 0x1300, the other at 0x30240.

If your program wants to enumerate that list, it’ll have to query the RAM for the second part, which it wouldn’t have to do if the list were contiguous in physical memory, for then it would most likely be in the L2 cache in its entirety.

Physical memory is mapped page by page into the virtual memory space. Usually pages are 4 KB (on x86). Each page is always contiguous in physical memory, for it can’t be split up by the mapping. Therefore locality of reference always works within a page.
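You can check the granularity yourself; in Python, for instance (this prints 4096 on a typical x86 system):

import os

# one page is the unit that is always physically contiguous
print(os.sysconf("SC_PAGE_SIZE"))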

This doesn’t mean that if you are working with more than a page of memory it will all be fragmented. The OS usually does a good job of keeping memory that is contiguous in virtual memory contiguous in physical memory too.

But when a computer is under some load and has to use every bit of memory, swapping pages in and out, physical memory will likely be a lot more fragmented.

I don’t want people to be ignorant of the advantage of locality of reference when dealing with more than a page of memory. I want the people who trade off literally everything for their holy locality of reference to see that it’s just a nice extra, and that it doesn’t always work, due to fragmentation of physical memory.

Side notes
Virtual memory is a bit more complicated than I explained; read about it on Wikipedia.
Processors try to take the virtual address space into account to avoid this issue, although only to a limited extent.

Mulholland Drive

Mulholland Drive is a great film. If you haven’t watched it yet, just watch it.

Those lucky enough to be astounded and puzzled by the film have all come up with their own interpretation of the movie.

The most common interpretation is that either the first or the last part is a dream of the opposite part. I’d have to disagree, for both parts are just too bizarre to be real. However, both parts are pseudo-opposites.

Just take, for instance, the ‘119’ on the fire truck, which hasn’t been mirrored, but is still reversed.

It keeps you thinking in any case.

PHP’s hidden treasures

I complained a lot about PHP’s terribly inefficient design when I worked with it; I complained just because it was PHP. But the things I missed most in PHP, it seems, were actually there all along!

Behold shared memory and yet more IPC.

One reason PHP really sucked was that you need to load and store even small pieces of data in databases or files if you want to share them between page views. The whole concept of handling one request per page view is ridiculous too, IMO.

With those two libraries, however, PHP pages could be way more efficient.

I wonder why the big PHP packages haven’t used them. Let’s hope the reason isn’t portability.

Bacon with eggs and …

Spam, again on this blog.

It seems those nasty spammers are now using actual people (or an automated browser) to post spam comments, subverting the protection wp-hashcash delivers.

Luckily it’s only about two per day, which is very manageable, but still annoying.

Like DRM and most other copy protection, spam protection is inherently insecure, for the openness required to let a non-spammer through is, well, too open.

One interesting thing is that we can fight back now that they are running full JavaScript VMs. Matching the spammer’s IP and letting it execute a rather ‘memory inefficient’ little program would certainly make me smile.

mail.w-nz.com

I’ve successfully installed vpopmail, qmail and courier today, after some hours of work on this server.
From now on you can send e-mail to @w-nz.com addresses again, and it won’t end up in /dev/null. (Like bas.westerbaan@~.)

To the users of the server: if you want an email address, mail me.

While testing whether my SMTP server would receive incoming email properly, I got harsh confirmation: instead of my test mail, it was spam that entered my mailbox first:

Hello Spam!
It’s pretty funny to see spam junk in an ancient text-mode mailing program.

If any application fails to mail you a verification as it should, please email me. It could very well be that qmail-send isn’t configured properly to allow unauthenticated local mail.

Update: More good news, the amount of RAM on my virtual server has been increased to 120 MB! Which means my server won’t be swapping instead of working all the time anymore.

Welcome to lighttpd

After a long afternoon I’ve got lighttpd to work with my current Apache-based layout.

This means I can choose whether to run apache or lighttpd.

Lighttpd is a webserver, like Apache. The key advantage of lighttpd over Apache is that lighttpd is very light on your server. It uses a lot less memory, which is very nice, especially considering that my server only has ~90 MB of memory.

The drawback of lighttpd is that, being light, it doesn’t support as much as Apache does.

It doesn’t do .htaccess files. Everything needs to be configured in lighttpd.conf, which doesn’t support everything, or at least not in the same way Apache does.

However, lighttpd is pretty easy to configure once you get the hang of it.

One particular pain in the ass is getting old mod_rewrite rules from .htaccess to work, for instance the ones used by the WordPress install behind this blog.

I’ve added this to my lighttpd.conf:

$HTTP["host"] =~ "blog.w-nz.com" {
url.rewrite = ( "^/(page|archives|comments|search|feed)/" => "/index.php?error=404" )
}

One interesting thing to note is that the configuration file behaves like a little script: conditionals like the one above are evaluated for each request.

Rainbow Tables: Coverage

A rainbow table is generated by creating m chains using randomly picked starting keys. The reduction functions result (or at least ought to result) in evenly distributed new keys. There is only a probability that a certain key/ciphertext pair is in the table.

I’ll gradually explain how the coverage can be calculated.

Take m chains, each 1 long, to cover a keyspace of N keys. The chance that you can find the key is:

[tex]\displaystyle P = \frac{m}{N}[/tex]

When we have N/2 chains, we’ll subsequently have a P of 1/2, a 50% coverage. Obvious.

When each chain is 2 in length, it gets trickier.

[tex]P = \frac{m}{N}[/tex] is still the chance that we find the key in the first column of the table (the first item of each chain). The second column isn’t as straightforward. We blindly assumed that each key in the first column is unique, which is true for the first column, but certainly isn’t for the second: a few chains can merge. The second column is ‘random’.

The chance that a certain key is in a specific chain of the second column is [tex]\frac{1}{N}[/tex]. The chance that a certain key is in the second column is [tex]1 - \left(1 - \frac{1}{N} \right)^m[/tex] (one minus the chance that a given chain does not contain the key, raised to the power of the number of chains).

The number of keys covered by this column is the chance that a certain key is in the column, times the number of keys: [tex]N \left(1 - \left(1 - \frac{1}{N} \right)^m \right)[/tex].

The chance that a certain key is in either of the two columns is one minus the chance that it is in neither:

[tex]\displaystyle P = 1 - \left(1 - \frac{m}{N}\right)\left(1 - \frac{1}{N}\right)^m[/tex]

The third column’s coverage can be calculated by taking the number of unique keys in the second column as the new m, just as m itself was the number of unique keys in the first column. With chains of length t the following applies:

[tex]\displaystyle m_1 = m \qquad m_{n+1} = N \left( 1 - \left( 1 - \frac{1}{N} \right)^{m_n} \right) \qquad P = 1 - \prod_{i=1}^{t} \left( 1 - \frac{m_i}{N} \right)[/tex]
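A direct transcription of that formula in Python (the exp/log1p combination just computes (1 - 1/N)^(m_i) without losing precision for large N):

import math

def coverage(m, t, N):
    # P = 1 - prod_{i=1}^{t} (1 - m_i / N), with m_1 = m
    not_covered = 1.0
    m_i = float(m)
    for _ in range(t):
        not_covered *= 1.0 - m_i / N
        # m_{n+1} = N (1 - (1 - 1/N)^(m_n))
        m_i = N * (1.0 - math.exp(m_i * math.log1p(-1.0 / N)))
    return 1.0 - not_covered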

Rainbow Tables: Introduction

A rainbow table is a data structure with which it is possible to retrieve the key that was used to generate a given ciphertext from a fixed plaintext. One common application is reversing (password) hashes, which can be seen as cipher functions that take the message as the key and produce the hash as the ciphertext.

One important thing to note is that all key/ciphertext pairs stored in the table have been precomputed during generation of the table. Generating a table that stores 1 million md5 password hashes requires all those passwords to be hashed during precomputation.

I’ve been doing some research on rainbow tables and I think it will be useful to share some of my experiences.

But before I can dive into the matter I first have to explain how rainbow tables work. (Or you can read the ‘official’ paper about rainbow tables by Philippe Oechslin.)

Precomputing a significant amount of key/ciphertext (password/password hash) pairs is fairly easy. The issue is that a significant amount is fairly large: 2^37 for alphanumerical Windows password hashes, to give a real-life example. Because it isn’t practical to store 2^37 key/ciphertext pairs (~2TB) on your computer, rainbow tables store chains of key/ciphertext pairs; more precisely, they store only the starting and ending key of each chain.

Each chain starts with a key (randomly generated; in our example a password). The next item in the chain is the ciphertext, calculated by the cipher (in our example the hash of the password). The next item is a key (password) again, generated from the ciphertext (hash) by a reduction function. This reduction function can be almost anything, an XOR hash, CRC32, as long as it turns the (larger) ciphertext into a new key.

k1 --cipher--> C1 --reduction function 1--> k2 --cipher--> C2 --reduction function 2--> …

Each chain has a fixed length (t). In the table only the first and the last key are stored (k1 and kt).

To look up a certain ciphertext, the reduction function is applied to it and the end keys of all chains are checked for a match. If one of them matches, you can regenerate that chain from the start key stored along with the end key, and you’ll find the key that, with the cipher applied to it, results in your ciphertext (which is the last ciphertext in the chain). When no end key matches, the cipher and the reduction function are applied to generate a new key, and all end keys are checked again; if one of them matches now, you know your ciphertext is in the second-to-last position of that chain. Repeat this t times, once for each position in the chain.

It sounds straightforward, but some issues appear:

  • Loops: there are cases where the reduction function generates a key that, with the cipher applied to it, becomes the first plaintext again (K = R(S(K)), where R is the reduction function and S the cipher). This renders the rest of the chain useless.
  • Merging chains/false alarms: because the ciphertext is larger than the key, a lot of chains will likely share an end point. This reduces the total coverage of the table, and it causes a lot of additional time to be spent on ‘false alarms’: every matching end point has to be checked, even when the chain behind its start key turns out to be the wrong branch.

To avoid loops, to reduce merging chains a bit further and to increase lookup speed, rainbow tables use rainbow chains: each column of the table uses a different reduction function.

Because a different reduction function is used for each ciphertext in the chain there can’t be loops, or at least no loops that cause redundant data, because a certain ciphertext will become a different key in each column of the chain.

The number of merging chains is also cut by a factor of t, because a certain key in a chain only leads to the same chain when it occurs at the same position in the chain.
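To make this concrete: a minimal sketch in Python of generation and lookup with rainbow chains. MD5 stands in for the cipher here, and the 6-character lowercase keyspace, the chain length and the reduction function are all assumptions of this sketch:

import hashlib

T = 1000  # chain length; an arbitrary choice for this sketch

def cipher(key):
    return hashlib.md5(key.encode()).digest()   # 'hash the password'

def reduce_to_key(ciphertext, column):
    # toy reduction function: fold the ciphertext, plus the column
    # number, back into a 6-character lowercase key; mixing in the
    # column number is what makes these rainbow chains
    n = int.from_bytes(ciphertext, "big") + column
    key = ""
    for _ in range(6):
        n, r = divmod(n, 26)
        key += chr(ord("a") + r)
    return key

def generate_chain(start_key):
    key = start_key
    for column in range(T):
        key = reduce_to_key(cipher(key), column)
    return key                    # only (start key, end key) is stored

def lookup(target, table):        # table maps end keys to start keys
    for pos in range(T - 1, -1, -1):
        # assume the target ciphertext sits in column pos and walk
        # the rest of the chain down to its end key
        key = reduce_to_key(target, pos)
        for column in range(pos + 1, T):
            key = reduce_to_key(cipher(key), column)
        start = table.get(key)
        if start is None:
            continue
        key = start               # regenerate the chain from its start;
        for column in range(T):   # a mismatch here is a 'false alarm'
            h = cipher(key)
            if h == target:
                return key
            key = reduce_to_key(h, column)
    return None

Generating the table is then just a matter of calling generate_chain for m random start keys and storing the end key to start key pairs.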

This is a very superficial explanation; read more (if you’re interested) in the original paper.

The problem with rainbow tables

Rainbow tables (Making a Faster Cryptanalytic Time-Memory Trade-Off) work fairly well, although, because they use different reduction functions in the chains, they contain a lot of redundant data.

The example table in the paper (lanmanager hash; 8 char. alpha; 4666×38223872) contains a lot of redundant data, which can be calculated:

>>> CalcRedundancy(4666, 38223872, CalcN(26, 1, 7))
21.50866258033486
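CalcN and CalcRedundancy aren’t shown anywhere; here is my guess at what they compute, assuming redundancy means the total number of key/ciphertext pairs computed divided by the number of distinct keys actually covered (coverage() being the sketch from the coverage post above):

def CalcN(chars, min_len, max_len):
    # keyspace size: all keys of min_len up to max_len characters
    return sum(chars ** l for l in range(min_len, max_len + 1))

def CalcRedundancy(t, m, N):
    # pairs computed (t * m) per distinct key covered (N * coverage)
    return (t * m) / (N * coverage(m, t, N))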

The 300 MB table could be stored in a 20 MB table if the reduction function were perfect.

The problem is that to make a reduction function perfect, it would have to run ‘in parallel’ with the cipher function, which isn’t an option, for most ciphers are engineered specifically to prevent this.

Adding variable-length chains back into the rainbow chains could allow exploiting existing chains more efficiently.

This probably sounds messy, and to be honest, the idea is still a bit messy.

I’ll write something decent along the way.

Switching to Lighttpd

Recently a lot of people seem to be switching from Apache to Lighttpd, a webserver that is said to be a lot faster and, even better, to have a constant low memory footprint.

I’m currently compiling lighttpd on my vserver (on which this blog runs), and I’ll switch to lighttpd, which should be as easy as setting some configuration for lighttpd to fit in with the current /var/www model I’m using, then simply switching off Apache and switching on lighttpd.

I hope there won’t be a lot of downtime.

Update: Lighttpd and FastCGI don’t seem to really go together on my server configuration, so no lighttpd for a while :(.