Avoiding multiple lock dead-locks by memory addresses

Sometimes you need to lock several resources. If you don’t take great care you are likely to get yourself into dead-locks. A simple example with just two lockable resources A and B:

function foo {
  lock A;
  lock B;
  // Do something with A and B
  unlock A;
  unlock B;

function bar {
  lock B;
  lock A;
  // Do something different with A and B
  unlock B;
  unlock A;

When foo and bar are called at about the same time then there is the change that foo locks A and bar locks B which will make bar wait on foo‘s lock on A and vice versa.

Solution: fixed order on memory address
The simplest way to get rid of the deadlock is to always try to acquire a lock on A before on B. A generic solution would be to always lock the resource with the lowest memory address first.

This only works when memory addresses are fixed or that there is an otherwise fixed order-able property.


While working on paradox, Kaja and I felt that we needed a cross-platform threading library, which handles all the subtle differences in capabilities and semantics of each platform`s API.

After a rewrite of Kaja`s original code, I present to you ithreads. You can find the source in this subversion repository.


At this moment ithreads gives you implementations of:


  • Event (aka. conditional variable)
  • Semaphore
  • R/w lock
  • Barrier
  • Gate
  • Thread

By using (one or more of) the following APIs:

  • pthreads, posix thread API (unix, linux …)
  • pth, portable user-space threads
  • futex, linux`s lightweight fast hybrid locking

ithreads will use generic implementations in case one of the primitives (eg. Semaphore for pthreads) isn’t available on your system.

I’m espacially happy with the futex based implementations. They outperform the pthreads implementation, and are a lot smaller (4 vs. ~16 bytes for a mutex in pthreads) – although they aren’t as robust (but still correct).
Support for BeOS, OS/2, Windows and more will be added, but testing them will be tricky.

The library will still need a lot of stress testing and probably still contains some bugs – you’re warned. When everything works nicely I’ll add all kinds of extra flavours of the primitives, eg. read-favouring r/w-lock.

I hope it will prove useful.

Thread synchronization mechanisms (Locks)

To share it with you and making sure I won’t forget, I present you with (some of) the different thread synchronization mechanisms known to mankind. Most of the mechanisms are used to lock operations or data, some aren’t. You know, those nifty little things that ought to help you with thread hell.

The semaphore is the cournerstone of locking – often other locks are implemented using one or more semaphores. A semaphore is a simple integer. The initial value of this integer is the amount of threads that can attain (aka. lock, down, enter, pass …) the semaphore before it will block incoming threads – a blocked thread will have to wait until a thread that has attained the semaphore releases (aka. unlocks, ups, leaves …) the semaphore.

The implementation of a semaphore is an atomic integer on most systems. When a thread wants to attain the semaphore it will check whether the value is greater than 0. If it is it will decrement the value and the semaphore has been attained. If the value is 0 it means that there are already as much threads that have attained the lock as the semaphore allows – and the thread will wait a while and try again. If a thread releases the semaphore it will increment the value.

You can compare a semaphore with a dining table in a mess hall at which only a certain amount of people can dine. People will either have to wait for someone to finish and stand up, or they’ll have to look for some other place.

The mutual exclusion lock is the lock you’ll be most familiar with and is often just referred to as a lock, because of its popularity. A mutex is either locked or unlocked. Only one thread can hold the lock – all others will have to wait for that other thread to release the mutex. The mutex is often used to assure mutual exclusive access to a certain operation or piece of data.

A mutex is simply just implemented as a semphore with a starting value of 1. The first thread that would attain the mutex would decrement the value to 0 and block all other incoming threads – until it releases the lock.

You can compare a mutex with the door lock on your restroom, it will help you make sure that you got some privacy.

An event will block all incoming threads, until it is signalled to release a thread or all of them at the same time. Events are quite useful when dealing with situations when you want to queue threads before they are allowed to pass. Events aren’t locks that you lock and after that have to unlock. A thread just waits for an event, and doesn’t have to ‘unlock’ it afterwards.

An event is usually also implemented using a semaphore. This time the default value of the semaphore would be 0. If a thread wants to pass the event it’ll try to decrement the semaphore`s value, unless it is 0 in which case it’ll wait, just as you would normally lock a semaphore. If the event is signalled to release a thread it will increment the semaphore`s value – one of the waiting threads will notice that the value isn’t 0 anymore, decrement it to 0 again and pass. If the event is signalled to release all waiting threads it’ll keep on incrementing the value until all threads have passed. Usually there is a mutex alongside the semaphore to make sure that a thread can’t slip in, while all threads are being released, and be released to soon.

You can compare an event with a (chaotic) queue in front of a fun park attraction. People have to wait until the operator of the attraction tells one of them to continue, or tells all of them to continue at once. (the attraction could be nearly empty with enough room for all of them).

A gate is a flavour of an event. It blocks incoming threads just like an event, and lets all of them through if signalled. The difference it that it won’t close again – once signalled (‘opened’) it’ll allow all threads through until it is explicitly closed. It also can’t let threads through one by one.

A gate is usually implemented using a semaphore, with the initial value of 0 so it will block incoming threads. Each incoming thread tries to attain the semaphore as is done too with the event – but when a semaphore lock is attained it’ll directly release the semaphore so it’ll be available for another thread waiting to pass too.

You can compare a gate with a … gate. Or a draw bridge if you like. If it’s closed none can pass. If it’s opened everyone can pass. Very straightforward.

A barrier blocks all incoming thread until there are a certain number of threads. In that case the barrier will release all those threads, but will remain to block incoming threads (until they have reached the specified count together too, etc). A barrier is useful to make sure that a set of threads all start something at the same time and maybe even more important aren’t doing what they did before.

A barrier is usually implemented as a custom event, that will have a waiter-count-field (and a mutex to synchronize access to it), which will auto release all threads when the waiter count has reached the specified number.

You can compare a barrier with watter drops from a tap. The drop won’t fall until there is a specific amount of water that causes it to fall.

So far for the thread synchronization mechanisms today. Tomorrow I’ll maybe talk about the hassles of implementing them yourself.

Some credit goes to Kaja, who told me about all these locks in the first place :-).

Memory Alignment Havoc

Memory Alignment is one of the most important concepts when working low-level with memory. The bliss of the x86 processor, which is very kind about alignment, has left me unaware of it for years. That was until Kaja mentioned it, and therefore I decided to rebuild my Gc.

Like err.. what you talking ’bout?

Data should be aligned in the memory. Why? Because the processor doesn’t see the memory as most programmers do. For the processor the memory isn’t an array of bytes, but rather an array of blocks. In the case of the x86 processor this is usually 4 bytes.

When I got a simple piece of code like ‘(*i)++‘ the processor will find the 4 byte block in which this 4 byte integer i is located, increments it and stores it.

Well.. it does this when i fits exactly in one block. When i is partially in one block, and the rest is in a second block (it is unaligned) the processor will have to fetch both blocks, reconstruct the integer, increment it, split it, and store it back.

This is a lot of extra work. The processor will spend a few times more work on a simple increment. Although the actual difference is only about 5% because requesting 2 blocks or 1 block doesn’t differ that much.

Because this means a lot of extra work for the processor, and even more transistors, some processor manufacturers decided to barely or not support unaligned data.


  • Some processors will just cope with unalligned data, although it takes a bit more time. (x68)
  • Other’s will throw an hardware exception and let the OS fix it up, which takes a hell of a lot more time.
  • A bunch will get unstable and the process crashes, or the whole processor will.
  • Ans some will silently ignore the unaligned data using the first aligned data that comes close instead which leads to silent corruptions and hard to trace bugs.

Bliss of C

Maybe this will shock you, at least it shocked me. Luckily you don’t need to worry a lot when you aren’t writing specific low-level applications like memory managers or compilers. Your C compiler will take care of alignment for you.

Gimmi more

Kaja wrote an article about memory alignment.
IBM has got an excelent article on it too.