Avoiding multiple-lock deadlocks by memory address

Sometimes you need to lock several resources. If you don’t take great care, you are likely to get yourself into deadlocks. A simple example with just two lockable resources A and B:

function foo {
  lock A;
  lock B;
  // Do something with A and B
  unlock B;
  unlock A;
}

function bar {
  lock B;
  lock A;
  // Do something different with A and B
  unlock A;
  unlock B;
}

When foo and bar are called at about the same time, there is a chance that foo locks A while bar locks B, which makes bar wait on foo‘s lock on A and vice versa: a deadlock.

Solution: fixed order on memory address
The simplest way to get rid of the deadlock is to always acquire the lock on A before the one on B. A generic solution is to always lock the resource with the lowest memory address first.

This only works when memory addresses are fixed, or when there is some other fixed, orderable property to sort on.
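
A minimal sketch of such an ordered-locking helper in C, assuming pthreads (the names lock_pair and unlock_pair are mine, not a standard API):

#include <pthread.h>
#include <stdint.h>

/* Acquire two mutexes in a fixed global order (lowest address
   first), so two callers can never deadlock on this pair. */
void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
  if ((uintptr_t)a > (uintptr_t)b) {
    pthread_mutex_t *t = a; a = b; b = t;  /* sort by address */
  }
  pthread_mutex_lock(a);
  pthread_mutex_lock(b);
}

void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
  /* Unlock order does not matter for deadlock avoidance. */
  pthread_mutex_unlock(b);
  pthread_mutex_unlock(a);
}

With foo and bar both going through lock_pair(&A, &B), the order in which they pass the arguments no longer matters.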

IThreads

While working on paradox, Kaja and I felt that we needed a cross-platform threading library, one that handles all the subtle differences in capabilities and semantics of each platform's API.

After a rewrite of Kaja's original code, I present to you ithreads. You can find the source in this subversion repository:

svn://w-nz.com/px

At this moment ithreads gives you implementations of:

  • Mutex
  • Event (aka. condition variable)
  • Semaphore
  • R/w lock
  • Barrier
  • Gate
  • Thread

By using (one or more of) the following APIs:

  • pthreads, the POSIX thread API (Unix, Linux, …)
  • pth, portable user-space threads
  • futex, Linux's lightweight, fast hybrid locking

ithreads will fall back to generic implementations in case one of the primitives (e.g. the semaphore for pthreads) isn’t available on your system.

I’m especially happy with the futex based implementations. They outperform the pthreads implementations, and are a lot smaller (4 vs. ~16 bytes for a mutex in pthreads) – although they aren’t as robust (but still correct).
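
To give an idea of why a futex mutex can be that small, here is a minimal sketch in C of the usual futex locking scheme (an illustration of the technique, not the actual ithreads code): the whole mutex is a single int.

#include <linux/futex.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

/* 0 = unlocked, 1 = locked, 2 = locked with waiters. */
typedef struct { atomic_int state; } fmutex;

static void futex_wait(atomic_int *addr, int val) {
  syscall(SYS_futex, addr, FUTEX_WAIT, val, NULL, NULL, 0);
}
static void futex_wake(atomic_int *addr, int n) {
  syscall(SYS_futex, addr, FUTEX_WAKE, n, NULL, NULL, 0);
}

void fmutex_lock(fmutex *m) {
  int c = 0;
  /* Fast path: 0 -> 1 with one atomic op, no system call. */
  if (atomic_compare_exchange_strong(&m->state, &c, 1))
    return;
  /* Slow path: mark the mutex contended (2) and sleep in the
     kernel until we manage to grab it ourselves. */
  if (c != 2)
    c = atomic_exchange(&m->state, 2);
  while (c != 0) {
    futex_wait(&m->state, 2);
    c = atomic_exchange(&m->state, 2);
  }
}

void fmutex_unlock(fmutex *m) {
  /* Only enter the kernel if someone may actually be waiting. */
  if (atomic_exchange(&m->state, 0) == 2)
    futex_wake(&m->state, 1);
}
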
Support for BeOS, OS/2, Windows and more will be added, but testing them will be tricky.

The library still needs a lot of stress testing and probably still contains some bugs – you’ve been warned. When everything works nicely I’ll add all kinds of extra flavours of the primitives, e.g. a read-favouring r/w-lock.

I hope it will prove useful.

Thread synchronization mechanisms (Locks)

To share it with you, and to make sure I won’t forget, I present you with (some of) the different thread synchronization mechanisms known to mankind. Most of these mechanisms are used to lock operations or data; some aren’t. You know, those nifty little things that ought to help you with thread hell.

Semaphore
The semaphore is the cornerstone of locking – other locks are often implemented using one or more semaphores. A semaphore is a simple integer. The initial value of this integer is the number of threads that can attain (aka. lock, down, enter, pass, …) the semaphore before it starts blocking incoming threads – a blocked thread has to wait until a thread that has attained the semaphore releases (aka. unlocks, ups, leaves, …) it.

On most systems a semaphore is implemented as an atomic integer. When a thread wants to attain the semaphore it checks whether the value is greater than 0. If it is, it decrements the value and has attained the semaphore. If the value is 0, as many threads have already attained the semaphore as it allows, and the thread will wait a while and try again. A thread that releases the semaphore increments the value.

You can compare a semaphore with a dining table in a mess hall at which only a certain number of people can dine. People either have to wait for someone to finish and stand up, or they’ll have to look for some other place.
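
A minimal sketch in C of the scheme just described, using C11 atomics (a real implementation would block in the kernel instead of spinning):

#include <sched.h>
#include <stdatomic.h>

/* A counting semaphore: the atomic integer described above. */
typedef struct { atomic_int value; } semaphore;

void sem_attain(semaphore *s) {
  for (;;) {
    int v = atomic_load(&s->value);
    /* Value greater than 0: try to take one slot atomically. */
    if (v > 0 && atomic_compare_exchange_weak(&s->value, &v, v - 1))
      return;
    sched_yield();  /* value is 0: wait a while and try again */
  }
}

void sem_release(semaphore *s) {
  atomic_fetch_add(&s->value, 1);
}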

Mutex
The mutual exclusion lock is the lock you’ll be most familiar with, and because of its popularity it is often just referred to as a lock. A mutex is either locked or unlocked. Only one thread can hold the lock – all others have to wait for that thread to release the mutex. The mutex is typically used to ensure mutually exclusive access to a certain operation or piece of data.

A mutex is simply implemented as a semaphore with an initial value of 1. The first thread to attain the mutex decrements the value to 0 and blocks all other incoming threads – until it releases the lock.

You can compare a mutex with the door lock of your restroom: it helps you make sure that you get some privacy.
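
In terms of the semaphore sketch above, that really is all a mutex is:

/* A mutex is just a semaphore whose value starts at 1. */
semaphore mutex = { 1 };

sem_attain(&mutex);   /* lock: 1 -> 0, other threads now block */
/* ... critical section ... */
sem_release(&mutex);  /* unlock: 0 -> 1 */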

Event
An event blocks all incoming threads until it is signalled to release one thread, or all of them at the same time. Events are quite useful in situations where you want to queue threads before they are allowed to pass. Events aren’t locks that you lock and afterwards have to unlock: a thread just waits for an event, and doesn’t have to ‘unlock’ it afterwards.

An event is usually also implemented using a semaphore, this time with an initial value of 0. If a thread wants to pass the event it tries to decrement the semaphore's value, unless it is 0, in which case it waits – just as you would normally attain a semaphore. If the event is signalled to release one thread, it increments the semaphore's value – one of the waiting threads will notice that the value isn’t 0 anymore, decrement it to 0 again and pass. If the event is signalled to release all waiting threads, it keeps incrementing the value until all threads have passed. Usually there is a mutex alongside the semaphore to make sure that a thread can’t slip in, while all threads are being released, and be released too soon.
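
A much-simplified sketch of that idea in C, reusing the semaphore above (it leaves out the extra mutex just mentioned, so a late arrival could still slip into a release-all):

/* Simplified event: the semaphore starts at 0, so waiters block. */
typedef struct {
  semaphore sem;       /* initial value 0 */
  atomic_int waiters;  /* how many threads are queued */
} event;

void event_wait(event *e) {
  atomic_fetch_add(&e->waiters, 1);
  sem_attain(&e->sem);             /* blocks while the value is 0 */
  atomic_fetch_sub(&e->waiters, 1);
}

void event_signal(event *e) {      /* release one waiting thread */
  sem_release(&e->sem);
}

void event_release_all(event *e) { /* release every queued thread */
  for (int n = atomic_load(&e->waiters); n > 0; n--)
    sem_release(&e->sem);
}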

You can compare an event with a (chaotic) queue in front of a fun park attraction. People have to wait until the operator of the attraction tells one of them to continue, or tells all of them to continue at once (the attraction could be nearly empty, with enough room for all of them).

Gate
A gate is a flavour of event. It blocks incoming threads just like an event, and lets all of them through when signalled. The difference is that it won’t close again – once signalled (‘opened’) it allows all threads through until it is explicitly closed. It also can’t let threads through one by one.

A gate is usually implemented using a semaphore, again with an initial value of 0 so it blocks incoming threads. Each incoming thread tries to attain the semaphore, just as with the event – but when it attains the semaphore it directly releases it again, so it’ll be available for the next thread waiting to pass.

You can compare a gate with a … gate. Or a drawbridge, if you like. If it’s closed no one can pass; if it’s open everyone can pass. Very straightforward.
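
Sketched in C on top of the same semaphore (simplified; note that gate_close here would block if the gate is already closed):

/* Simplified gate: closed while the semaphore's value is 0. */
typedef struct { semaphore sem; } gate;  /* initial value 0 */

void gate_pass(gate *g) {
  sem_attain(&g->sem);   /* blocks while the gate is closed */
  sem_release(&g->sem);  /* hand the 'open' token straight back */
}

void gate_open(gate *g)  { sem_release(&g->sem); }  /* 0 -> 1 */
void gate_close(gate *g) { sem_attain(&g->sem); }   /* 1 -> 0 */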

Barrier
A barrier blocks all incoming threads until a certain number of them are waiting. At that point the barrier releases all those threads at once, but keeps blocking new incoming threads (until they too have reached the specified count together, and so on). A barrier is useful to make sure that a set of threads all start something at the same time and, maybe even more importantly, are no longer doing what they did before.

A barrier is usually implemented as a custom event with a waiter count field (and a mutex to synchronize access to it), which automatically releases all threads when the waiter count reaches the specified number.
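
A small sketch of that scheme in C, substituting a pthreads condition variable for the custom event (the generation counter guards against spurious wakeups; fields are assumed to be initialised with PTHREAD_MUTEX_INITIALIZER and friends):

#include <pthread.h>

typedef struct {
  pthread_mutex_t lock;      /* protects the counters below */
  pthread_cond_t  all_here;  /* stands in for the custom event */
  int count, generation, trip;
} barrier;

void barrier_wait(barrier *b) {
  pthread_mutex_lock(&b->lock);
  int gen = b->generation;
  if (++b->count == b->trip) {
    b->generation++;                       /* start the next round */
    b->count = 0;
    pthread_cond_broadcast(&b->all_here);  /* release everyone */
  } else {
    while (gen == b->generation)
      pthread_cond_wait(&b->all_here, &b->lock);
  }
  pthread_mutex_unlock(&b->lock);
}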

You can compare a barrier with water drops from a tap: a drop won’t fall until a specific amount of water has collected, which causes it to fall.

So much for thread synchronization mechanisms today. Tomorrow I’ll maybe talk about the hassles of implementing them yourself.

Some credit goes to Kaja, who told me about all these locks in the first place :-).

Dual Cores, Traffic and Gaming

Dual cores are the new thing in the processor business. They claim to be perfect for people who want to do different things at the same time, but in reality they are just a cheap solution to deliver more processing power.

The truth is that two 500 MHz processors just don’t deliver the computing power of one 1000 MHz processor – given, of course, that they are of the same architecture. This is because a lot of additional logic is required to let two processors work together without getting in each other’s way. When two processors both want to write a different value to the same spot in memory, you would get unpredictable results. There are all kinds of ways to solve this, for instance by locking a region of memory for an amount of time, making the other processor wait for the first one to finish. But this all creates a lot of overhead and complexity.

You could compare it with driving a car. Driving a car is relatively easy: you just have to steer around some static obstacles, no big deal. When you know the way you could even do it with your eyes closed. That is, unless there are other people driving cars. With others around, you are keeping an eye not just on the road, but on the other people on it. This not only lowers your maximum speed and decreases efficiency – you can’t just drive full speed over a junction – but it also increases the complexity and the likelihood of errors.

The same goes for dual core processors (and hyper-threaded and ordinary multi-processor platforms too). The comparison isn’t entirely fair, because having two processors doesn’t mean you have to do two things at the same time. The issue, though, is that what you would normally do is done by only one processor, so you are wasting the other’s capacity.

A good example of this is games. Games tend to be single-threaded: that gives the best performance, since no processor time is wasted on multithreading overhead, and it is the easiest thing to do, for multithreading is rather complicated. Complicated enough that there have been a few lengthy discussions on the mono mailing lists about how to lock a simple object.

Because we are getting dual-core, and probably ‘more’-core, processors later on – for the companies are too lazy to make decent processors [1] – games should become multithreaded to exploit the full capability of the machine they run on.

Although it makes writing high-performance applications a lot more difficult, it will surely benefit distributed computing, for the step from multiple processors to multiple machines is smaller than the step from one processor to several.

[1] Natively threaded CPUs – dual-core, multi-processor and hyper-threaded processors – are ideal for server applications, where lots of short-lived requests have to be handled. Switching and allocating on a software-threaded processor would create too much overhead for such simple requests.