Thread synchronization implementation performance

Thread safety is an important factor in modern software design, especially as the number of cores per machine increases and running code concurrently becomes a requirement.

So I decided to see just how expensive pthread_mutex_lock and semaphores really are. The basic idea of the test is to count how many times you can pass through a critical section of code in a fixed amount of time. I ran the same test for each synchronization implementation:

  1. No thread-safety.
  2. Thread safe using pthread_mutex_lock.
  3. Thread safe using sem_wait.

The code for each test is below:

void lazy_init_no_lock(void)
{
    static _Bool __isInitialized = 0;

    /* Baseline: one-time initialization check with no synchronization. */
    if (!__isInitialized)
        __isInitialized = 1;

    __count++;
}

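void lazy_init_lock(void)
{
    static _Bool __isInitialized = 0;

    /* The mutex guards only the initialization check. */
    pthread_mutex_lock(&__lock);
    if (!__isInitialized)
        __isInitialized = 1;
    pthread_mutex_unlock(&__lock);

    /* Note: __count is incremented outside the lock, so it is not protected. */
    __count++;
}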

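void lazy_init_semaphore(void)
{
    static _Bool __isInitialized = 0;

    /* The semaphore is used as a binary lock around the initialization check. */
    sem_wait(&sem);
    if (!__isInitialized)
        __isInitialized = 1;
    sem_post(&sem);

    /* Note: __count is incremented outside the semaphore, so it is not protected. */
    __count++;
}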

And the results were certainly much more interesting than the code. Pretty much, I called each test from an infinite loop for 5 seconds, and clocked how many calls I was able to complete.

    while(1)
    {
        if (test == 0)
            lazy_init_no_lock();
        else if (test == 1)
            lazy_init_lock();
        else if (test == 2)
            lazy_init_semaphore();
        else
            break;
    }
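The post does not show how the globals are declared or how the loop stops after 5 seconds, so here is a minimal sketch of one way the pieces could fit together. Everything beyond the three function names and the loop body is an assumption: the names call_count, bench_lock, bench_sem, bench_setup and run_test are hypothetical, and using alarm() with a SIGALRM handler to flip test is only a guess at how the timing was done. It also assumes a platform where unnamed POSIX semaphores (sem_init) are available, such as Linux.

```c
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical declarations; the post does not show its own. */
static unsigned long long call_count;                          /* stands in for __count */
static pthread_mutex_t bench_lock = PTHREAD_MUTEX_INITIALIZER; /* stands in for __lock */
static sem_t bench_sem;                                        /* stands in for sem */
static volatile sig_atomic_t test;                             /* selects the running test */

/* SIGALRM handler: any value outside 0..2 makes the loop hit its break. */
static void on_alarm(int signo)
{
    (void)signo;
    test = -1;
}

static void lazy_init_no_lock(void)
{
    static _Bool is_initialized = 0;
    if (!is_initialized)
        is_initialized = 1;
    call_count++;
}

static void lazy_init_lock(void)
{
    static _Bool is_initialized = 0;
    pthread_mutex_lock(&bench_lock);
    if (!is_initialized)
        is_initialized = 1;
    pthread_mutex_unlock(&bench_lock);
    call_count++;
}

static void lazy_init_semaphore(void)
{
    static _Bool is_initialized = 0;
    sem_wait(&bench_sem);
    if (!is_initialized)
        is_initialized = 1;
    sem_post(&bench_sem);
    call_count++;
}

/* Initializes the semaphore with a value of 1 so it behaves like a lock.
   Returns 0 on success, -1 on failure (as sem_init does). */
int bench_setup(void)
{
    return sem_init(&bench_sem, 0, 1);
}

/* Runs test `which` (0, 1 or 2) for `seconds` seconds on the calling thread
   and returns the number of completed calls. */
unsigned long long run_test(int which, unsigned int seconds)
{
    call_count = 0;
    test = which;
    signal(SIGALRM, on_alarm);
    alarm(seconds);              /* SIGALRM is delivered after `seconds` seconds */

    while (1) {
        if (test == 0)
            lazy_init_no_lock();
        else if (test == 1)
            lazy_init_lock();
        else if (test == 2)
            lazy_init_semaphore();
        else
            break;
    }

    alarm(0);                    /* cancel a pending alarm if we left early */
    return call_count;
}
```

To use it, call bench_setup() once from main, then run_test(0, 5), run_test(1, 5) and run_test(2, 5) and print the three return values. Since everything runs on a single thread, the lock and the semaphore are never contended, so this measures only the uncontended cost of each primitive.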

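On average, having no thread safety yielded 440,909,468 calls, thread safety with mutexes yielded 98,324,839 calls (roughly 4.5 times fewer), and semaphores brought up the rear with 1,197,981 calls (roughly 82 times fewer than mutexes). It made a nice little graph:
[Graph: calls completed in 5 seconds for each of the three tests]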
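To put those counts in more familiar terms, the arithmetic below turns them into an average cost per call and a slowdown factor. It is only a sketch: the helper names ns_per_call, slowdown and print_report are made up for illustration, and the per-call figures include the overhead of the test loop itself, not just the lock.

```c
#include <stdio.h>

/* Average time per call in nanoseconds for a run of `seconds` seconds. */
double ns_per_call(double seconds, unsigned long long calls)
{
    return seconds * 1e9 / (double)calls;
}

/* How many times more calls the faster test completed than the slower one. */
double slowdown(unsigned long long fast, unsigned long long slow)
{
    return (double)fast / (double)slow;
}

/* Prints a small report using the averages quoted in the post. */
void print_report(void)
{
    const double seconds = 5.0;
    const unsigned long long no_lock   = 440909468ULL;
    const unsigned long long mutex     = 98324839ULL;
    const unsigned long long semaphore = 1197981ULL;

    printf("no lock:   %.2f ns per call\n", ns_per_call(seconds, no_lock));
    printf("mutex:     %.2f ns per call (%.1fx fewer calls than no lock)\n",
           ns_per_call(seconds, mutex), slowdown(no_lock, mutex));
    printf("semaphore: %.2f ns per call (%.1fx fewer calls than mutex)\n",
           ns_per_call(seconds, semaphore), slowdown(mutex, semaphore));
}
```

Calling print_report() from main gives roughly 11 ns per call with no locking, about 51 ns with the mutex, and a bit over 4 microseconds with the semaphore.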

So I guess the moral of the story is, if you don’t need the super powers of a semaphore to get thread safety, use mutexes. And if you know your code is never going to be called from more than one thread, don’t wrap all your critical regions in mutexes or semaphores.