Multithreaded simple data type access and atomic variables

Source: Internet | Posted by: 如何优化标题 | Editor: 程序博客网 | Time: 2024/06/05 20:44

Do you need a mutex to protect an int?
From: http://www.alexonlinux.com/do-you-need-mutex-to-protect-int
Recently I ran into a few pieces of code here and there that assumed int is an atomic type. In other words, they assumed that when you modify the value of a variable from two or more threads at the same time, all of the changes you make to it remain intact.

But really, can you modify variables of basic types (integers, floats, etc.) from two or more threads at the same time without screwing up their value?

Until now I've been very cautious with questions of this kind. I've spent an enormous amount of time solving synchronization problems. So when I have a variable that is modified or accessed from two or more threads, I always put a mutex or a semaphore (a spinlock in the kernel) around it, no questions asked.

Still, I decided to check things out, perhaps one more time. I wanted to see what happens to a variable when two or more threads modify it. So I wrote a short program that demonstrates what happens when you do something like this.

Here’s the code. See my comments below to understand what it does and how it does it.

```c
#define _GNU_SOURCE /* for cpu_set_t, CPU_ZERO(), CPU_SET() and sched_setaffinity() */
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <sched.h>
#include <linux/unistd.h>
#include <sys/syscall.h>
#include <errno.h>

#define INC_TO 1000000 // one million...

int global_int = 0;

/* Returns the kernel thread ID of the calling thread. Named get_tid() rather
 * than gettid() to avoid clashing with the gettid() declared by glibc 2.30+. */
pid_t get_tid( void )
{
    return syscall( __NR_gettid );
}

void *thread_routine( void *arg )
{
    int i;
    int proc_num = (int)(long)arg;
    cpu_set_t set;

    /* Pin this thread to CPU number proc_num. */
    CPU_ZERO( &set );
    CPU_SET( proc_num, &set );

    if (sched_setaffinity( get_tid(), sizeof( cpu_set_t ), &set ))
    {
        perror( "sched_setaffinity" );
        return NULL;
    }

    /* Unprotected read-modify-write of a shared variable: this is the race. */
    for (i = 0; i < INC_TO; i++)
    {
        global_int++;
    }

    return NULL;
}

int main( void )
{
    int procs = 0;
    int i;
    int err;
    pthread_t *thrs;

    // Getting number of CPUs
    procs = (int)sysconf( _SC_NPROCESSORS_ONLN );
    if (procs < 0)
    {
        perror( "sysconf" );
        return -1;
    }

    thrs = malloc( sizeof( pthread_t ) * procs );
    if (thrs == NULL)
    {
        perror( "malloc" );
        return -1;
    }

    printf( "Starting %d threads...\n", procs );

    for (i = 0; i < procs; i++)
    {
        /* pthread_create() returns an error code instead of setting errno. */
        err = pthread_create( &thrs[i], NULL, thread_routine, (void *)(long)i );
        if (err)
        {
            fprintf( stderr, "pthread_create: %s\n", strerror( err ) );
            procs = i;
            break;
        }
    }

    for (i = 0; i < procs; i++)
        pthread_join( thrs[i], NULL );

    free( thrs );

    printf( "After doing all the math, global_int value is: %d\n", global_int );
    printf( "Expected value is: %d\n", INC_TO * procs );

    return 0;
}
```

You can download the code and a Makefile here.

The setup is simple. The number of threads equals the number of CPUs in your machine (this includes cores and even hyperthreaded semi-cores). Each thread is pinned to a particular core with sched_setaffinity(). That is, the scheduler is configured to run each thread on its own core, to make sure that all cores access the variable. Each thread increments a global integer named global_int one million times.

Meanwhile, the main thread waits for the worker threads to finish their job. Once they're done, it prints the value of global_int.

And the result is:

Starting 4 threads…
After doing all the math, global_int value is: 1908090
Expected value is: 4000000
I guess the numbers speak for themselves.

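In case you still have doubts, consider this. When I compile the code with the -O2 compiler option (enabling optimization), the value of global_int is 4000000, as you might have expected. So it is not really black and white, and things may also be different on different computers. Note, though, that -O2 does not make the code correct. Most likely the optimizer merely turns the loop into a single addition, which shrinks the window for the race without closing it.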

However, the bottom line is that you want to be on the safe side. This code is a small, simple proof that, in some cases, simultaneous access to a basic variable can leave its value corrupted. So, to be on the safe side, protect your shared variables, whether they are complex data structures or simple integers.

pthread spinlocks
From: http://www.alexonlinux.com/pthread-spinlocks
Continuing my previous post, I would like to talk about a relatively new feature in glibc, and in pthreads in particular: spinlocks.

Quite often you want to protect some simple data structure from simultaneous access by two or more threads. As in my previous post, this can even be a simple integer. Using mutexes and semaphores (critical sections, if you wish) to protect a simple integer is overkill, and here's why.

Modifying or reading the value of a single integer quite often takes as few as two or three CPU instructions. Acquiring a semaphore, on the other hand, is a huge operation. It involves at least one system call, which translates into thousands of CPU instructions. The same goes for releasing the semaphore.

Things are a little better with mutexes, but still far from perfect. POSIX mutexes in Linux are implemented using futexes. Futex stands for Fast Userspace muTEX. The idea behind a futex is to avoid a system call when locking an unlocked futex. Waiting for a locked futex still requires a system call, because this is how processes wait for something in Linux. Yet locking an unlocked futex can be done without asking the kernel to do anything for you. Therefore locking a futex is, at least in some cases, very fast.

The problem is that in the remaining cases (when the lock is contended) a mutex in Linux is as slow as a semaphore. And as with semaphores, spending thousands of CPU cycles just to protect a single integer is definitely overkill.

This is exactly the problem spinlocks solve. A spinlock is another synchronization mechanism. It works in the same manner as a mutex, i.e. only one thread can hold it locked at a time. However, there is a difference.

When a thread tries to lock a spinlock that is already locked, it does not sleep waiting for the spinlock to be unlocked. Instead it busy-waits, i.e. spins in a while loop. This is why it's called a spinlock.

Locking a spinlock takes only tens of CPU cycles. One important rule with spinlocks is to hold them only for short periods of time. Don't be surprised if your program begins consuming too much CPU because one of your threads holds a spinlock for too long: every other thread that wants the lock burns CPU while it waits. To avoid this, don't execute big chunks of code while holding a spinlock, and by all means avoid doing I/O while holding one.

Now for the easy part.

First, to make things work, you have to include pthread.h. pthread_spin_init() initializes a spinlock, pthread_spin_lock() locks it and pthread_spin_unlock() unlocks it, all just as with pthread mutexes. And, of course, there are manual pages for each and every one of them.
How atomic variables work

This is actually quite simple. The Intel x86 and x86_64 processor architectures (as well as the vast majority of other modern CPU architectures) have instructions that lock the FSB while doing a memory access. FSB stands for Front Side Bus; this is the bus the processor uses to communicate with RAM. Locking the FSB prevents every other processor (core), and whatever process runs on it, from accessing RAM for the duration of the operation. And this is exactly what we need to implement atomic variables. (On more recent processors the lock is usually applied to a single cache line rather than to the whole bus, but the effect is the same: no other core can touch that memory in the middle of the operation.)

Atomic variables are widely used in the kernel, but for some reason no one bothered to implement them for user-mode folks until gcc 4.1.2.

Atomic variables size limitations

For practical reasons, the gurus at Intel did not implement FSB locking for every possible memory access. For instance, for quite some time Intel processors have allowed memcpy() and memcmp() to be implemented with a single processor instruction. But locking the FSB while copying a large memory buffer would be far too expensive.

In practice you can lock the FSB while accessing 1-, 2-, 4- and 8-byte integers. gcc lets you do atomic operations on ints, longs and long longs (and their unsigned counterparts) almost transparently.

Use cases

Incrementing a variable while knowing that no one else can screw up its value is nice, but not enough. Consider the following piece of pseudo-code:

```c
decrement_atomic_value();      /* line 1 */
if (atomic_value() == 0)       /* line 2 */
    fire_a_gun();              /* line 3 */
```
Let us imagine that the value of an atomic variable is 1. What happens if two threads of execution try to execute this piece of pseudo-C simultaneously?

Let's simulate it. It is possible that thread 1 executes line 1 and is then preempted, while thread 2 executes line 1 and continues to line 2. Later thread 1 wakes up and executes line 2. By then both decrements have happened, so the value is -1, and both threads see a non-zero value at line 2.


When this happens, neither thread runs the fire_a_gun() routine (line 3). This is obviously wrong behavior, and if we were protecting this piece of code with a mutex or a spinlock, it would not have happened.

In case you're wondering how likely this is to happen: very likely. When I first started working with multithreaded programming, I was amazed to find out that although our intuition tells us the scenario I described is unlikely, it happens overwhelmingly often.

As I mentioned, we could solve this problem by giving up on atomic variables and using a spinlock or a mutex instead. Luckily, we can still use atomic variables: the gcc developers thought about our needs and this particular problem, and offered a solution. Let's look at the actual routines that operate on atomic variables.

The real thing…

There are several simple functions that do the job. First of all, there are twelve (yes, twelve: 12) functions that do atomic add and subtract, and atomic logical or, and, xor and nand. There are two functions for each operation: one that returns the value of the variable before changing it, and another that returns the value after changing it.

Here are the actual functions:

```c
type __sync_fetch_and_add (type *ptr, type value);
type __sync_fetch_and_sub (type *ptr, type value);
type __sync_fetch_and_or (type *ptr, type value);
type __sync_fetch_and_and (type *ptr, type value);
type __sync_fetch_and_xor (type *ptr, type value);
type __sync_fetch_and_nand (type *ptr, type value);
```
These functions return the value of the variable before changing it. The following functions, on the other hand, return the value of the variable after changing it.

```c
type __sync_add_and_fetch (type *ptr, type value);
type __sync_sub_and_fetch (type *ptr, type value);
type __sync_or_and_fetch (type *ptr, type value);
type __sync_and_and_fetch (type *ptr, type value);
type __sync_xor_and_fetch (type *ptr, type value);
type __sync_nand_and_fetch (type *ptr, type value);
```
In each of these prototypes, type can be one of the following:

int
unsigned int
long
unsigned long
long long
unsigned long long
These are so-called built-in functions, meaning that you don't have to include anything to use them.

The above is excerpted from the article "Multithreaded simple data type access and atomic variables". Readers who cannot get past the firewall can read it at http://blog.csdn.net/u014659211/article/details/50827719
