Condition Variable Spurious Wakes


Condition variables are a useful mechanism for waiting until an event occurs or some "condition" is satisfied. For example, in my implementation of a thread-safe queue I use a condition variable to avoid busy-waiting in wait_and_pop() when the queue is empty. However, condition variables have one "feature" which is a common source of bugs: a wait on a condition variable may return even if the condition variable has not been notified. This is called a spurious wake.

Spurious wakes cannot be predicted: they are essentially random from the user's point of view. However, they commonly occur when the thread library cannot reliably ensure that a waiting thread will not miss a notification. Since a missed notification would render the condition variable useless, the thread library wakes the thread from its wait rather than take the risk.

Bugs due to spurious wakes

Consider the code for wait_and_pop from my thread-safe queue:

    void wait_and_pop(Data& popped_value)
    {
        boost::mutex::scoped_lock lock(the_mutex);
        while(the_queue.empty())
        {
            the_condition_variable.wait(lock);
        }

        popped_value=the_queue.front();
        the_queue.pop();
    }

If we know that there's only one consumer thread, it would be tempting to write this with an if instead of a while, on the assumption that there's only one thread waiting, so if it's been notified, the queue must not be empty:

    if(the_queue.empty()) // Danger, Will Robinson
    {
        the_condition_variable.wait(lock);
    }

With the potential of spurious wakes this is not safe: the wait might finish even if the condition variable was not notified. We therefore need the while, which has the added benefit of allowing multiple consumer threads: we don't need to worry that another thread might remove the last item from the queue, since we're checking to see if the queue is empty before proceeding.
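For readers who want something they can compile on its own, here is a minimal sketch of the same loop using the C++11 standard library instead of Boost. That substitution is an assumption on my part, as are the names looping_queue and sum_through_queue, which are purely illustrative. The push() shown here is also a guess at the obvious implementation rather than a copy of the original queue.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical std-based counterpart of the queue discussed in the text.
template<typename Data>
class looping_queue
{
    std::queue<Data> the_queue;
    std::mutex the_mutex;
    std::condition_variable the_condition_variable;
public:
    void push(Data const& data)
    {
        {
            std::lock_guard<std::mutex> lock(the_mutex);
            the_queue.push(data);
        }
        // Notify after releasing the lock so the woken thread can acquire it.
        the_condition_variable.notify_one();
    }

    void wait_and_pop(Data& popped_value)
    {
        std::unique_lock<std::mutex> lock(the_mutex);
        // Re-check the queue after every wake: the wake may be spurious,
        // or another consumer may have taken the item first.
        while(the_queue.empty())
        {
            the_condition_variable.wait(lock);
        }
        popped_value=the_queue.front();
        the_queue.pop();
    }
};

// Hypothetical demo: one producer thread pushes 1..count while the
// calling thread pops the same number of items and sums them.
int sum_through_queue(int count)
{
    looping_queue<int> q;
    std::thread producer([&q,count]{
        for(int i=1;i<=count;++i)
        {
            q.push(i);
        }
    });
    int total=0;
    for(int i=0;i<count;++i)
    {
        int value=0;
        q.wait_and_pop(value);
        total+=value;
    }
    producer.join();
    return total;
}
```

Calling sum_through_queue(100) should give the same total regardless of how the two threads interleave or how many spurious wakes occur, because the consumer never proceeds until it has seen a non-empty queue while holding the lock.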

That's the beginner's bug, and one that's easily overcome with a simple rule: always check your predicate in a loop when waiting with a condition variable. The more insidious bug comes from timed_wait().

Timing is everything

condition_variable::wait() has a companion function that allows the user to specify a time limit on how long they're willing to wait: condition_variable::timed_wait(). This function comes as a pair of overloads: one that takes an absolute time, and one that takes a duration. The absolute time overload will return once the clock reaches the specified time, whether or not it was notified. The duration overload will return once the specified duration has elapsed: if you say to wait for 3 seconds, it will stop waiting after 3 seconds. The insidious bug comes from the overload that takes a duration.

Suppose we wanted to add a timed_wait_and_pop() function to our queue, that allowed the user to specify a duration to wait. We might be tempted to write it as:

    template<typename Duration>
    bool timed_wait_and_pop(Data& popped_value,
                            Duration const& timeout)
    {
        boost::mutex::scoped_lock lock(the_mutex);
        while(the_queue.empty())
        {
            // BUG: the full duration is passed again on every iteration
            if(!the_condition_variable.timed_wait(lock,timeout))
                return false;
        }

        popped_value=the_queue.front();
        the_queue.pop();
        return true;
    }

At first glance this looks fine: we're handling spurious wakes by looping on the timed_wait() call, and we're passing the timeout in to that call. Unfortunately, the timeout is a duration, so every call to timed_wait() will wait up to the specified amount of time. If the timeout was 1 second, and the timed_wait() call woke due to a spurious wake after 0.9 seconds, the next time round the loop would wait for a further 1 second. In theory this could continue ad infinitum, completely defeating the purpose of using timed_wait() in the first place.
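To put numbers on this, the following is a small model of the arithmetic only; it does not touch a real condition variable. The function name total_wait_relative_ms and the idea of describing spurious wakes as a list of "woke after this many milliseconds" values are both assumptions made for the illustration.

```cpp
#include <vector>

// Hypothetical model of the flawed loop. Each entry in spurious_wakes_ms
// says how long one timed_wait() call blocked before a spurious wake
// ended it. The return value is the total time spent blocked before the
// function finally reports a timeout, given that every call is handed
// the full relative timeout again.
long total_wait_relative_ms(long timeout_ms,
                            std::vector<long> const& spurious_wakes_ms)
{
    long elapsed_ms=0;
    for(long woke_after_ms: spurious_wakes_ms)
    {
        if(woke_after_ms>=timeout_ms)
        {
            // This wait ran its full course, so the function gives up here.
            return elapsed_ms+timeout_ms;
        }
        // Spurious wake: the loop goes round again with a fresh, full timeout.
        elapsed_ms+=woke_after_ms;
    }
    // No more spurious wakes: the final wait lasts the whole timeout.
    return elapsed_ms+timeout_ms;
}
```

With a 1000 ms timeout and a single spurious wake after 900 ms, total_wait_relative_ms(1000,{900}) comes to 1900 ms, and three such wakes in a row push it to 3700 ms. Every extra spurious wake adds to the total, which is why there is no upper bound.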

The solution is simple: use the absolute time overload instead. By specifying a particular clock time as the timeout, the remaining wait time decreases with each call. This requires that we determine the final timeout prior to the loop:

    template<typename Duration>
    bool timed_wait_and_pop(Data& popped_value,
                            Duration const& wait_duration)
    {
        // Fix the deadline once, before the loop, so it cannot drift
        boost::system_time const timeout=boost::get_system_time()+wait_duration;

        boost::mutex::scoped_lock lock(the_mutex);
        while(the_queue.empty())
        {
            if(!the_condition_variable.timed_wait(lock,timeout))
                return false;
        }

        popped_value=the_queue.front();
        the_queue.pop();
        return true;
    }

Though this solves the problem, it's easy to make the mistake. Thankfully, there is a better way to wait that doesn't suffer from this problem: pass the predicate to the condition variable.

Passing the predicate to the condition variable

Both wait() and timed_wait() come with additional overloads that allow the user to specify the condition being waited for as a predicate. These overloads encapsulate the while loops from the examples above, and ensure that spurious wakes are correctly handled. All that is required is that the condition being waited for can be checked by means of a simple function call or a function object which is passed as an additional parameter to the wait() or timed_wait() call.
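To make "encapsulate the while loops" concrete, here is a rough sketch of what such overloads do, written as free functions over std::condition_variable from C++11. This is not the Boost source. The helper names wait_with_predicate and timed_wait_with_predicate are hypothetical, and the use of std::chrono::steady_clock for the deadline is an assumption.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper: the untimed predicated wait is just the loop.
template<typename Predicate>
void wait_with_predicate(std::condition_variable& cond,
                         std::unique_lock<std::mutex>& lock,
                         Predicate pred)
{
    while(!pred())
    {
        cond.wait(lock);
    }
}

// Hypothetical helper: the timed version turns the duration into a
// deadline once, then loops against that fixed deadline.
template<typename Rep,typename Period,typename Predicate>
bool timed_wait_with_predicate(
    std::condition_variable& cond,
    std::unique_lock<std::mutex>& lock,
    std::chrono::duration<Rep,Period> const& wait_duration,
    Predicate pred)
{
    auto const deadline=std::chrono::steady_clock::now()+wait_duration;
    while(!pred())
    {
        if(cond.wait_until(lock,deadline)==std::cv_status::timeout)
        {
            // Timed out: report whatever the predicate says right now.
            return pred();
        }
    }
    return true;
}
```

The caller passes a duration, but the loop only ever sees the fixed deadline, so a spurious wake shortens the remaining wait instead of restarting it.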

wait_and_pop() can therefore be written like this:

    struct queue_not_empty
    {
        std::queue<Data>& queue;

        queue_not_empty(std::queue<Data>& queue_):
            queue(queue_)
        {}
        bool operator()() const
        {
            return !queue.empty();
        }
    };

    void wait_and_pop(Data& popped_value)
    {
        boost::mutex::scoped_lock lock(the_mutex);
        the_condition_variable.wait(lock,queue_not_empty(the_queue));
        popped_value=the_queue.front();
        the_queue.pop();
    }

and timed_wait_and_pop() can be written like this:

    template<typename Duration>
    bool timed_wait_and_pop(Data& popped_value,
                            Duration const& wait_duration)
    {
        boost::mutex::scoped_lock lock(the_mutex);
        if(!the_condition_variable.timed_wait(lock,wait_duration,
            queue_not_empty(the_queue)))
            return false;
        popped_value=the_queue.front();
        the_queue.pop();
        return true;
    }

Note that what we're waiting for is the queue not to be empty: the predicate is the reverse of the condition we would put in the while loop. This will be much easier to specify when compilers implement the C++0x lambda facilities.
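As an illustration of how lambdas simplify this, the sketch below uses C++11 lambdas with std::condition_variable, whose wait() and wait_for() accept a predicate in the same way. Switching from Boost to the standard library is an assumption here, and the class name lambda_queue is hypothetical.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <queue>

// Hypothetical std-based queue: the queue_not_empty function object is
// replaced by a lambda written at the point of use.
template<typename Data>
class lambda_queue
{
    std::queue<Data> the_queue;
    std::mutex the_mutex;
    std::condition_variable the_condition_variable;
public:
    void push(Data const& data)
    {
        {
            std::lock_guard<std::mutex> lock(the_mutex);
            the_queue.push(data);
        }
        the_condition_variable.notify_one();
    }

    void wait_and_pop(Data& popped_value)
    {
        std::unique_lock<std::mutex> lock(the_mutex);
        // The predicate states what we are waiting FOR: a non-empty queue.
        the_condition_variable.wait(lock,
            [this]{ return !the_queue.empty(); });
        popped_value=the_queue.front();
        the_queue.pop();
    }

    template<typename Rep,typename Period>
    bool timed_wait_and_pop(
        Data& popped_value,
        std::chrono::duration<Rep,Period> const& wait_duration)
    {
        std::unique_lock<std::mutex> lock(the_mutex);
        // wait_for with a predicate returns the predicate's final value,
        // so false means the queue was still empty when time ran out.
        if(!the_condition_variable.wait_for(lock,wait_duration,
            [this]{ return !the_queue.empty(); }))
            return false;
        popped_value=the_queue.front();
        the_queue.pop();
        return true;
    }
};
```

On an empty lambda_queue<int>, timed_wait_and_pop(value,std::chrono::milliseconds(20)) should return false after roughly 20 ms; after a push(42) the same call should return true straight away with value set to 42.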

Conclusion

Spurious wakes can cause some unfortunate bugs, which are hard to track down due to the unpredictability of spurious wakes. These problems can be avoided by ensuring that plain wait() calls are made in a loop, and the timeout is correctly calculated for timed_wait() calls. If the predicate can be packaged as a function or function object, using the predicated overloads of wait() and timed_wait() avoids all the problems.

0 0