C/C++: pthread_cond_timedwait fails to block (returns immediately with a timeout)
Source: Internet  Editor: 程序博客网  Time: 2024/06/05 03:47
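A few days ago, while deploying software in production, I noticed a process using an extremely large amount of CPU. Digging into it, I found that pthread_cond_timedwait was failing to block while the process handled messages; in other words, it returned with a timeout before the requested time had elapsed.
Sample code: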
#include <iostream>
#include <pthread.h>
#include <sys/time.h>
using namespace std;

class Ebupt
{
public:
    Ebupt();
    virtual ~Ebupt();
    void dealMsg(long wait_ns);
private:
    pthread_mutex_t mutex;
    pthread_cond_t cond;
};

Ebupt::Ebupt()
{
    pthread_mutex_init(&mutex, NULL);
    pthread_cond_init(&cond, NULL);
}

Ebupt::~Ebupt()
{
    pthread_mutex_destroy(&mutex);
    pthread_cond_destroy(&cond);
}

void Ebupt::dealMsg(long wait_ns)
{
    pthread_mutex_lock(&mutex);
    struct timeval now;
    gettimeofday(&now, NULL);
    // abstime = now + wait_ns, carrying nanosecond overflow into seconds
    struct timespec abstime;
    if (now.tv_usec*1000 + (wait_ns%1000000000) >= 1000000000)
    {
        abstime.tv_sec = now.tv_sec + wait_ns/1000000000 + 1;
        abstime.tv_nsec = (now.tv_usec*1000 + wait_ns%1000000000)%1000000000;
    }
    else
    {
        abstime.tv_sec = now.tv_sec + wait_ns/1000000000;
        abstime.tv_nsec = now.tv_usec*1000 + wait_ns%1000000000;
    }
    // nobody signals cond, so this should block until abstime
    pthread_cond_timedwait(&cond, &mutex, &abstime);
    pthread_mutex_unlock(&mutex);
}

int main()
{
    Ebupt e;
    struct timeval now;
    while (true)
    {
        gettimeofday(&now, NULL);
        cout<<"++"<<now.tv_sec<<":"<<now.tv_usec<<endl;
        e.dealMsg(200000000);
        gettimeofday(&now, NULL);
        cout<<"--"<<now.tv_sec<<":"<<now.tv_usec<<endl;
    }
    return 0;
}
Compiling and running it gives:
[ismp@cn3 20171026]$ g++ -o main main.C -lpthread
[ismp@cn3 20171026]$ ./main
++1509023506:721641
--1509023506:721706
++1509023506:721710
--1509023506:721716
++1509023506:721718
--1509023506:721724
++1509023506:721726
--1509023506:721731
++1509023506:721733
--1509023506:721739
++1509023506:721741
--1509023506:721750
++1509023506:721753
--1509023506:721761
++1509023506:721763
--1509023506:721769
……(CTRL+C)
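In theory, since nothing ever signals the condition variable, each call should block for 200 ms and then return on timeout. In practice it does not block at all: like a runaway horse, it returns with a timeout immediately (the timestamps above show each call returning within tens of microseconds). Because dealMsg is called inside a while loop, the program effectively becomes a busy loop, so high CPU usage is to be expected.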
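The sample discards the return value of pthread_cond_timedwait, so one way to confirm that the early return really is a timeout (and not, for example, EINVAL from a malformed abstime) is to capture and print it. The following is only a minimal sketch: the helper names describeWaitResult and timedWaitRealtime are made up for illustration and are not part of the original code.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <pthread.h>
#include <time.h>

// Hypothetical helper: turn pthread_cond_timedwait's return code into text.
const char* describeWaitResult(int rc)
{
    if (rc == 0)         return "woken up";
    if (rc == ETIMEDOUT) return "timed out";
    return strerror(rc);  // e.g. EINVAL for an invalid abstime
}

// Hypothetical helper: wait up to wait_ns on a default-attribute condition
// variable (CLOCK_REALTIME) that nobody signals, and return the result code.
int timedWaitRealtime(long wait_ns)
{
    assert(wait_ns >= 0);
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

    struct timespec abstime;
    clock_gettime(CLOCK_REALTIME, &abstime);
    abstime.tv_sec += wait_ns / 1000000000L;
    abstime.tv_nsec += wait_ns % 1000000000L;
    if (abstime.tv_nsec >= 1000000000L) {  // carry into seconds
        abstime.tv_sec += 1;
        abstime.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&mutex);
    int rc = pthread_cond_timedwait(&cond, &mutex, &abstime);
    pthread_mutex_unlock(&mutex);

    pthread_cond_destroy(&cond);
    pthread_mutex_destroy(&mutex);
    return rc;
}
```

Printing describeWaitResult(timedWaitRealtime(200000000)) inside the loop would then show, on every iteration, whether the call reported a timeout or some other error.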
Let's take a look with top:
top - 21:15:52 up 419 days,  7:30,  2 users,  load average: 9.57, 8.94, 8.32
Tasks: 241 total,   3 running, 238 sleeping,   0 stopped,   0 zombie
Cpu(s): 10.6%us, 63.1%sy,  0.0%ni, 24.6%id,  0.0%wa,  0.0%hi,  1.6%si,  0.0%st
Mem:  32879016k total, 32578784k used,   300232k free,   217448k buffers
Swap:  2097144k total,   749020k used,  1348124k free, 28921976k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
22096 ismp      20   0 13904 1116  956 S  3.7  0.0   0:01.84 main
20409 ismp      20   0  109m 1956 1556 S  0.0  0.0   0:00.02 bash
It is simply a busy loop that never blocks…
This small example is not too bad, with CPU usage just under 4%, but my real process shot up to over 70%…
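I then went through many possible causes. At one point I wondered: could the clock used by gettimeofday be different from the clock that pthread_cond_timedwait actually uses?
So I changed the code to test that: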
#include <iostream>
#include <time.h>
#include <pthread.h>
#include <sys/time.h>
using namespace std;

class Ebupt
{
public:
    Ebupt();
    virtual ~Ebupt();
    void dealMsg(long wait_ns);
private:
    pthread_mutex_t mutex;
    pthread_cond_t cond;
};

Ebupt::Ebupt()
{
    pthread_mutex_init(&mutex, NULL);
    pthread_cond_init(&cond, NULL);
}

Ebupt::~Ebupt()
{
    pthread_mutex_destroy(&mutex);
    pthread_cond_destroy(&cond);
}

void Ebupt::dealMsg(long wait_ns)
{
    pthread_mutex_lock(&mutex);
    struct timespec now;
    clock_gettime(CLOCK_REALTIME, &now);
    struct timespec abstime;
    if (now.tv_nsec + (wait_ns%1000000000) >= 1000000000)
    {
        abstime.tv_sec = now.tv_sec + wait_ns/1000000000 + 1;
        abstime.tv_nsec = (now.tv_nsec + wait_ns%1000000000)%1000000000;
    }
    else
    {
        abstime.tv_sec = now.tv_sec + wait_ns/1000000000;
        abstime.tv_nsec = now.tv_nsec + wait_ns%1000000000;
    }
    pthread_cond_timedwait(&cond, &mutex, &abstime);
    pthread_mutex_unlock(&mutex);
}

int main()
{
    Ebupt e;
    struct timeval now;
    while (true)
    {
        gettimeofday(&now, NULL);
        cout<<"++"<<now.tv_sec<<":"<<now.tv_usec<<endl;
        e.dealMsg(200000000);
        gettimeofday(&now, NULL);
        cout<<"--"<<now.tv_sec<<":"<<now.tv_usec<<endl;
    }
    return 0;
}
[ismp@cn3 20171026]$ g++ -o main main.C -lpthread -lrt
[ismp@cn3 20171026]$ ./main
++1509024234:822675
--1509024234:822733
++1509024234:822737
--1509024234:822748
++1509024234:822751
--1509024234:822761
……(CTRL+C)
It still does not block, so that hypothesis (gettimeofday and pthread_cond_timedwait using different clocks) is not the cause.
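The same hypothesis can also be checked directly, without involving the condition variable at all, by reading both time sources back to back and comparing them. This is only a rough sketch; realtimeSkewNs is a hypothetical helper name, not something from the original program.

```cpp
#include <cassert>
#include <sys/time.h>
#include <time.h>

// Hypothetical helper: difference in nanoseconds between back-to-back
// readings of gettimeofday() and clock_gettime(CLOCK_REALTIME).
long long realtimeSkewNs()
{
    struct timeval tv;
    struct timespec ts;
    int rc1 = gettimeofday(&tv, NULL);
    int rc2 = clock_gettime(CLOCK_REALTIME, &ts);
    assert(rc1 == 0 && rc2 == 0);
    (void)rc1;
    (void)rc2;

    long long tvNs = tv.tv_sec * 1000000000LL + tv.tv_usec * 1000LL;
    long long tsNs = ts.tv_sec * 1000000000LL + ts.tv_nsec;
    return tsNs - tvNs;
}
```

If the two sources track the same wall clock, the returned difference should be tiny (well under a second), which is consistent with the switch to clock_gettime making no difference above.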
What if I set a clock attribute on the condition variable? Like this:
#include <iostream>
#include <time.h>
#include <pthread.h>
#include <sys/time.h>
using namespace std;

class Ebupt
{
……
Ebupt::Ebupt()
{
    pthread_mutex_init(&mutex, NULL);
    pthread_condattr_t condattr;
    pthread_condattr_init(&condattr);
    pthread_condattr_setclock(&condattr, CLOCK_REALTIME);
    pthread_cond_init(&cond, &condattr);
    pthread_condattr_destroy(&condattr);
}
……(same as above)
[ismp@cn3 20171026]$ g++ -o main main.C -lpthread -lrt
[ismp@cn3 20171026]$ ./main
++1509024510:358162
--1509024510:358221
++1509024510:358225
--1509024510:358236
++1509024510:358239
--1509024510:358249
……(CTRL+C)
Later I found, by accident, that the problem goes away if I switch to a different clock, the MONOTONIC clock:
#include <iostream>
#include <time.h>
#include <pthread.h>
#include <sys/time.h>
using namespace std;

class Ebupt
{
……
Ebupt::Ebupt()
{
    pthread_mutex_init(&mutex, NULL);
    pthread_condattr_t condattr;
    pthread_condattr_init(&condattr);
    pthread_condattr_setclock(&condattr, CLOCK_MONOTONIC);
    pthread_cond_init(&cond, &condattr);
    pthread_condattr_destroy(&condattr);
}
……
void Ebupt::dealMsg(long wait_ns)
{
    pthread_mutex_lock(&mutex);
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    struct timespec abstime;
    if (now.tv_nsec + (wait_ns%1000000000) >= 1000000000)
    {
        abstime.tv_sec = now.tv_sec + wait_ns/1000000000 + 1;
        abstime.tv_nsec = (now.tv_nsec + wait_ns%1000000000)%1000000000;
    }
    else
    {
        abstime.tv_sec = now.tv_sec + wait_ns/1000000000;
        abstime.tv_nsec = now.tv_nsec + wait_ns%1000000000;
    }
    pthread_cond_timedwait(&cond, &mutex, &abstime);
    pthread_mutex_unlock(&mutex);
}
……
[ismp@cn3 20171026]$ g++ -o main main.C -lpthread -lrt
[ismp@cn3 20171026]$ ./main
++1509024798:440277
--1509024798:640389
++1509024798:640400
--1509024798:840413
++1509024798:840424
--1509024799:40507
++1509024799:40517
--1509024799:240565
++1509024799:240581
--1509024799:440595
(CTRL+C)
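In other words, the fix I ended up with is:
Set the clock of the condition variable to CLOCK_MONOTONIC instead of CLOCK_REALTIME, and compute the absolute deadline from that same clock with clock_gettime(CLOCK_MONOTONIC, ...). With this change each call blocks for the expected 200 ms, as the output above shows.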
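Pulled together, the workaround might look like the sketch below. It is not the original Ebupt class: MonotonicWaiter, notify and waitFor are hypothetical names, and the ready flag is an addition so that the wait loop also tolerates spurious wakeups, which the original sample does not handle.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>
#include <time.h>

// Hypothetical wrapper: a condition variable bound to CLOCK_MONOTONIC,
// guarded by a predicate so spurious wakeups do not end the wait early.
class MonotonicWaiter
{
public:
    MonotonicWaiter() : ready(false)
    {
        pthread_mutex_init(&mutex, NULL);
        pthread_condattr_t attr;
        pthread_condattr_init(&attr);
        pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
        pthread_cond_init(&cond, &attr);
        pthread_condattr_destroy(&attr);
    }

    ~MonotonicWaiter()
    {
        pthread_cond_destroy(&cond);
        pthread_mutex_destroy(&mutex);
    }

    // Wake a waiter before its timeout expires.
    void notify()
    {
        pthread_mutex_lock(&mutex);
        ready = true;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&mutex);
    }

    // Block until notify() is called or wait_ns nanoseconds elapse.
    // Returns true if notified, false on timeout (or on any wait error).
    bool waitFor(long wait_ns)
    {
        assert(wait_ns >= 0);
        struct timespec abstime;
        clock_gettime(CLOCK_MONOTONIC, &abstime);  // same clock as the condattr
        abstime.tv_sec += wait_ns / 1000000000L;
        abstime.tv_nsec += wait_ns % 1000000000L;
        if (abstime.tv_nsec >= 1000000000L) {      // carry into seconds
            abstime.tv_sec += 1;
            abstime.tv_nsec -= 1000000000L;
        }

        pthread_mutex_lock(&mutex);
        int rc = 0;
        while (!ready && rc == 0)
            rc = pthread_cond_timedwait(&cond, &mutex, &abstime);
        bool notified = ready;
        ready = false;
        pthread_mutex_unlock(&mutex);
        return notified;
    }

private:
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    bool ready;
};
```

A caller would replace e.dealMsg(200000000) with something like waiter.waitFor(200000000). The important detail is that the clock passed to pthread_condattr_setclock and the clock used to build abstime are the same one.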
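CLOCK_MONOTONIC is based on the kernel's tick-driven timekeeping (the jiffies counter): it only ever increases, effectively measuring the time elapsed since the machine booted, and it cannot be set by users.
CLOCK_REALTIME uses xtime, which is initialized after boot from the hardware clock (RTC) on the motherboard and can also be changed at run time by a privileged user (for example root) with commands such as date. For example, suppose you set a timeout one hour ahead; if, while the call is still blocked, someone uses date to move the system time (also called wall time) forward by an hour, the blocked call times out immediately, just like our pthread_cond_timedwait.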
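Changing the system time requires root, but from the condition variable's point of view a forward clock jump looks like a deadline that is already in the past. The sketch below simulates that effect instead of actually touching the clock; waitWithExpiredDeadline is a hypothetical helper name.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>
#include <time.h>

// Hypothetical helper: wait on a default (CLOCK_REALTIME) condition variable
// with a deadline one second in the past, as if the wall clock had jumped
// forward past the deadline. Returns pthread_cond_timedwait's result code.
int waitWithExpiredDeadline()
{
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

    struct timespec abstime;
    int rcClock = clock_gettime(CLOCK_REALTIME, &abstime);
    assert(rcClock == 0);
    (void)rcClock;
    abstime.tv_sec -= 1;  // deadline already expired

    pthread_mutex_lock(&mutex);
    int rc = pthread_cond_timedwait(&cond, &mutex, &abstime);
    pthread_mutex_unlock(&mutex);

    pthread_cond_destroy(&cond);
    pthread_mutex_destroy(&mutex);
    return rc;
}
```

The call is expected to return ETIMEDOUT without blocking, which is the same symptom as in the outputs earlier in this post. A condition variable bound to CLOCK_MONOTONIC is not exposed to wall-clock changes of this kind.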
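To be honest, I never found out what actually caused pthread_cond_timedwait to fail to block; the above is only a temporary workaround that I stumbled upon. I will look into the root cause when I have time…
Postscript:
I noticed that the CPU usage of the processes on the production machine all looked somewhat abnormal: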
top - 21:45:30 up 419 days,  8:00,  1 user,  load average: 8.85, 8.37, 8.38
Tasks: 238 total,   4 running, 234 sleeping,   0 stopped,   0 zombie
Cpu(s): 10.4%us, 65.4%sy,  0.0%ni, 23.5%id,  0.0%wa,  0.0%hi,  0.7%si,  0.0%st
Mem:  32879016k total, 32650184k used,   228832k free,   218716k buffers
Swap:  2097144k total,   749020k used,  1348124k free, 28992212k cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+  COMMAND
12303 sdc       20   0 3517m 864m 7176 S 344.1  2.7  5125261h java
    9 root      20   0     0    0    0 S  50.5  0.0 144940:15 ksoftirqd/1
   13 root      20   0     0    0    0 R  48.9  0.0 157182:21 ksoftirqd/2
    4 root      20   0     0    0    0 R  46.9  0.0 153791:36 ksoftirqd/0
   33 root      20   0     0    0    0 S  46.5  0.0 148379:24 ksoftirqd/7
   21 root      20   0     0    0    0 R  44.2  0.0 156277:16 ksoftirqd/4
   29 root      20   0     0    0    0 S  43.2  0.0 154775:19 ksoftirqd/6
   17 root      20   0     0    0    0 S  30.9  0.0 174973:53 ksoftirqd/3
   25 root      20   0     0    0    0 S  10.0  0.0 156328:27 ksoftirqd/5
27888 www       20   0  177m 121m 1900 S   1.3  0.4   1167:11 nginx
   41 root      20   0     0    0    0 S   0.3  0.0  17:36.77 events/6
21937 sdc       20   0  134m 7564 1136 S   0.3  0.0  57:06.09 redis-server
24218 ismp      20   0 15164 1344  944 R   0.3  0.0   0:00.01 top
27890 www       20   0  180m 124m 1900 S   0.3  0.4   1163:55 nginx
27891 www       20   0  170m 114m 1912 S   0.3  0.4   1069:01 nginx
    1 root      20   0 19348  852  544 S   0.0  0.0   0:01.41 init
    2 root      20   0     0    0    0 S   0.0  0.0   0:00.00 kthreadd
    3 root      RT   0     0    0    0 S   0.0  0.0   1:02.86 migration/0
    5 root      RT   0     0    0    0 S   0.0  0.0   0:00.00 migration/0
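The java backend process and the ksoftirqd threads stand out in particular.
My guess: could java also be using condition variables under the hood, and failing to block in the same way?
Much later… I rebooted the production machine. The CPU usage of every process came back down, and the blocking failure described above no longer occurred……
If anyone has run into this problem before, I would be glad to hear from you~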