Traffic Server Process Model
http://www.cnblogs.com/liushaodong/archive/2013/02/26/2933280.html
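1. Overview
Traffic Server consists of three cooperating processes that serve requests and manage, control, and monitor the health of the system. Figure 1 shows how the three processes relate to each other; each one is described below.
Figure 1: Relationships between the processes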
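1) The traffic_server process is the transaction processing engine of Traffic Server. It accepts connections, processes protocol requests, and serves content from the local cache or from the origin server.
2) The traffic_manager process is the command and control facility of Traffic Server. It launches, monitors, and reconfigures the traffic_server process. It is also responsible for the proxy auto-configuration port, the statistics interface, cluster administration, and virtual IP failover.
If traffic_manager detects that traffic_server has failed, it restarts the process immediately and also maintains a connection queue for all incoming requests. Connections that arrive in the few seconds before traffic_server has restarted are saved in this queue and processed in FIFO order. The queue therefore absorbs the connections that arrive during any server restart.
3) The traffic_cop process monitors the health of both traffic_server and traffic_manager. It periodically (several times a minute) queries both processes by issuing heartbeat requests that fetch synthetic web pages. If a failure occurs (no response arrives within the timeout interval, or an incorrect response arrives), traffic_cop restarts traffic_server and traffic_manager. The benefit of this design is that traffic_server, the worker process that must be kept running, is protected twice: by traffic_manager and by traffic_cop.
4) traffic_server uses a multi-threaded, asynchronous event-processing model. Instead of creating one thread per connection, it creates a configurable number of worker threads in advance, and each worker thread runs its own asynchronous event handler. traffic_server creates several groups of Threads and dispatches each Event, by type, to the Event queue of the appropriate Thread. A Thread drives state transitions by executing the callback of the Continuation attached to each Event. The transitions from the initial state to the final state make up the whole execution of a task, while the Thread itself never exits and simply waits for the next event.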
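The Event / Continuation relationship described in item 4 can be illustrated with a small single-threaded sketch. All names here (State, Continuation, Event, EventLoop) are hypothetical and chosen only for illustration; they are not the actual Traffic Server classes, and a real worker thread would block waiting for new events rather than return when its queue is empty.

```cpp
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

// Hypothetical names for illustration only, not the Traffic Server classes.
enum class State { Init, ReadRequest, SendResponse, Done };

struct Continuation {
  State state = State::Init;
  std::vector<std::string> trace; // records each transition for inspection

  // Callback executed once per event. Returns true while more work remains.
  bool handle_event() {
    switch (state) {
    case State::Init:
      trace.push_back("init");
      state = State::ReadRequest;
      return true;
    case State::ReadRequest:
      trace.push_back("read");
      state = State::SendResponse;
      return true;
    case State::SendResponse:
      trace.push_back("send");
      state = State::Done;
      return false;
    case State::Done:
      return false;
    }
    return false;
  }
};

struct Event {
  Continuation *cont; // the continuation whose callback this event triggers
};

class EventLoop {
public:
  void schedule(Continuation *c) { queue_.push(Event{c}); }

  // Runs callbacks until the queue is empty; returns how many events ran.
  std::size_t run_until_idle() {
    std::size_t handled = 0;
    while (!queue_.empty()) {
      Event e = queue_.front();
      queue_.pop();
      ++handled;
      if (e.cont->handle_event()) {
        queue_.push(e); // reschedule the continuation for its next state
      }
    }
    return handled;
  }

private:
  std::queue<Event> queue_;
};
```

Two continuations scheduled on the same loop are interleaved event by event, which is the point of the model: one thread makes progress on many tasks without blocking on any single one.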
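This article focuses on the relationships among the three Traffic Server processes and on how they are implemented; the multi-threaded asynchronous event model is not analyzed in depth. The process model diagram is as follows: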
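2. Implementation Principles
Basic principle: traffic_manager and traffic_server are each assigned a lock file, manager_lockfile and server_lockfile respectively. traffic_cop monitors traffic_manager and traffic_server through these two lock files, and in the same way traffic_manager monitors traffic_server through server_lockfile. Figure 2 illustrates the relationship:
Figure 2: Relationships between the processes and the lock files
Key implementation:
Key class: Lockfile
Lockfile::Open(pid_t * holding_pid) in detail: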
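Explanation: Lockfile::Open(pid_t * holding_pid) has three kinds of return value, listed below. On success it also sets the close-on-exec flag on the lock file descriptor: when the process later starts another program through one of the exec() family of functions, the descriptor is closed automatically, so the lock is not accidentally passed on to the new program.
(1) A return value of 1 means the lock file was opened and locked successfully, which implies that the process associated with the lock file is not running. If that process were running, it would hold the lock and the lock could not be acquired.
(2) A return value of 0 means the lock is held by another process. The PID of the holder is returned through holding_pid; that PID was written into the lock file by Get() when the holding process started.
(3) A negative value (-errno) means an error occurred: opening fname failed, reading the holder's PID from the lock file failed, getting the close-on-exec flag failed, or setting the close-on-exec flag failed.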
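The three kinds of return value can be reproduced with a short free function built on fcntl(). This is a minimal sketch, not the article's Lockfile class: try_lock is a hypothetical helper, it omits the EINTR retry loops and the reading of the holder's PID, and unlike Lockfile::Open() it reports 0 only for EAGAIN/EACCES (the errors POSIX specifies for a conflicting lock) rather than for every F_SETLK failure.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// try_lock is a hypothetical helper, not part of the article's Lockfile class.
// Returns  1 if the write lock was acquired (descriptor stored in *out_fd),
//          0 if another process holds the lock,
//     -errno on any other failure.
int try_lock(const char *path, int *out_fd) {
  *out_fd = -1;
  int fd = open(path, O_RDWR | O_CREAT, 0644);
  if (fd < 0) return -errno;

  struct flock lock = {};
  lock.l_type = F_WRLCK;    // exclusive (write) lock
  lock.l_whence = SEEK_SET;
  lock.l_start = 0;
  lock.l_len = 0;           // 0 means "through the end of the file"

  if (fcntl(fd, F_SETLK, &lock) < 0) {
    int saved = errno;
    close(fd);
    if (saved == EAGAIN || saved == EACCES) return 0; // held by someone else
    return -saved;
  }

  // Set close-on-exec so a later fork/exec does not leak the descriptor.
  int flags = fcntl(fd, F_GETFD);
  if (flags < 0 || fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0) {
    int saved = errno;
    close(fd);
    return -saved;
  }

  *out_fd = fd; // closing this descriptor releases the lock
  return 1;
}
```

Because fcntl() record locks belong to a process, a second try_lock() on the same path from the same process also returns 1; the 0 result only appears when a different process holds the lock, which is exactly what lets a monitor use the lock file as a liveness signal.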
The important functions for killing processes are summarized below:
// kill
// Sends signal sig to the process with the given pid
// return: 0 on success, -1 on error
1. int kill(pid_t pid, int sig);
// ink_killall
// Sends signal sig to every process whose program name is pname
// return: 0 on success, -1 on error
2. int ink_killall(const char *pname, int sig);
ink_killall calls:
3. ink_killall_get_pidv_xmalloc(pname, &pidv, &pidvcnt);
4. ink_killall_kill_pidv(pidv, pidvcnt, sig);
// ink_killall_get_pidv_xmalloc
// Looks up the processes running program pname, stores their PIDs in the pidv array
// and the number of processes in pidvcnt
// return: -1 on error (pidv set to NULL, pidvcnt set to 0); 0 on success (pidv: ats_malloc'd
// PID vector; pidvcnt: number of PIDs; if pidvcnt is 0, pidv is set to NULL)
3. int ink_killall_get_pidv_xmalloc(const char *pname, pid_t ** pidv, int *pidvcnt);
// ink_killall_kill_pidv
// Calls kill(pidv[i], sig) for each PID recorded in pidv
// return: 0 on success, -1 on error
4. int ink_killall_kill_pidv(pid_t * pidv, int pidvcnt, int sig);
ink_killall_kill_pidv calls:
1. kill(pid_t pid, int sig);
// safe_kill
// Safely kills all processes whose program name is pname. lockfile_name is the lock file
// associated with the process; group says whether the child processes created by pname
// should be killed as well, since they belong to the same process group
// return: void
5. static void safe_kill(const char *lockfile_name, const char *pname, bool group);
safe_kill calls:
6. Lockfile::Kill(killsig, coresig, pname);
7. Lockfile::KillGroup(killsig, coresig, pname);
// Lockfile::Kill
// Handles the associated lock file and kills every process named pname. sig is normally
// the kill signal; initial_sig (default 0) is sent first to the process holding the lock,
// for example to make it dump core
// return: void
6. void Lockfile::Kill(int sig, int initial_sig, const char *pname);
Lockfile::Kill calls:
8. lockfile_kill_internal(pid, initial_sig, pid, pname, sig);
// Lockfile::KillGroup
// Handles the associated lock file and kills the process named pname together with the
// child processes it created (and therefore their threads as well); sig is the kill signal
// initial_sig has the same meaning as in Kill
// return: void
7. void Lockfile::KillGroup(int sig, int initial_sig, const char *pname);
Lockfile::KillGroup calls:
8. lockfile_kill_internal(holding_pid, initial_sig, pid, pname, sig);
// lockfile_kill_internal
// If init_sig > 0, first sends init_sig to init_pid; then repeatedly sends sig to pid
// and to every process named pname until pid can no longer be signalled
// return: void
8. static void lockfile_kill_internal(pid_t init_pid, int init_sig, pid_t pid, const char *pname, int sig);
lockfile_kill_internal calls:
1. kill(init_pid, init_sig);
3. ink_killall_get_pidv_xmalloc(pname, &pidv, &pidvcnt);
4. ink_killall_kill_pidv(pidv, pidvcnt, sig);
See the source code for the implementation details.
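Two building blocks recur in the functions above: probing whether a PID is alive with kill(pid, 0), and sending one signal to a whole vector of PIDs. The sketch below shows both with hypothetical helpers (process_alive and signal_all are illustrative names, not functions from the article). One deliberate difference is labeled in the comments: the article's monitoring code treats any kill() failure as "not running", whereas this sketch treats EPERM as "running but not signalable by us".

```cpp
#include <cerrno>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>
#include <vector>

// process_alive and signal_all are hypothetical helpers for illustration.

// Signal 0 performs the existence and permission checks without delivering
// anything, so it can serve as a liveness probe.
bool process_alive(pid_t pid) {
  if (kill(pid, 0) == 0) return true;
  // Assumption of this sketch: EPERM means the process exists but belongs
  // to another user. The article's code treats any failure as "dead".
  return errno == EPERM;
}

// Sends sig to every PID in the list. Returns 0 if every kill() succeeded and
// -1 otherwise, including for an empty list (mirroring ink_killall_kill_pidv).
int signal_all(const std::vector<pid_t> &pids, int sig) {
  if (pids.empty()) return -1;
  int err = 0;
  for (pid_t pid : pids) {
    if (kill(pid, sig) < 0) err = -1; // keep going, remember the failure
  }
  return err;
}
```

Passing sig = 0 to signal_all() turns it into a batch liveness check, which is a convenient way to try the helper without terminating anything.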
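2. Simulating how traffic_cop monitors traffic_manager and traffic_server
After traffic_cop starts, it enters main(), which calls check(). check() periodically calls check_programs() to monitor traffic_manager and traffic_server. check_programs() is somewhat involved; its flow chart is shown below.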
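As a textual companion to the flow chart, the branching in check_programs() can be condensed into a pure decision function. This is a sketch under assumptions: CopAction and decide_action are hypothetical names, and the boolean inputs stand in for the results of Lockfile::Open() on the two lock files, heartbeat_manager(), and server_up() in the source listing of traffic_cop.cpp.

```cpp
// CopAction and decide_action are hypothetical names used for illustration.
enum class CopAction {
  RestartManager,   // manager lock is free: kill any stray server, spawn manager
  Nothing,          // manager heartbeat failed, or server is intentionally down
  WaitForServer,    // server lock free for the first time: let the manager react
  KillManagerGroup, // server missing on consecutive checks: kill manager's group
  HeartbeatServer   // both processes hold their locks: probe the server
};

CopAction decide_action(bool manager_lock_free, bool manager_heartbeat_ok,
                        bool server_should_be_up, bool server_lock_free,
                        int &server_not_found) {
  if (manager_lock_free) {
    return CopAction::RestartManager;
  }
  if (!manager_heartbeat_ok || !server_should_be_up) {
    return CopAction::Nothing;
  }
  if (server_lock_free) {
    // Count consecutive misses; only act on the second one.
    if (++server_not_found > 1) {
      server_not_found = 0;
      return CopAction::KillManagerGroup;
    }
    return CopAction::WaitForServer;
  }
  return CopAction::HeartbeatServer;
}
```

Keeping the decision separate from the side effects (killing, spawning, sleeping) makes the two-strikes rule for a missing server easy to see: the first miss gives traffic_manager a chance to restart traffic_server on its own, and only a second miss makes traffic_cop kill the manager's process group.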
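3. Simulation Test
Following the principles above, I wrote mock versions of the three processes traffic_cop, traffic_manager, and traffic_server, with traffic_cop implemented as a daemon. The way traffic_manager monitors traffic_server is similar to the way traffic_cop monitors traffic_manager and traffic_server, so it is not described again. The functions that test the health of traffic_manager and traffic_server, heartbeat_manager(), server_up(), and heartbeat_server(), involve port communication. Since that does not affect the simulation of the principle, their code is stubbed out and they simply return a healthy value. (The programs need the files manager_lockfile and server_lockfile at run time; readers should create these two files in the directory that contains the executables.)
After starting the programs, running ps -axj | grep binary gives the output shown below:
The first four columns are: parent process ID (PPID) / process ID (PID) / process group ID (PGID) / session ID (SID).
The figure shows the normal relationships among the processes.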
When the traffic_manager process exits abnormally, traffic_cop restarts it. This can be seen in the log file (excerpt below):
==============traffic_server is running, pid:'5443'!
----------------traffic_manager is running, pid:'5436'!
==============traffic_server is running, pid:'5443'!
---------------traffic_manager has a expcetion and eixt!
Entering check_programs()
traffic_manager not running, making sure traffic_server is dead
Entering safe_kill
Leaving safe_kill
Entering spwan_manager()!
Leaving spwan_manager()!
Leaving check_programs
----------------traffic_manager is running, pid:'5463'!
Entering spwan_server()!
Leaving spwan_server()!
==============traffic_server is running, pid:'5467'!
The log shows that at one point the traffic_manager PID was 5436 and the traffic_server PID was 5443. traffic_manager then failed ("traffic_manager has a expcetion and eixt!"). In its periodic check_programs() call, traffic_cop found "traffic_manager not running", killed the traffic_server process ("making sure traffic_server is dead"), and created a new traffic_manager process ("Entering spwan_manager()!"), whose PID is now 5463. Once the new traffic_manager was running, it found that no traffic_server was running and called spawn_server() to create one, with PID 5467. This shows that traffic_cop's monitoring works as intended.
When the traffic_server process exits abnormally, traffic_manager detects this and restarts traffic_server. This can also be seen in the log file (excerpt below):
==============traffic_server is running, pid:'7703'!
----------------traffic_manager is running, pid:'7699'!
=================traffic_server has a expcetion and exit!
Entering safe_kill
Leaving safe_kill
--------------Entering spwan_server()!
--------------Leaving spwan_server()!
----------------traffic_manager is running, pid:'7699'!
==============traffic_server is running, pid:'7712'!
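The log shows that at one point the traffic_manager PID was 7699 and the traffic_server PID was 7703. traffic_server then exited abnormally, and traffic_manager called spawn_server() to start a new traffic_server process with PID 7712. The traffic_manager PID is still 7699, so the manager process itself did not change. This shows that traffic_manager does monitor the traffic_server process.
4. Summary
Why does the design use three processes rather than two, with traffic_manager supervising traffic_server directly? The role that traffic_manager plays in the system means that two processes alone cannot meet the requirements. In particular, when traffic_manager detects that traffic_server has failed, it temporarily queues incoming requests, so it has to listen on the port for a while itself. The system therefore cannot guarantee that traffic_manager never fails. For that reason the design adds the traffic_cop daemon, whose only job is to monitor the other two processes. In theory this daemon should never terminate abnormally, so the three-layer design is safer and more reliable than a two-layer one. When the three processes work together, server failures are transparent to clients: a client does not notice that anything went wrong with its request, although latency may increase somewhat. (That is the design intent, but it is not absolute: in the unlikely case that traffic_manager and traffic_server fail at the same time, client requests cannot be accepted during the few seconds traffic_cop needs to restart them.) The architecture shows that a server should be as stable and safe as possible, and that abnormal situations need to be considered thoroughly.
Source code: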
1.lock_and_kill.h
#ifndef LOCK_AND_KILL_H
#define LOCK_AND_KILL_H
#include <sys/types.h>
#include <string.h>
#define PATH_NAME_MAX 4096

/*-------------------------------------------------------------------------
  ink_killall
  - Sends signal 'sig' to all processes with the name 'pname'
  - Returns: -1 error
              0 okay
  -------------------------------------------------------------------------*/
int ink_killall(const char *pname, int sig);

/*-------------------------------------------------------------------------
  ink_killall_get_pidv_xmalloc
  - Get all pid's named 'pname' and stores into ats_malloc'd
    pid_t array, 'pidv'
  - Returns: -1 error (pidv: set to NULL; pidvcnt: set to 0)
              0 okay (pidv: ats_malloc'd pid vector; pidvcnt: number of pid's;
                      if pidvcnt is set to 0, then pidv will be set to NULL)
  -------------------------------------------------------------------------*/
int ink_killall_get_pidv_xmalloc(const char *pname, pid_t ** pidv, int *pidvcnt);

/*-------------------------------------------------------------------------
  ink_killall_kill_pidv
  - Kills all pid's in 'pidv' with signal 'sig'
  - Returns: -1 error
              0 okay
  -------------------------------------------------------------------------*/
int ink_killall_kill_pidv(pid_t * pidv, int pidvcnt, int sig);

class Lockfile
{
public:
  // fd starts at -1 ("not open") so that Close() never closes descriptor 0.
  Lockfile(void):fd(-1)
  {
    fname[0] = '\0';
  }

  // coverity[uninit_member]
  Lockfile(const char *filename):fd(-1)
  {
    strcpy(fname, filename);
  }

  ~Lockfile(void)
  {
  }

  void SetLockfileName(const char *filename)
  {
    strcpy(fname, filename);
  }

  const char *GetLockfileName(void)
  {
    return fname;
  }

  // Open() (a very important function)
  //
  // Tries to open a lock file, returning:
  //   -errno on error
  //   0 if someone is holding the lock (with holding_pid set)
  //   1 if we now have a writable lock file
  int Open(pid_t * holding_pid);

  // Get()
  //
  // Gets write access to a lock file, and if successful, truncates
  // file, and writes the current process ID. Returns:
  //   -errno on error
  //   0 if someone is holding the lock (with holding_pid set)
  //   1 if we now have a writable lock file
  int Get(pid_t * holding_pid);

  // Close()
  //
  // Closes the file handle on the opened Lockfile.
  void Close(void);

  // Kill()
  // KillGroup()
  //
  // Ensures no one is holding the lock. It tries to open the lock file
  // and if that does not succeed, it kills the process holding the lock.
  // If the lock file open succeeds, it closes the lock file releasing
  // the lock.
  //
  // The initial signal can be used to generate a core from the process while
  // still ensuring it dies.
  void Kill(int sig, int initial_sig = 0, const char *pname = NULL);
  void KillGroup(int sig, int initial_sig = 0, const char *pname = NULL);

private:
  char fname[PATH_NAME_MAX];
  int fd;
};

#endif
2.lock_and_kill.cpp
#include <stdio.h>
#include <stdlib.h>
#include <dirent.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/file.h>
#include <errno.h>
#include <signal.h>

#include "lock_and_kill.h"

#define PROC_BASE "/proc"
#define INITIAL_PIDVSIZE 32
#define LOCKFILE_BUF_LEN 16
#ifndef LINE_MAX // may already be defined by <limits.h>
#define LINE_MAX 1024
#endif

int
ink_killall(const char *pname, int sig)
{
  int err;
  pid_t *pidv;
  int pidvcnt;

  if (ink_killall_get_pidv_xmalloc(pname, &pidv, &pidvcnt) < 0) {
    return -1;
  }

  if (pidvcnt == 0) {
    free(pidv);
    return 0;
  }

  err = ink_killall_kill_pidv(pidv, pidvcnt, sig);
  free(pidv);
  return err;
}

int
ink_killall_get_pidv_xmalloc(const char *pname, pid_t ** pidv, int *pidvcnt)
{
  DIR *dir;
  FILE *fp;
  struct dirent *de;
  pid_t pid, self;
  char buf[LINE_MAX], *p, *comm;
  int pidvsize = INITIAL_PIDVSIZE;

  if (!pname || !pidv || !pidvcnt)
    goto l_error;

  self = getpid();
  if (!(dir = opendir(PROC_BASE)))
    goto l_error;

  *pidvcnt = 0;
  *pidv = (pid_t *)malloc(pidvsize * sizeof(pid_t));

  // Walk /proc; every numeric directory name is a PID.
  while ((de = readdir(dir))) {
    if (!(pid = (pid_t) atoi(de->d_name)) || pid == self)
      continue;
    snprintf(buf, sizeof(buf), PROC_BASE "/%d/stat", pid);
    if ((fp = fopen(buf, "r"))) {
      if (fgets(buf, sizeof buf, fp) == 0)
        goto l_close;
      // The program name appears in parentheses in /proc/<pid>/stat.
      if ((p = strchr(buf, '('))) {
        comm = p + 1;
        if ((p = strchr(comm, ')')))
          *p = '\0';
        else
          goto l_close;
        if (strcmp(comm, pname) == 0) {
          if (*pidvcnt >= pidvsize) {
            pid_t *pidv_realloc;
            pidvsize *= 2;
            if (!(pidv_realloc = (pid_t *)realloc(*pidv, pidvsize * sizeof(pid_t)))) {
              free(*pidv);
              fclose(fp);
              closedir(dir);
              goto l_error;
            } else {
              *pidv = pidv_realloc;
            }
          }
          (*pidv)[*pidvcnt] = pid;
          (*pidvcnt)++;
        }
      }
    l_close:
      fclose(fp);
    }
  }
  closedir(dir);

  if (*pidvcnt == 0) {
    free(*pidv);
    *pidv = 0;
  }
  return 0;
l_error:
  // Guard the output pointers: they may be NULL on the argument-check path.
  if (pidv)
    *pidv = NULL;
  if (pidvcnt)
    *pidvcnt = 0;
  return -1;
}

int
ink_killall_kill_pidv(pid_t * pidv, int pidvcnt, int sig)
{
  int err = 0;
  if (!pidv || (pidvcnt <= 0))
    return -1;
  while (pidvcnt > 0) {
    pidvcnt--;
    if (kill(pidv[pidvcnt], sig) < 0)
      err = -1;
  }
  return err;
}

////////////////////////////////////////////////////////////////////////
// Implementation of the Lockfile member functions
////////////////////////////////////////////////////////////////////////
int
Lockfile::Open(pid_t * holding_pid)
{
  char buf[LOCKFILE_BUF_LEN];
  pid_t val;
  int err;
  *holding_pid = 0;

#define FAIL(x) \
{ \
  if (fd > 0) \
    close (fd); \
  return (x); \
}

  struct flock lock;
  char *t;
  int size; // number of bytes still to be read from the lock file

  // Try and open the Lockfile. Create it if it does not already
  // exist.
  do {
    fd = open(fname, O_RDWR | O_CREAT, 0644);
  } while ((fd < 0) && (errno == EINTR));

  if (fd < 0)
    return (-errno);

  // Lock it. Note that if we can't get the lock EAGAIN will be the
  // error we receive.
  lock.l_type = F_WRLCK;
  lock.l_start = 0;
  lock.l_whence = SEEK_SET;
  lock.l_len = 0;

  do {
    err = fcntl(fd, F_SETLK, &lock);
  } while ((err < 0) && (errno == EINTR));

  if (err < 0) {
    // We couldn't get the lock. Try and read the process id of the
    // process holding the lock from the lockfile.
    t = buf;

    for (size = 15; size > 0;) {
      do {
        err = read(fd, t, size);
      } while ((err < 0) && (errno == EINTR));

      if (err < 0)
        FAIL(-errno);
      if (err == 0)
        break;

      size -= err;
      t += err;
    }
    *t = '\0';

    // coverity[secure_coding]
    if (sscanf(buf, "%d\n", (int*)&val) != 1) {
      *holding_pid = 0;
    } else {
      *holding_pid = val;
    }
    FAIL(0);
  }
  // If we did get the lock, then set the close on exec flag so that
  // we don't accidentally pass the file descriptor to a child process
  // when we do a fork/exec.
  do {
    err = fcntl(fd, F_GETFD, 0);
  } while ((err < 0) && (errno == EINTR));

  if (err < 0)
    FAIL(-errno);

  val = err | FD_CLOEXEC;

  do {
    err = fcntl(fd, F_SETFD, val);
  } while ((err < 0) && (errno == EINTR));

  if (err < 0)
    FAIL(-errno);

  // The opened lockfile descriptor is kept in fd. When this file
  // descriptor is closed the lock will be released.
  return (1); // success
#undef FAIL
}

int
Lockfile::Get(pid_t * holding_pid)
{
  char buf[LOCKFILE_BUF_LEN];
  int err;
  *holding_pid = 0;

  fd = -1;

  // Open the Lockfile and get the lock. If we are successful, Open()
  // returns 1 and fd holds the descriptor of the opened Lockfile.
  err = Open(holding_pid);
  if (err != 1)
    return err;

  if (fd < 0) {
    return -1;
  }

  // Truncate the Lockfile effectively erasing it.
  do {
    err = ftruncate(fd, 0);
  } while ((err < 0) && (errno == EINTR));

  if (err < 0) {
    close(fd);
    return (-errno);
  }

  // Write our process id to the Lockfile.
  snprintf(buf, sizeof(buf), "%d\n", (int) getpid());

  do {
    err = write(fd, buf, strlen(buf));
  } while ((err < 0) && (errno == EINTR));

  if (err != (int) strlen(buf)) {
    close(fd);
    return (-errno);
  }
  return (1); // success
}

void
Lockfile::Close(void)
{
  if (fd != -1) {
    close(fd);
  }
}

//-------------------------------------------------------------------------
// Lockfile::Kill() and Lockfile::KillAll()
//
// Open the lockfile. If we succeed it means there was no process
// holding the lock. We'll just close the file and release the lock
// in that case. If we don't succeed in getting the lock, the
// process id of the process holding the lock is returned. We
// repeatedly send the KILL signal to that process until doing so
// fails. That is, until kill says that the process id is no longer
// valid (we killed the process), or that we don't have permission
// to send a signal to that process id (the process holding the lock
// is dead and a new process has replaced it).
//
// INKqa11325 (Kevlar: linux machine hosed up if specific threads
// killed): Unfortunately, it's possible on Linux that the main PID of
// the process has been successfully killed (and is waiting to be
// reaped while in a defunct state), while some of the other threads
// of the process just don't want to go away. Integrate ink_killall
// into Kill() and KillAll() just to make sure we really kill
// everything and so that we don't spin hard while trying to kill a
// defunct process.
//-------------------------------------------------------------------------

static void
lockfile_kill_internal(pid_t init_pid, int init_sig, pid_t pid, const char *pname, int sig)
{
  int err;

#if defined(linux)

  // Initialized so that free(pidv) below is safe when pname is NULL.
  pid_t *pidv = NULL;
  int pidvcnt = 0;

  // Need to grab pname's pid vector before we issue any kill signals.
  // Specifically, this prevents the race-condition in which
  // traffic_manager spawns a new traffic_server while we still think
  // we're killall'ing the old traffic_server.
  if (pname) {
    // Collect the PIDs of all processes named pname: pidv receives the
    // PID array and pidvcnt the number of processes.
    ink_killall_get_pidv_xmalloc(pname, &pidv, &pidvcnt);
  }

  if (init_sig > 0) {
    kill(init_pid, init_sig);
    // sleep for a bit and give time for the first signal to be
    // delivered
    sleep(1);
  }

  do {
    if ((err = kill(pid, sig)) == 0) {
      sleep(1);
    }
    if (pname && (pidvcnt > 0)) {
      ink_killall_kill_pidv(pidv, pidvcnt, sig);
      sleep(1);
    }
  } while ((err == 0) || ((err < 0) && (errno == EINTR)));

  free(pidv);

#else

  if (init_sig > 0) {
    kill(init_pid, init_sig);
    // sleep for a bit and give time for the first signal to be
    // delivered
    sleep(1);
  }

  do {
    err = kill(pid, sig);
  } while ((err == 0) || ((err < 0) && (errno == EINTR)));

#endif // linux check

}

void
Lockfile::Kill(int sig, int initial_sig, const char *pname)
{
  int err;
  int pid;
  pid_t holding_pid;

  err = Open(&holding_pid);
  if (err == 1) // success getting the lock file: no process holds it
  {
    Close(); // nothing to kill, just release the lock again
  } else if (err == 0) // someone else has the lock
  {
    pid = holding_pid; // PID of the process holding the lock
    if (pid != 0) { // only kill when the PID read from the file is valid
      lockfile_kill_internal(pid, initial_sig, pid, pname, sig);
    }
  }
}

// Like Kill(), but signals the whole process group of the lock holder
// (a negative pid passed to kill() addresses a process group).
void
Lockfile::KillGroup(int sig, int initial_sig, const char *pname)
{
  int err;
  pid_t pid;
  pid_t holding_pid;

  err = Open(&holding_pid);
  if (err == 1) // success getting the lock file
  {
    Close();
  } else if (err == 0) // someone else has the lock
  {
    do {
      pid = getpgid(holding_pid); // process group ID of the lock holder
    } while ((pid < 0) && (errno == EINTR));

    if ((pid < 0) || (pid == getpid()))
      pid = holding_pid;
    else
      pid = -pid;

    if (pid != 0) {
      // We kill the holding_pid instead of the process_group
      // initially since there is no point trying to get core files
      // from a group since the core file of one overwrites the core
      // file of another one
      lockfile_kill_internal(holding_pid, initial_sig, pid, pname, sig);
    }
  }
}
3.log.h
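#ifndef LOG_H
#define LOG_H
#include <stdio.h>

// Appends the message to log.txt in the current working directory.
// Takes const char* so that string literals can be passed from C++.
static inline void write_to_log(const char *c)
{
  FILE *fp = fopen("log.txt", "ab");
  if (fp) {
    fputs(c, fp);
    fclose(fp);
  }
}

#endif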
4.traffic_cop.cpp
#include "lock_and_kill.h"
#include "log.h"
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/resource.h> // getrlimit()
#include <signal.h>
#include <sys/param.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <time.h>
#include <string.h>
#include <stdio.h>
#include <sys/stat.h>

#define NOWARN_UNUSED(x) (void)(x)

static char cop_lockfile[PATH_NAME_MAX];
static char manager_lockfile[PATH_NAME_MAX];
static char server_lockfile[PATH_NAME_MAX];

static char manager_binary[PATH_NAME_MAX] = "traffic_manager";
static char server_binary[PATH_NAME_MAX] = "traffic_server";
static int killsig = SIGKILL;
static int coresig = 0;
static int server_not_found = 0;
static int server_failures = 0;
static int manager_failures = 0;

static const int sleep_time = 10;          // 10 sec
static const int manager_timeout = 3 * 60; // 3 min
static const int server_timeout = 3 * 60;  // 3 min
static const int kill_timeout = 1 * 60;    // 1 min

static void sig_alarm_warn(int signum = 0)
{
  NOWARN_UNUSED(signum);
  alarm(kill_timeout);
}

static void sig_fatal(int signum)
{
  NOWARN_UNUSED(signum);
  abort();
}

static void set_alarm_warn()
{
  struct sigaction action;
  action.sa_handler = sig_alarm_warn;
  sigemptyset(&action.sa_mask);
  action.sa_flags = 0;
  sigaction(SIGALRM, &action, NULL);
}

static void set_alarm_death()
{
  struct sigaction action;
  action.sa_handler = sig_fatal;
  sigemptyset(&action.sa_mask);
  action.sa_flags = 0;
  sigaction(SIGALRM, &action, NULL);
}

static void sig_child(int signum)
{
  NOWARN_UNUSED(signum);
  pid_t pid = 0;
  int status = 0;
  for (;;) {
    pid = waitpid(-1, &status, WNOHANG); // -1: reap any child

    if (pid <= 0) {
      break;
    }
    // TSqa03086 - We can not log the child status signal from
    // the signal handler since syslog can deadlock. Record
    // the pid and the status in a global for logging
    // next time through the event loop. We will occasionally
    // lose some information if we get two sig childs in rapid
    // succession
    // child_pid = pid;
    // child_status = status;
  }
}

static void init_signals()
{
  struct sigaction action;
  write_to_log("Entering init_signals()\n");
  action.sa_handler = sig_child;
  sigemptyset(&action.sa_mask);
  action.sa_flags = 0;
  sigaction(SIGCHLD, &action, NULL);
  action.sa_handler = sig_fatal;
  sigemptyset(&action.sa_mask);
  action.sa_flags = 0;
  write_to_log("leaving init_signals()\n\n");
}

static void safe_kill(const char* lockfile_name, const char * pname, bool group)
{
  Lockfile lockfile(lockfile_name);
  write_to_log("Entering safe_kill\n");
  set_alarm_warn();
  alarm(kill_timeout);

  if (group == true) {
    lockfile.KillGroup(killsig, coresig, pname);
  } else {
    lockfile.Kill(killsig, coresig, pname);
  }
  alarm(0);
  set_alarm_death();
  write_to_log("Leaving safe_kill\n\n");
}

// Simplified for the simulation: always report the server as up.
static int server_up()
{
  return 1;
}

// Simplified for the simulation: the manager heartbeat always succeeds.
static int heartbeat_manager()
{
  //safe_kill(manager_lockfile, manager_binary, true);
  return 1;
}

// Simplified for the simulation: the server heartbeat always succeeds.
static int heartbeat_server()
{
  //safe_kill(server_lockfile, server_binary, false);
  //server_failures = 0;
  return 1;
}

static void spawn_manager()
{
  int err;
  err = fork();
  write_to_log("Entering spwan_manager()!\n\n");
  if (err == 0) {
    // execv() needs a NULL-terminated argv whose first entry is the program name.
    char *const argv[] = { manager_binary, NULL };
    err = execv(manager_binary, argv);
    write_to_log("somehow execv failed!\n");
    exit(1);
  } else if (err == -1) {
    write_to_log("unable to fork !\n");
    exit(1);
  }

  manager_failures = 0;
  write_to_log("Leaving spwan_manager()!\n\n");
}

static void init_lockfiles()
{
  // Layout::relative_to(cop_lockfile, sizeof(cop_lockfile), Layout::get()->runtimedir, COP_LOCK);
  // Layout::relative_to(manager_lockfile, sizeof(manager_lockfile), Layout::get()->runtimedir, MANAGER_LOCK);
  // Layout::relative_to(server_lockfile, sizeof(server_lockfile), Layout::get()->runtimedir, SERVER_LOCK);

  write_to_log("Entering init_lockfiles()\n");
  strcpy(cop_lockfile, "cop_lockfile");
  strcpy(manager_lockfile, "manager_lockfile");
  strcpy(server_lockfile, "server_lockfile");

  strcpy(manager_binary, "manager_binary");
  strcpy(server_binary, "server_binary");

  write_to_log("leaving init_lockfiles()\n\n");
}

static void check_lockfile()
{
  write_to_log("Entering check_lockfile()\n");
  int err;
  pid_t holding_pid;
  Lockfile cop_lf(cop_lockfile);
  err = cop_lf.Get(&holding_pid);

  if (err < 0) {
    write_to_log("leaving check_lockfile(),and err<0\n\n");
    exit(1);
  } else if (err == 0) {
    // Another traffic_cop already holds the lock.
    write_to_log("leaving check_lockfile(),and err==0\n\n");
    exit(1);
  }
  write_to_log("leaving check_lockfile()\n\n");
}

static void check_programs()
{
  int err;
  pid_t holding_pid;

  write_to_log("Entering check_programs()\n");
  printf("Entering check_programs()\n");
  // Try to take the manager's lockfile. If that succeeds, no manager
  // process is running.
  Lockfile manager_lf(manager_lockfile);
  err = manager_lf.Open(&holding_pid);

  // The value of err tells us the state of the manager process.
  if (err == 0) {
    write_to_log("in check_programs(),manager_lockfile,err==0\n");
    printf("in check_programs(),manager_lockfile,err==0\n");

    if (kill(holding_pid, 0) == -1) {
      printf("holding_pid is %d,and invalid\n", holding_pid);

      ink_killall(manager_binary, killsig);
      sleep(1); // give signals a chance to be received
      err = manager_lf.Open(&holding_pid);
    }
  }

  if (err > 0) { // we got the manager lockfile, so no manager is running
    // Open() leaves the lockfile descriptor open in manager_lf.
    // We need to close this before spawning the
    // manager so that the manager can grab the lock.
    manager_lf.Close();
    // Make sure we don't have a stray traffic server running.
    write_to_log("traffic_manager not running, making sure traffic_server is dead\n");
    safe_kill(server_lockfile, server_binary, false);
    spawn_manager();
  } else {
    // err <= 0: the manager holds the lock (err == 0), or Open() returned
    // a negative value, for example because the lock was acquired but
    // setting the descriptor flags failed.
    //
    // If there is a manager running we want to heartbeat it to
    // make sure it hasn't wedged. If the manager test succeeds we
    // check to see if the server is up. (That is, it hasn't been
    // brought down via the UI). If the manager thinks the server
    // is up, we make sure there is actually a server process
    // running. If there is we test it.

    alarm(2 * manager_timeout);
    err = heartbeat_manager();
    alarm(0);

    if (err < 0) { // manager heartbeat failed
      return;
    }

    if (server_up() <= 0) {
      // The manager is running but the server is down; the manager is
      // expected to create a new server, so just return.
      return;
    }

    Lockfile server_lf(server_lockfile);
    err = server_lf.Open(&holding_pid);

    if (err == 0) {
      if (kill(holding_pid, 0) == -1) {
        ink_killall(server_binary, killsig);
        sleep(1); // give signals a chance to be received
        err = server_lf.Open(&holding_pid);
      }
    }

    if (err > 0) {
      server_lf.Close();
      server_not_found += 1;

      if (server_not_found > 1) {
        server_not_found = 0;
        safe_kill(manager_lockfile, manager_binary, true);
      }
    } else {
      alarm(2 * server_timeout);
      heartbeat_server();
      alarm(0);
    }
  }
  printf("Leaving check_programs\n\n");
  write_to_log("Leaving check_programs\n\n");
}

static void init()
{
  write_to_log("Entering init()\n");
  init_signals();
  init_lockfiles();
  check_lockfile();
  write_to_log("Leaving init()\n\n");
}

static void millisleep(int ms)
{
  struct timespec ts;
  ts.tv_sec = ms / 1000;
  ts.tv_nsec = (ms - ts.tv_sec * 1000) * 1000 * 1000;
  nanosleep(&ts, NULL);
}

// Changed function from taking no argument and returning void
// to taking a void* and returning a void*. The change was made
// so that we can call ink_thread_create() on this function
// in the case of running cop as a win32 service.
static void* check(void* arg)
{
  write_to_log("Entering check()\n\n");
  for (;;) {
    // problems with the ownership of this file as root Make sure it is
    // owned by the admin user

    alarm(2 * (sleep_time + manager_timeout * 2 + server_timeout));

    check_programs();
    millisleep(sleep_time * 1000);
  }
  write_to_log("Leaveing check()\n\n");
  return arg;
}

void init_daemon(void)
{
  rlim_t i;
  pid_t pid;
  struct rlimit rl;
  struct sigaction sa;

  if (getrlimit(RLIMIT_NOFILE, &rl) < 0) {
    exit(1);
  }

  if ((pid = fork()) < 0) {
    exit(1); // fork failed
  } else if (pid > 0) {
    exit(0); // parent process exits
  }

  // First child: keep running in the background.
  setsid(); // become session leader and process group leader,
            // detaching from the controlling terminal
  sa.sa_handler = SIG_IGN;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = 0;

  if (sigaction(SIGHUP, &sa, NULL) < 0) {
    exit(1);
  }

  if ((pid = fork()) < 0) {
    exit(1); // fork failed
  } else if (pid > 0) {
    exit(0); // first child exits
  }
  // Second child continues; it is no longer a session leader.
  umask(0);
  if (rl.rlim_max == RLIM_INFINITY) {
    rl.rlim_max = 1024;
  }

  for (i = 0; i < rl.rlim_max; ++i) { // close all inherited file descriptors
    close(i);
  }

  //chdir("/tmp"); // change the working directory to /tmp
  return;
}

int main()
{
  init_daemon(); // turn this process into a daemon
  write_to_log("Entering main()\n");
  signal(SIGHUP, SIG_IGN);
  signal(SIGTSTP, SIG_IGN);
  signal(SIGTTOU, SIG_IGN);
  signal(SIGTTIN, SIG_IGN);
  //setsid();
  init();
  check(NULL);
  write_to_log("leaving main()\n\n");
  return 0;
}
5.traffic_manager.cpp
#include "lock_and_kill.h"
#include "log.h"
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <signal.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <time.h>
#include <string.h>
#include <stdio.h>

#define NOWARN_UNUSED(x) (void)(x)
static char manager_lockfile[4096] = "manager_lockfile";
static char server_lockfile[4096] = "server_lockfile";
static int server_failures = 0;
static int killsig = SIGKILL;
static int coresig = 0;
static char server_binary[4096] = "server_binary";
static const int sleep_time = 10;          // 10 sec
static const int manager_timeout = 3 * 60; // 3 min
static const int server_timeout = 3 * 60;  // 3 min
static const int kill_timeout = 1 * 60;    // 1 min

static void sig_alarm_warn(int signum = 0)
{
  NOWARN_UNUSED(signum);
  alarm(kill_timeout);
}

static void sig_fatal(int signum)
{
  NOWARN_UNUSED(signum);
  abort();
}

static void set_alarm_warn()
{
  struct sigaction action;
  action.sa_handler = sig_alarm_warn;
  sigemptyset(&action.sa_mask);
  action.sa_flags = 0;
  sigaction(SIGALRM, &action, NULL);
}

static void set_alarm_death()
{
  struct sigaction action;
  action.sa_handler = sig_fatal;
  sigemptyset(&action.sa_mask);
  action.sa_flags = 0;
  sigaction(SIGALRM, &action, NULL);
}

static void sig_child(int signum)
{
  NOWARN_UNUSED(signum);
  pid_t pid = 0;
  int status = 0;
  for (;;) {
    pid = waitpid(-1, &status, WNOHANG); // -1: reap any child

    if (pid <= 0) {
      break;
    }
    // TSqa03086 - We can not log the child status signal from
    // the signal handler since syslog can deadlock. Record
    // the pid and the status in a global for logging
    // next time through the event loop. We will occasionally
    // lose some information if we get two sig childs in rapid
    // succession
    // child_pid = pid;
    // child_status = status;
  }
}

static void safe_kill(const char* lockfile_name, const char * pname, bool group)
{
  Lockfile lockfile(lockfile_name);
  write_to_log("Entering safe_kill\n");
  set_alarm_warn();
  alarm(kill_timeout);

  if (group == true) {
    lockfile.KillGroup(killsig, coresig, pname);
  } else {
    lockfile.Kill(killsig, coresig, pname);
  }
  alarm(0);
  set_alarm_death();
  write_to_log("Leaving safe_kill\n\n");
}

static void spawn_server()
{
  int err;
  write_to_log("--------------Entering spwan_server()!\n\n");
  err = fork();
  if (err == 0) {
    // execv() needs a NULL-terminated argv whose first entry is the program name.
    char *const argv[] = { server_binary, NULL };
    err = execv(server_binary, argv);

    write_to_log("--------------somehow execv failed!\n");
    exit(1);
  } else if (err == -1) {
    write_to_log("--------------unable to fork server !\n");
    exit(1);
  }

  server_failures = 0;
  write_to_log("--------------Leaving spwan_server()!\n\n");
}

void check_server()
{
  int err;
  pid_t holding_pid;
  Lockfile server_lf(server_lockfile);
  // Open() only probes the lock. (The original called Get() here, which
  // also wrote the manager's own PID into the server's lockfile.)
  err = server_lf.Open(&holding_pid);

  if (err == 0) {
    if (kill(holding_pid, 0) == -1) {
      ink_killall(server_binary, killsig);
      sleep(1);
      err = server_lf.Open(&holding_pid);
    }
  }

  if (err > 0) {
    // The lock is free, so no server is running: restart it.
    server_lf.Close();
    safe_kill(server_lockfile, server_binary, false);
    spawn_server();
  }
}

int main()
{
  pid_t holding_pid = 0;
  Lockfile manager_lf(manager_lockfile);
  manager_lf.Get(&holding_pid);

  while (1) {
    char buf[100];
    snprintf(buf, sizeof(buf), "----------------traffic_manager is running, pid:'%d'!\n", getpid());
    write_to_log(buf);

    printf("----------------traffic_manager is running,pidID: %d\n", getpid());

    sleep(5);
    int c = rand() % 10;

    if (c == 1) { // simulate a failure of the manager process
      write_to_log("----------------traffic_manager has a expcetion and eixt!\n");
      exit(1);
    } else { // check the server process
      check_server();
    }
  }
}
6.traffic_server.cpp
#include "log.h"
#include "lock_and_kill.h"
#include <sys/types.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

static char server_lockfile[4096] = "server_lockfile";

int main()
{
  pid_t holding_pid = 0;
  Lockfile server_lf(server_lockfile);
  server_lf.Get(&holding_pid);

  while (1) {
    char buf[100];
    snprintf(buf, sizeof(buf), "==============traffic_server is running, pid:'%d'!\n", getpid());
    write_to_log(buf);
    sleep(5);
    int c = rand() % 100;

    if (c < 30) { // simulate a failure of the server process
      write_to_log("=================traffic_server has a expcetion and exit!\n");
      exit(1);
    }
  }
  return 0;
}
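This document was written during earlier research. I hope it is of some help to interested readers, and corrections are welcome. It is only a brief analysis of process control in Traffic Server; much of the test is simplified (the heartbeat checks, for example), as noted in the code.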