Processes and Threads in Linux

Source: Internet | Published by: 意大利世界杯知乎 | Editor: 程序博客网 | Time: 2024/06/08 16:28

The following is reposted from: http://www.zhihu.com/question/19901763

What is multithreading good for?

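Let me explain it this way:

1. Single process, single thread: one person eating at one table.
2. Single process, multiple threads: several people eating together at the same table.
3. Multiple processes, single thread each: several people, each eating at their own table.

The trouble with multithreading is that when several people go for the same dish at once, they get in each other's way. For example, two people reach for the same piece of food; one has only just stretched out their chopsticks, and by the time they arrive the food is already gone. At that point one person has to wait for the other to take a bite before taking their own turn. In other words, sharing resources leads to contention.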
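The contention described above can be illustrated with a small Python sketch. The function name eat_together and the "dish" counter are made up for this illustration; the lock plays the role of the "wait your turn" rule, so that each decrement of the shared counter happens without interference.

```python
import threading


def eat_together(diners: int, bites_each: int) -> int:
    """Several threads take bites from one shared dish; returns what is left."""
    remaining = diners * bites_each   # shared state: the dish on the table
    lock = threading.Lock()           # the rule that diners take turns

    def diner() -> None:
        nonlocal remaining
        for _ in range(bites_each):
            with lock:                # only one diner touches the dish at a time
                remaining -= 1        # read-modify-write, protected by the lock

    threads = [threading.Thread(target=diner) for _ in range(diners)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return remaining


if __name__ == "__main__":
    print(eat_together(diners=4, bites_each=10_000))
```

With the lock in place the dish always ends up empty (the function returns 0). Without it, the read-modify-write on the shared counter is not guaranteed to be atomic, and bites can be lost.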

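1. On Windows, "opening a table" is expensive, so Windows encourages everyone to eat at the same table. Learning multithreading on Windows is therefore largely about dealing with resource contention and synchronization.

2. On Linux, "opening a table" is cheap, so Linux encourages everyone to open their own table where possible. This brings a new problem: people sitting at different tables cannot talk to each other easily. So on Linux the focus is on learning the methods of inter-process communication.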
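As one example of "talking between tables", here is a minimal sketch of inter-process communication over a pair of pipes, using os.fork and os.pipe from Python's standard library. The function name ask_other_table and the uppercase reply are invented for the illustration, and it assumes messages small enough to fit in one read.

```python
import os


def ask_other_table(message: bytes) -> bytes:
    """Send `message` to a forked child over a pipe; the child replies in uppercase."""
    to_child_r, to_child_w = os.pipe()
    to_parent_r, to_parent_w = os.pipe()

    pid = os.fork()
    if pid == 0:                          # child process: the other table
        try:
            os.close(to_child_w)
            os.close(to_parent_r)
            data = os.read(to_child_r, 4096)
            os.write(to_parent_w, data.upper())
        finally:
            os._exit(0)                   # never fall through into parent code

    # parent process
    os.close(to_child_r)
    os.close(to_parent_w)
    os.write(to_child_w, message)
    os.close(to_child_w)                  # signals end of input to the child
    reply = os.read(to_parent_r, 4096)
    os.close(to_parent_r)
    os.waitpid(pid, 0)                    # reap the child
    return reply


if __name__ == "__main__":
    print(ask_other_table(b"pass the salt"))
```

Pipes are only one option; sockets, files, and shared memory are other channels between processes.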

--
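Addendum: some people were curious about the cost of opening a table, so let me expand on it.

Opening a table means creating a process. The cost here is mainly time.
You can run an experiment: create a process, have it write some data to memory, read the data back, and exit. Repeat this 1000 times, which amounts to creating and destroying a process 1000 times. The results on my machine were:
Ubuntu Linux: 0.8 seconds
Windows 7: 79.8 seconds
The two differ by a factor of about one hundred.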
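The original benchmark code is not given, so the following is only a rough reconstruction of the experiment for Linux. It assumes the process is created with fork (no exec), and the name time_process_churn and the buffer size are choices made for this sketch. Timings will vary by machine and are not comparable with the figures quoted above.

```python
import os
import time


def time_process_churn(rounds: int = 1000, size: int = 1 << 20) -> float:
    """Create and destroy `rounds` processes; return the elapsed seconds."""
    start = time.perf_counter()
    for _ in range(rounds):
        pid = os.fork()
        if pid == 0:                       # child: write, read back, exit
            code = 1
            try:
                buf = bytearray(size)
                buf[:] = b"x" * size       # write some data to memory
                code = 0 if buf[-1] == ord("x") else 1   # read it back
            finally:
                os._exit(code)
        os.waitpid(pid, 0)                 # parent waits, so the child is fully destroyed
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"{time_process_churn():.2f} seconds for 1000 create/destroy cycles")
```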

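This means that on Windows the cost of process creation cannot be ignored. In other words, creating processes is discouraged in Windows programming; if your architecture needs to create processes in large numbers, you are better off switching to Linux.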

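There are two typical examples of heavy process creation. One is the GNU autotools toolchain, used to build a lot of open-source code; it builds slowly on Windows, so developers who rely on it are better off avoiding Windows. The other is servers: some server frameworks rely on creating many processes to do their work, even one process per user request, and such servers run inefficiently on Windows. This "may" also be why, worldwide, Linux servers far outnumber Windows servers.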

--
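Second addendum: if you write server-side applications, then under today's network service models the cost of opening a table is actually negligible. The common practice now is to start as many processes or threads as there are CPU cores and keep that number fixed; inside each process or thread, coroutines or asynchronous I/O handle many concurrent connections. The cost of starting processes or threads can therefore be ignored.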
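The "fixed set of workers, coroutines inside each" model can be sketched roughly as follows. This is a toy, not a server: handle stands in for serving one connection, asyncio.sleep stands in for network waits, and all names (handle, serve, run_workers) are invented for the illustration.

```python
import asyncio
import os


async def handle(conn_id: int) -> int:
    """Pretend to serve one connection; the sleep stands in for network I/O."""
    await asyncio.sleep(0.01)
    return conn_id


async def serve(conn_ids) -> list:
    # One worker handles many connections concurrently with coroutines.
    return await asyncio.gather(*(handle(c) for c in conn_ids))


def run_workers(n_workers: int, conns_per_worker: int) -> int:
    """Fork a fixed number of workers once; return the total connections handled."""
    workers = []
    for _ in range(n_workers):
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:                           # worker process
            try:
                os.close(r)
                results = asyncio.run(serve(range(conns_per_worker)))
                os.write(w, str(len(results)).encode())
            finally:
                os._exit(0)
        os.close(w)
        workers.append((pid, r))

    total = 0
    for pid, r in workers:                     # collect each worker's report
        total += int(os.read(r, 64))
        os.close(r)
        os.waitpid(pid, 0)
    return total


if __name__ == "__main__":
    print(run_workers(os.cpu_count() or 1, conns_per_worker=100))
```

The processes are created once up front, so their creation cost is paid a fixed number of times regardless of how many connections arrive.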

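A different cost has moved onto the agenda instead: the cost of switching between cores.

Modern systems generally have multi-core CPUs, and the cores can run several different threads or processes at the same time.

When each CPU core runs a process, each process's resources are independent, so nothing has to be carried over from one core to another.

When each CPU core runs a thread, the threads share resources, so the shared data has to be copied from one core to another (in practice, between the cores' caches) before the computation can continue, which adds overhead. In other words, on a multi-core CPU, multithreading can perform worse than multiprocessing when the threads frequently touch the same data.

Hence server-side programming for multi-core machines today calls for getting used to multiple processes rather than multiple threads.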

======================================================================================================

The following is reposted from: http://stackoverflow.com/questions/11662781/when-is-clone-and-fork-better-than-pthreads

When is clone() and fork better than pthreads?

The strength and weakness of fork (and company) is that they create a new process that's a clone of the existing process.

This is a weakness because, as you pointed out, creating a new process has a fair amount of overhead. It also means communication between the processes has to be done via some "approved" channel (pipes, sockets, files, shared-memory region, etc.)

This is a strength because it provides (much) greater isolation between the parent and the child. If, for example, a child process crashes, you can kill it and start another fairly easily. By contrast, if a child thread dies, killing it is problematic at best -- it's impossible to be certain what resources that thread held exclusively, so you can't clean up after it. Likewise, since all the threads in a process share a common address space, one thread that ran into a problem could overwrite data being used by all the other threads, so just killing that one thread wouldn't necessarily be enough to clean up the mess.
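To make the isolation point concrete, here is a small sketch in which a parent supervises a child process, notices that it died from a signal, and simply starts another. The names run_child and supervise are made up for this example, and the "crash" is simulated by the child sending itself SIGKILL.

```python
import os
import signal


def run_child(crash: bool) -> str:
    """Fork a child that either crashes or exits cleanly; describe what happened."""
    pid = os.fork()
    if pid == 0:                                   # child process
        if crash:
            os.kill(os.getpid(), signal.SIGKILL)   # simulated hard crash
        os._exit(0)
    _, status = os.waitpid(pid, 0)                 # parent is unaffected by the crash
    if os.WIFSIGNALED(status):
        return f"killed by signal {os.WTERMSIG(status)}"
    return f"exited with code {os.WEXITSTATUS(status)}"


def supervise(attempts: int = 3) -> list:
    """Restart the child until it exits cleanly; the first attempt crashes."""
    log = []
    for attempt in range(attempts):
        outcome = run_child(crash=(attempt == 0))
        log.append(outcome)
        if outcome.startswith("exited"):
            break
    return log


if __name__ == "__main__":
    for line in supervise():
        print(line)
```

The parent's memory is untouched by the child's death, which is what makes the restart straightforward; there is no comparable clean way to discard a misbehaving thread.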

In other words, using threads is a little bit of a gamble. As long as your code is all clean, you can gain some efficiency by using multiple threads in a single process. Using multiple processes adds a bit of overhead, but can make your code quite a bit more robust, because it limits the damage a single problem can cause, and makes it much easier to shut down and replace a process if it does run into a major problem.

As far as concrete examples go, Apache might be a pretty good one. It will use multiple threads per process, but to limit the damage in case of problems (among other things), it limits the number of threads per process, and can/will spawn several separate processes running concurrently as well. On a decent server you might have, for example, 8 processes with 8 threads each. The large number of threads helps it service a large number of clients in a mostly I/O bound task, and breaking it up into processes means if a problem does arise, it doesn't suddenly become completely unresponsive, and can shut down and restart a process without losing a lot.

======================================================================================================

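The following is reposted from: http://www.cnblogs.com/peteryj/archive/2008/09/24/1944903.html

First, processes and threads on Linux are quite similar. One reason is that both are represented by the same data structure, task_struct. The difference is that a process has its own user address space, while threads share one (a task with no user address space at all is a kernel thread, which does not need one).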

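A user process on Linux cannot be created directly from scratch, because no such API exists. It can only be copied from an existing process, and then switched to the program it actually wants to run through an exec-family API.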
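The copy-then-exec pattern looks roughly like this from Python, whose os.fork and os.execvp wrap the underlying system calls. The helper name spawn and the exit code 127 for a failed exec are conventions chosen for this sketch; os.waitstatus_to_exitcode needs Python 3.9 or later.

```python
import os
import sys


def spawn(argv: list) -> int:
    """Copy the current process, then replace the copy with another program."""
    pid = os.fork()                        # step 1: duplicate this process
    if pid == 0:                           # child
        try:
            os.execvp(argv[0], argv)       # step 2: switch to the new program image
        finally:
            os._exit(127)                  # reached only if exec failed
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


if __name__ == "__main__":
    code = spawn([sys.executable, "-c", "print('hello from the new program')"])
    print("child exit code:", code)
```

If exec succeeds, nothing after it in the child ever runs, because the child's program image has been replaced.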

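There are three copying APIs: fork, clone, and vfork.

Internally all three call the same kernel function, do_fork (renamed kernel_clone in recent kernels), only with different arguments.

vfork is essentially a cut-down fork, meant to be simpler and faster: the child borrows the parent's address space, and the parent is suspended until the child calls exec or exits. fork and clone differ. fork copies all of the process's resources: the PCB, the kernel stack, the user address space, and the open devices (files). With clone, as used to create threads, only the first two are copied; the last two are shared with the parent. Which resources clone shares is selected by its flags, such as CLONE_VM and CLONE_FILES.

Of these four resources, the user address space is by far the largest, and copying it in full would be very inefficient. Linux uses copy-on-write instead: fork does not actually copy all the pages of the user address space, only the page tables. Afterwards, when either the parent or the child writes to user space, the write triggers a copy, and a separate page is allocated for the writer so that the two become independent. This is a very effective way to improve efficiency.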
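The visible effect of copy-on-write is that a write in the child never shows up in the parent. The sketch below demonstrates only that effect, not the page-level mechanism (CPython touches memory in ways that trigger copies on its own); the function name fork_and_modify is made up for the example.

```python
import os


def fork_and_modify() -> tuple:
    """Return (parent's view, child's view) of a buffer the child overwrote."""
    data = bytearray(b"parent")
    r, w = os.pipe()

    pid = os.fork()
    if pid == 0:                       # child: starts with the same contents
        try:
            os.close(r)
            data[:] = b"child!"        # this write lands in the child's private copy
            os.write(w, bytes(data))
        finally:
            os._exit(0)

    os.close(w)
    child_view = os.read(r, 64)
    os.close(r)
    os.waitpid(pid, 0)
    return bytes(data), child_view     # the parent's buffer is unchanged


if __name__ == "__main__":
    print(fork_and_modify())
```

The call returns (b'parent', b'child!'): each side ends up with its own version of the data.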

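With clone (when called with CLONE_VM), even the page tables are shared with the parent, so the sharing is real, and protecting shared data is the responsibility of the application.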
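For contrast with fork, here is the same kind of write done from a thread. On Linux, CPython threads are pthreads, which are created through clone with a shared address space, so the change is visible to the creator. The function name thread_and_modify is invented for this sketch.

```python
import threading


def thread_and_modify() -> bytes:
    """Return the creator's view of a buffer after a thread overwrote it."""
    data = bytearray(b"parent")

    def worker() -> None:
        data[:] = b"thread"            # writes to the very same memory

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return bytes(data)                 # the creator sees the thread's write


if __name__ == "__main__":
    print(thread_and_modify())
```

Here join is the only synchronization needed because the creator reads after the thread has finished; concurrent access would need a lock.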

======================================================================================================

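The following is reposted from: http://yuangeqingtian.blog.51cto.com/6994701/1211289

In the LinuxThreads implementation, after pthread_create() sends a REQ_CREATE request to the manager thread, the manager thread calls pthread_handle_create() to create the new thread. After allocating a stack and setting the thread attributes, it calls __clone() with pthread_start_thread() as the entry point to create and start the new thread. pthread_start_thread() reads its own process ID and stores it in the thread descriptor, then configures scheduling according to the policy recorded there. Once everything is ready, it calls the actual thread function, and after that function returns it calls pthread_exit() to clean up.

Ways to check the number of threads on Linux: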

1. cat /proc/pid/status

2. pstree -p pid

3. top -H -p pid

4. ps xH (lists all existing threads)

5. ps -mp pid

6. ps -eLf | grep
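The first method can also be done programmatically: /proc/pid/status contains a "Threads:" line with the thread count. A minimal sketch, assuming a Linux system with /proc mounted; the helper name thread_count is made up for the example.

```python
import os
import threading


def thread_count(pid="self") -> int:
    """Read the thread count of a process from /proc/<pid>/status."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("Threads:"):
                return int(line.split()[1])
    raise RuntimeError("no Threads field found")


if __name__ == "__main__":
    stop = threading.Event()
    extra = threading.Thread(target=stop.wait)   # a thread that just waits
    extra.start()
    print("threads in this process:", thread_count(os.getpid()))
    stop.set()
    extra.join()
```

Counting the entries in /proc/pid/task gives the same number, one directory per thread.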

0 0