Python Study Notes: Multithreading and Multiprocessing

Source: Internet | Published by: 乒乓球胶皮知乎 | Editor: 程序博客网 | Time: 2024/06/08 12:35

import threading
import time
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format='[%(levelname)s] (%(threadName)-10s) %(message)s',
)


def worker(num):
    logging.debug('Worker input arg is %d ' % num)
    time.sleep(10)
    return


threads = []
for i in range(100):
    t = threading.Thread(target=worker, name='work %d' % i, args=(i,))
    threads.append(t)
    logging.debug('create thread num %d, name %s' % (i, t.name))
    t.start()

Related modules

thread: This module provides low-level primitives for working with multiple threads (also called light-weight processes or tasks), that is, multiple threads of control sharing their global data space. For synchronization, simple locks (also called mutexes or binary semaphores) are provided.

https://docs.python.org/2.7/library/thread.html#module-thread

threading: constructs higher-level threading interfaces on top of the lower-level thread module. In other words, it provides a high-level interface to the thread module and wraps a number of interfaces that operate on thread objects.

https://docs.python.org/2.7/library/threading.html

multiprocessing: a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine.

https://docs.python.org/2.7/library/multiprocessing.html#module-multiprocessing

Process: In multiprocessing, processes are spawned by creating a Process object and then calling its start() method. Process follows the API of threading.Thread.
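A trivial multiprocess program along these lines might look like the sketch below. The function name greet and its argument are illustrative assumptions, not something taken from these notes; print() is written with a single argument so the code behaves the same on Python 2.7 and Python 3.

```python
import multiprocessing


def greet(name):
    """Build a greeting that names the process it runs in."""
    process_name = multiprocessing.current_process().name
    message = 'hello %s from %s' % (name, process_name)
    print(message)
    return message


def main():
    # Create the Process object, then call start(), just like a Thread.
    p = multiprocessing.Process(target=greet, name='greeter', args=('bob',))
    p.start()   # the child process runs greet('bob')
    p.join()    # wait for the child process to exit


if __name__ == '__main__':
    main()
```

The `if __name__ == '__main__'` guard matters on platforms where child processes import the main module again when they start: without it, the child would try to start another child.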

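Thread objects run concurrently within the same process and share memory. For tasks that are I/O-bound rather than CPU-bound, threads are a simple way to scale. The multiprocessing module mirrors threading, except that it provides a Process class instead of a Thread class. Each Process is a real system process with no shared memory, but multiprocessing provides features for sharing data and passing messages between processes. In many cases, converting from threads to processes is as simple as changing a few import statements.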
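The I/O-bound case can be illustrated with a small timing sketch. It assumes that time.sleep() is an acceptable stand-in for a blocking I/O call; the helper names fake_io and run_concurrently are made up for this example.

```python
import threading
import time


def fake_io(delay):
    """Stand-in for a blocking I/O call; the thread just waits."""
    time.sleep(delay)


def run_concurrently(n_workers, delay):
    """Start n_workers threads, wait for all of them, return elapsed seconds."""
    started = time.time()
    threads = [threading.Thread(target=fake_io, args=(delay,))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - started


if __name__ == '__main__':
    elapsed = run_concurrently(5, 0.2)
    # Run one after another, the five waits would take about 1.0s in total.
    print('5 workers finished in %.2fs' % elapsed)
```

Because the threads spend their time waiting rather than computing, the waits overlap and the total is close to a single delay. For CPU-bound work the same structure with multiprocessing.Process in place of threading.Thread is the usual alternative.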

Thread:

1. This class represents an activity that is run in a separate thread of control.

2. There are two ways to specify the activity: by passing a callable object to the constructor, or by overriding the run() method in a subclass.

3. A subclass should only override the __init__() and run() methods of this class.

4. Once a thread object is created, its activity must be started by calling the thread’s start() method.

5. start() invokes the run() method in a separate thread of control.

6. Once the thread’s activity is started, the thread is considered ‘alive’. It stops being alive when its run() method terminates, either normally or by raising an unhandled exception. The is_alive() method tests whether the thread is alive.

7. Other threads can call a thread’s join() method. This blocks the calling thread until the thread whose join() method is called is terminated.

join() is called by another thread (usually the main thread). The calling thread then blocks until the thread being joined finishes. If the main thread makes the call, it will not exit before the child thread does.

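8. A thread has a name. The name can be passed to the constructor, and read or changed through the name attribute.

9. A thread can be flagged as a “daemon thread”. The significance of this flag is that the entire Python program exits when only daemon threads are left. The initial value is inherited from the creating thread. The flag can be set through the daemon property.

A daemon thread keeps running until the program exits, and the program exits once only daemon threads are left; any daemon threads still running at that point are stopped abruptly. A thread's daemon flag is inherited from the thread that created it.

This also means that if you want a daemon thread to keep "guarding", you only need to call its join() method from the main thread or another non-daemon thread. The main thread then waits for the daemon thread to finish its work and does not exit, and because the main thread is a non-daemon thread that is still alive, the Python program does not exit either. Once the daemon thread finishes, the main thread returns from join() and exits, and the whole Python program exits with it.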

10. Creating a thread

class threading.Thread(group=None, target=None, name=None, args=(), kwargs={})

This constructor should always be called with keyword arguments. Arguments are:

group should be None; reserved for future extension when a ThreadGroup class is implemented.

target is the callable object to be invoked by the run() method. Defaults to None, meaning nothing is called. In other words, target is what the thread's run() method calls.

name is the thread name. By default, a unique name is constructed of the form “Thread-N” where N is a small decimal number.

args is the argument tuple for the target invocation. Defaults to ().

kwargs is a dictionary of keyword arguments for the target invocation. Defaults to {}.

If the subclass overrides the constructor, it must make sure to invoke the base class constructor (Thread.__init__()) before doing anything else to the thread.
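The two ways of specifying a thread's activity, together with the constructor arguments described above, can be sketched as follows. The names record, RecordingThread and run_both are hypothetical and exist only for this illustration.

```python
import threading


class RecordingThread(threading.Thread):
    """Way 2: subclass Thread and override run()."""

    def __init__(self, label, results):
        # Invoke the base constructor before doing anything else to the thread.
        threading.Thread.__init__(self, name='subclass-%s' % label)
        self.label = label
        self.results = results

    def run(self):
        self.results.append((self.name, self.label))


def record(label, results):
    """Way 1: a plain callable passed to the constructor as target."""
    results.append((threading.current_thread().name, label))


def run_both():
    results = []
    # args supplies positional arguments, kwargs supplies keyword arguments.
    t1 = threading.Thread(target=record, name='callable-a',
                          args=('a',), kwargs={'results': results})
    t2 = RecordingThread('b', results)
    for t in (t1, t2):
        t.start()
    for t in (t1, t2):
        t.join()
    return sorted(results)


if __name__ == '__main__':
    print(run_both())
```

Both threads end up doing the same kind of work; the subclass form is mainly useful when the thread needs to carry its own state.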

start()

Start the thread’s activity.

It must be called at most once per thread object. It arranges for the object’s run() method to be invoked in a separate thread of control.

This method will raise a RuntimeError if called more than once on the same thread object.
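The "at most once" rule is easy to check with a short sketch. The helper name start_twice is an assumption made for this example; the exact wording of the error message may differ between Python versions, so only the exception type is relied on.

```python
import threading


def start_twice():
    """Start a thread, let it finish, then try to start the same object again."""
    t = threading.Thread(target=lambda: None)
    t.start()
    t.join()
    try:
        t.start()   # second call on the same thread object
    except RuntimeError as exc:
        return 'RuntimeError: %s' % exc
    return 'no error'


if __name__ == '__main__':
    print(start_twice())
```

To run the same activity again, create a new Thread object instead of reusing the finished one.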

join([timeout])

Wait until the thread terminates. This blocks the calling thread until the thread whose join() method is called terminates, either normally or through an unhandled exception, or until the optional timeout occurs.

When the timeout argument is present and not None, it should be a floating point number specifying a timeout for the operation in seconds (or fractions thereof). As join() always returns None, you must call isAlive() after join() to decide whether a timeout happened: if the thread is still alive, the join() call timed out.

When the timeout argument is not present or None, the operation will block until the thread terminates.

A thread can be join()ed many times.

join() raises a RuntimeError if an attempt is made to join the current thread as that would cause a deadlock. It is also an error to join() a thread before it has been started and attempts to do so raises the same exception.
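The join() rules above can be exercised in one small sketch. The function name join_demo and the 0.5s / 0.1s timings are arbitrary choices for illustration. The sketch uses the is_alive() spelling, which is available in Python 2.6 and later as well as in Python 3.

```python
import threading
import time


def join_demo():
    observed = {}
    t = threading.Thread(target=time.sleep, args=(0.5,))

    try:
        t.join()                            # the thread has not been started
    except RuntimeError:
        observed['join_before_start'] = 'RuntimeError'

    try:
        threading.current_thread().join()   # joining yourself would deadlock
    except RuntimeError:
        observed['join_self'] = 'RuntimeError'

    t.start()
    t.join(0.1)                             # gives up after about 0.1s
    observed['timed_out'] = t.is_alive()    # still alive means join() timed out
    t.join()                                # no timeout: block until it ends
    t.join()                                # joining again is allowed
    observed['alive_after_join'] = t.is_alive()
    return observed


if __name__ == '__main__':
    print(join_demo())
```

Since join() always returns None, checking is_alive() afterwards is the only way to tell a timeout from a normal finish.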

daemon

A boolean value indicating whether this thread is a daemon thread (True) or not (False). This must be set before start() is called, otherwise RuntimeError is raised. Its initial value is inherited from the creating thread; the main thread is not a daemon thread and therefore all threads created in the main thread default to daemon = False.

The entire Python program exits when no alive non-daemon threads are left.
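A minimal sketch of the daemon flag rules described above: the flag is inherited from the creating thread, and it can only be changed before start(). The names daemon_demo, child and parent are hypothetical.

```python
import threading


def daemon_demo():
    observed = {}
    release = threading.Event()

    def child():
        release.wait()

    def parent():
        # A thread created inside a daemon thread inherits daemon=True.
        inner = threading.Thread(target=child)
        observed['inherited'] = inner.daemon
        release.wait()

    plain = threading.Thread(target=child)
    observed['default'] = plain.daemon       # copied from the creating thread

    d = threading.Thread(target=parent)
    d.daemon = True                          # allowed: start() not called yet
    d.start()
    try:
        d.daemon = False                     # too late, the thread is running
    except RuntimeError:
        observed['set_after_start'] = 'RuntimeError'

    release.set()                            # let the daemon thread finish
    d.join()
    return observed


if __name__ == '__main__':
    print(daemon_demo())
```

The final d.join() is what keeps the caller waiting for the daemon thread, as described in the notes under item 9; without it, a program whose other threads had all finished would exit without waiting for d.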

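Passing arguments to processes: Queue

Queue: Returns a process shared queue implemented using a pipe and a few locks/semaphores. When a process first puts an item on the queue a feeder thread is started which transfers objects from a buffer into the pipe.

A Queue acts as a safe container: put the data to be passed to a process into the Queue, and pass the Queue itself as an argument to the target function. The main process calls put() to place the data, and the child process calls get() inside its target function to retrieve and use it. Queue is FIFO.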

import multiprocessing


class MyFancyClass(object):
    def __init__(self, name):
        self.name = name

    def do_something(self):
        process_name = multiprocessing.current_process().name
        print('Doing something fancy in %s for %s' % (process_name, self.name))


def worker(q):
    # Blocks until the parent has put an object on the queue
    obj = q.get()
    obj.do_something()


if __name__ == '__main__':
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(queue,))
    p2 = multiprocessing.Process(target=worker, args=(queue,))
    p.start()
    p2.start()
    # qsize() is approximate and raises NotImplementedError on some platforms (e.g. macOS)
    print(queue.qsize())
    queue.put(MyFancyClass('Fancy Dan'))
    queue.put(MyFancyClass('Tom'))
    print(queue.qsize())
    # Wait for the workers to finish
    queue.close()
    queue.join_thread()
    p.join()
    p2.join()

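Below is a more advanced Queue example. It also passes data to processes, but better still, it places all the data in a single shared work queue (a JoinableQueue) from which the processes take items in turn. Compared with dividing the data evenly among the processes up front, this is clearly better: a process that runs faster can take more tasks, which shortens the total run time, and the run as a whole is not held back by one slow process.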

import multiprocessing
import time


class Consumer(multiprocessing.Process):
    def __init__(self, task_queue, result_queue):
        multiprocessing.Process.__init__(self)
        self.task_queue = task_queue
        self.result_queue = result_queue

    # No target is passed to the Process constructor; overriding run() takes
    # its place, because the default Process.run() simply calls target.
    def run(self):
        print('run is called')
        process_name = self.name
        while True:
            # All consumers take tasks from the same queue; each get()
            # removes one item, so the next caller receives the next item.
            next_task = self.task_queue.get()
            if next_task is None:
                # Poison pill means shutdown
                print('%s: Exiting' % process_name)
                # JoinableQueue-specific: mark this item as processed
                self.task_queue.task_done()
                break
            # Task defines __str__, which is used when the object is formatted as a string
            print('%s: %s' % (process_name, next_task))
            # Task defines __call__, so the object can be called like a function
            answer = next_task()
            self.task_queue.task_done()
            self.result_queue.put(answer)
        return


class Task(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __call__(self):
        time.sleep(0.1)  # pretend to take some time to do the work
        return '%s * %s = %s' % (self.a, self.b, self.a * self.b)

    def __str__(self):
        return '%s*%s' % (self.a, self.b)


if __name__ == '__main__':
    # Establish communication queues
    tasks = multiprocessing.JoinableQueue()
    results = multiprocessing.Queue()
    # Start consumers
    num_consumers = multiprocessing.cpu_count() * 2
    print('creating %d consumers' % num_consumers)
    consumers = [Consumer(tasks, results) for i in range(num_consumers)]
    for w in consumers:
        w.start()
    # Enqueue jobs
    num_jobs = 10
    # run() begins once start() is called; each consumer then blocks in get() until a task is put
    for i in range(num_jobs):
        print('tasks.put')
        tasks.put(Task(i, i))
    # Add a poison pill for each consumer
    for i in range(num_consumers):
        print('tasks.put None')
        tasks.put(None)
    # Wait for all the tasks to finish.
    # JoinableQueue-specific: join() blocks here until task_done() has been
    # called for every item that was put on the queue.
    tasks.join()
    # Start printing results
    while num_jobs:
        result = results.get()
        print('Result: %s' % result)
        num_jobs -= 1

Output:

creating 4 consumers
tasks.put
tasks.put
tasks.put
tasks.put
tasks.put
tasks.put
tasks.put
tasks.put
tasks.put
tasks.put
tasks.put None
tasks.put None
tasks.put None
tasks.put None
run is called
Consumer-2: 0*0
run is called
Consumer-1: 1*1
run is called
Consumer-3: 2*2
run is called
Consumer-4: 3*3
Consumer-2: 4*4
Consumer-1: 5*5
Consumer-3: 6*6
Consumer-4: 7*7
Consumer-2: 8*8
Consumer-1: 9*9
Consumer-3: Exiting
Consumer-4: Exiting
Consumer-2: Exiting
Consumer-1: Exiting
Result: 0 * 0 = 0
Result: 1 * 1 = 1
Result: 2 * 2 = 4
Result: 3 * 3 = 9
Result: 4 * 4 = 16
Result: 5 * 5 = 25
Result: 6 * 6 = 36
Result: 7 * 7 = 49
Result: 8 * 8 = 64
Result: 9 * 9 = 81

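Event: a mechanism for synchronizing state between processes. By passing an Event to the processes as an argument, one process can set() the Event and the other processes see the state change right away. It is used to pass state and signals between processes.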

import multiprocessing
import time


def wait_for_event(e):
    """Wait for the event to be set before doing anything"""
    print('wait_for_event: starting')
    # Blocks here until the event's set() method is called
    e.wait()
    print('wait_for_event: e.is_set()-> %s' % e.is_set())


def wait_for_event_timeout(e, t):
    """Wait t seconds and then timeout"""
    print('wait_for_event_timeout: starting')
    # Also blocks here, but for at most t seconds (2 in this example), then continues
    e.wait(t)
    print('wait_for_event_timeout: e.is_set()-> %s' % e.is_set())


if __name__ == '__main__':
    e = multiprocessing.Event()
    w1 = multiprocessing.Process(name='block', target=wait_for_event, args=(e,))
    w1.start()
    w2 = multiprocessing.Process(name='non_block', target=wait_for_event_timeout, args=(e, 2))
    w2.start()
    print('main: waiting before calling Event.set()')
    time.sleep(3)
    # Once the event is set, the waiting processes are notified immediately
    e.set()
    print('main: event is set')

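When a process's target function needs to access a shared resource, that access can be guarded with a Lock.

Lock: a mechanism that gives multiple processes exclusive access to a resource. (With threads, individual operations on built-in containers such as lists and dicts are atomic in CPython because of the GIL, although compound read-modify-write sequences are not; with processes there is no such guarantee for a resource shared between them.) A Lock itself has no connection to the resource it protects. The Lock object is passed to each process as an argument. When a process reaches the code that accesses the shared resource, it calls the lock's acquire() method. If it obtains the lock, the code carries on; if not, the code blocks at that point. When several processes reach their resource-accessing code at the same moment, they all call acquire(), but only one of them obtains the lock and continues, while the others block and wait for it. By blocking processes in this way, only one process can access the resource at any given time, which makes access to the resource safe.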

A primitive lock is a synchronization primitive that is not owned by a particular thread when locked. In Python, it is currently the lowest level synchronization primitive available, implemented directly by the thread extension module.

A primitive lock is in one of two states, “locked” or “unlocked”. It is created in the unlocked state. It has two basic methods, acquire() and release(). When the state is unlocked, acquire() changes the state to locked and returns immediately. When the state is locked, acquire() blocks until a call to release() in another thread changes it to unlocked, then the acquire() call resets it to locked and returns. The release() method should only be called in the locked state; it changes the state to unlocked and returns immediately. If an attempt is made to release an unlocked lock, a ThreadError will be raised.

When more than one thread is blocked in acquire() waiting for the state to turn to unlocked, only one thread proceeds when a release() call resets the state to unlocked; which one of the waiting threads proceeds is not defined, and may vary across implementations.

All methods are executed atomically.
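The acquire()/release() behaviour can be sketched with threading.Lock, which is the primitive lock the quoted documentation describes (multiprocessing.Lock offers the same two methods for processes). The helper names locked_count and try_acquire_while_held are assumptions made for this example.

```python
import threading


def locked_count(n_threads, n_increments):
    """Increment a shared counter from several threads under one lock."""
    counter = {'value': 0}
    lock = threading.Lock()          # created in the unlocked state

    def add():
        for _ in range(n_increments):
            with lock:               # acquire() on entry, release() on exit
                counter['value'] += 1

    threads = [threading.Thread(target=add) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter['value']


def try_acquire_while_held():
    """A non-blocking acquire() on a lock that is already held."""
    lock = threading.Lock()
    lock.acquire()                   # unlocked -> locked, returns immediately
    got_it = lock.acquire(False)     # does not block; reports failure instead
    lock.release()                   # locked -> unlocked
    return got_it


if __name__ == '__main__':
    print(locked_count(4, 10000))
    print(try_acquire_while_held())
```

The with statement is a convenient way to pair every acquire() with a release(), even if the protected code raises an exception.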

Condition:

A condition variable is always associated with some kind of lock; this can be passed in or one will be created by default. (Passing one in is useful when several condition variables must share the same lock.)
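One case where passing a lock in is useful is a bounded buffer, where "not empty" and "not full" are two conditions over the same data. The sketch below is only an illustration under that assumption; the class BoundedBuffer and the function transfer are hypothetical names, not part of the threading module.

```python
import threading


class BoundedBuffer(object):
    """FIFO buffer with a fixed capacity, guarded by two conditions on one lock."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []
        lock = threading.Lock()
        # Both condition variables share the same underlying lock.
        self.not_empty = threading.Condition(lock)
        self.not_full = threading.Condition(lock)

    def put(self, item):
        with self.not_full:
            while len(self.items) >= self.capacity:
                self.not_full.wait()      # releases the lock while waiting
            self.items.append(item)
            self.not_empty.notify()       # wake one waiting consumer

    def get(self):
        with self.not_empty:
            while not self.items:
                self.not_empty.wait()
            item = self.items.pop(0)
            self.not_full.notify()        # wake one waiting producer
            return item


def transfer(n_items, capacity):
    """Move n_items through the buffer from a producer thread to the caller."""
    buf = BoundedBuffer(capacity)

    def produce():
        for i in range(n_items):
            buf.put(i)

    producer = threading.Thread(target=produce)
    producer.start()
    received = [buf.get() for _ in range(n_items)]
    producer.join()
    return received


if __name__ == '__main__':
    print(transfer(5, 2))
```

Each wait() sits inside a while loop so that the condition is checked again after every wake-up, rather than being assumed to hold.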

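Process pool: a further layer of encapsulation over Process. The user only needs to create a pool, specify the number of processes in it (the default is the number of CPUs), and specify the tasks to run (via map or apply); the pool then schedules its processes to do the work automatically. The user does not need to create and manage processes explicitly.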

One can create a pool of processes which will carry out tasks submitted to it with the Pool class.

class multiprocessing.Pool([processes[, initializer[, initargs[, maxtasksperchild]]]])

A process pool object which controls a pool of worker processes to which jobs can be submitted. It supports asynchronous results with timeouts and callbacks and has a parallel map implementation.

processes is the number of worker processes to use. If processes is None then the number returned by cpu_count() is used. If initializer is not None then each worker process will call initializer(*initargs) when it starts.

Note that the methods of the pool object should only be called by the process which created the pool.
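A minimal Pool sketch, assuming a pool of two workers and a made-up task function square. It uses map() for the parallel map and apply_async() for an asynchronous result with a timeout.

```python
import multiprocessing


def square(x):
    """The task each worker runs; defined at module level so workers can find it."""
    return x * x


def main():
    pool = multiprocessing.Pool(processes=2)
    try:
        # Parallel map: results come back in input order.
        print(pool.map(square, range(5)))
        # Asynchronous submission: get() waits for the result, up to the timeout.
        async_result = pool.apply_async(square, (7,))
        print(async_result.get(timeout=10))
    finally:
        pool.close()   # no more tasks will be submitted
        pool.join()    # wait for the worker processes to exit


if __name__ == '__main__':
    main()
```

All pool methods are called from main(), that is, from the process that created the pool, in line with the note above.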

0 0