Python Concurrency: Multithreading


Using Multithreading

Examples

Example 1

# coding: utf-8
import threading
import urllib2
import Queue

def task(q, url):
    # Fetch the URL and put the response object on the shared queue.
    result = urllib2.urlopen(url)
    q.put(result)

q = Queue.Queue()
urls = ['http://www.baidu.com', 'http://www.qq.com', 'http://www.sina.com']
threads = []
for url in urls:
    t = threading.Thread(target=task, args=(q, url))
    threads.append(t)
    t.start()

# Wait for all threads to finish, then drain the result queue.
for t in threads:
    t.join()
while not q.empty():
    s = q.get()
    print s.code, s.url

Example 2

from multiprocessing.dummy import Pool as ThreadPool
import urllib2

def task(url):
    result = urllib2.urlopen(url)
    return result.code, result.url

urls = ['http://www.baidu.com', 'http://www.qq.com', 'http://www.sina.com']

# A pool of 4 worker threads; pool.map spreads the URLs across them
# and collects the return values into a list.
pool = ThreadPool(4)
results = pool.map(task, urls)
print results

Example 2 is essentially the same idea as Example 1; it simply takes advantage of pool.map from multiprocessing.dummy to express the multithreaded code more elegantly.

The advantage of pool.map

Map is a cool little function, and the key to easily injecting parallelism into your Python code. For those unfamiliar, map is something lifted from functional languages like Lisp. It is a function which maps another function over a sequence.

Map handles the iteration over the sequence for us, applies the function, and stores all of the results in a handy list at the end.
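To make that concrete, here is a small sketch (reusing task and urls from Example 2) showing that the serial built-in map and the threaded pool.map share the same call shape; only the execution model differs:

# coding: utf-8
# Serial map vs. threaded pool.map -- same call shape, different execution.
import urllib2
from multiprocessing.dummy import Pool as ThreadPool

def task(url):
    result = urllib2.urlopen(url)
    return result.code, result.url

urls = ['http://www.baidu.com', 'http://www.qq.com', 'http://www.sina.com']

# Built-in map: applies task to each url one after another.
serial_results = map(task, urls)

# Pool.map: same interface, but the calls run on 4 worker threads.
pool = ThreadPool(4)
threaded_results = pool.map(task, urls)
pool.close()
pool.join()

print serial_results
print threaded_results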

When to use threads

Proper use of threads in Python is invariably connected to I/O operations (since CPython doesn’t use multiple cores to run CPU-bound tasks anyway, the only reason for threading is not blocking the process while there’s a wait for some I/O).
Queues are almost invariably the best way to farm out work to threads and/or collect the work’s results, by the way, and they’re intrinsically threadsafe so they save you from worrying about locks, conditions, events, semaphores, and other inter-thread coordination/communication concepts.
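A minimal sketch of that queue-based pattern, reusing the URLs from the examples above (the worker function and the None sentinel are illustrative choices, not part of the quoted advice); Queue.Queue does all the locking internally:

# coding: utf-8
# Worker threads pull URLs from an input queue and push results to an
# output queue; Queue.Queue handles the thread safety for us.
import threading
import urllib2
import Queue

def worker(in_q, out_q):
    while True:
        url = in_q.get()
        if url is None:          # sentinel: no more work for this thread
            break
        response = urllib2.urlopen(url)
        out_q.put((response.code, url))

in_q = Queue.Queue()
out_q = Queue.Queue()
urls = ['http://www.baidu.com', 'http://www.qq.com', 'http://www.sina.com']

threads = [threading.Thread(target=worker, args=(in_q, out_q)) for _ in range(3)]
for t in threads:
    t.start()
for url in urls:
    in_q.put(url)
for _ in threads:
    in_q.put(None)               # one sentinel per worker
for t in threads:
    t.join()

while not out_q.empty():
    print out_q.get()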

Summary of multithreading:

Because of the GIL, multithreading in CPython is a good fit for work dominated by I/O waits. To make full use of multiple CPU cores on CPU-bound tasks, use multiple processes (multiprocessing) instead.
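A minimal sketch of the CPU-bound case (the is_prime check and the number range are illustrative assumptions): multiprocessing.Pool spreads the calls across worker processes, and therefore across cores, which a thread pool under CPython cannot do:

# coding: utf-8
# CPU-bound work: a process Pool spreads the calls across CPU cores,
# which a thread pool under CPython cannot do because of the GIL.
from multiprocessing import Pool

def is_prime(n):
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

if __name__ == '__main__':
    numbers = range(10 ** 6, 10 ** 6 + 1000)
    pool = Pool(4)                     # roughly one worker process per core
    flags = pool.map(is_prime, numbers)
    pool.close()
    pool.join()
    print sum(flags), 'primes found'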

Multiprocessing vs. Multithreading

In the operating system, the process is the basic unit of resource allocation and the thread is the basic unit of scheduling. Multiple processes therefore consume more resources, threads are more lightweight, and creating a process is generally slower than creating a thread. Processes have separate address spaces, while threads within a process share memory. If one process crashes, other processes are unaffected; a crashed thread, however, can corrupt the other threads of the same process. The separate address spaces also make sharing data and communicating between processes harder. Because threads share memory, concurrent writes must be coordinated; in CPython the GIL protects the interpreter's own internal state (for example, reference counts) from such races, but it does not make your shared data thread-safe, so compound updates still need explicit locks.
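A small sketch of that last point (the counter and iteration counts are illustrative assumptions): even under the GIL, counter += 1 compiles to several bytecodes, so the unlocked version can lose updates while the threading.Lock version cannot:

# coding: utf-8
# The GIL serializes bytecodes, but 'counter += 1' is several bytecodes,
# so concurrent increments can still be lost without an explicit lock.
import threading

counter = 0
lock = threading.Lock()

def unsafe_increment(n):
    global counter
    for _ in range(n):
        counter += 1          # read-modify-write: not atomic

def safe_increment(n):
    global counter
    for _ in range(n):
        with lock:            # the lock makes the whole update atomic
            counter += 1

def run(target):
    global counter
    counter = 0
    threads = [threading.Thread(target=target, args=(100000,)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

print 'without lock:', run(unsafe_increment), '(may be less than 400000)'
print 'with lock:   ', run(safe_increment)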

Multiprocessing

Pros

  • Separate memory space
  • Code is usually straightforward
  • Takes advantage of multiple CPUs & cores
  • Avoids GIL limitations for CPython
  • Eliminates most needs for synchronization primitives unless you use shared memory (instead, it’s more of a communication model for IPC)
  • Child processes are interruptible/killable
  • Python multiprocessing module includes useful abstractions with an interface much like threading.Thread (see the sketch after this list)
  • A must with CPython for CPU-bound processing
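As a rough sketch of that threading-like interface (the work function and the process count are assumptions for illustration), multiprocessing.Process takes the same target/args and uses the same start()/join() calls as threading.Thread:

# coding: utf-8
# multiprocessing.Process mirrors the threading.Thread interface:
# same target/args, same start() and join().
from multiprocessing import Process
import os

def work(name):
    print name, 'running in pid', os.getpid()

if __name__ == '__main__':
    procs = [Process(target=work, args=('worker-%d' % i,)) for i in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()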

Cons

  • IPC is a little more complicated, with more overhead (communication model vs. shared memory/objects); see the IPC sketch after this list
  • Larger memory footprint
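A minimal IPC sketch under that communication model, assuming a hypothetical square worker: results travel back to the parent through a multiprocessing.Queue (pickled and sent over a pipe) rather than through shared Python objects:

# coding: utf-8
# Processes do not share Python objects, so results come back through an
# explicit IPC channel -- here a multiprocessing.Queue.
from multiprocessing import Process, Queue

def square(numbers, q):
    for n in numbers:
        q.put((n, n * n))     # pickled and sent to the parent process

if __name__ == '__main__':
    q = Queue()
    p = Process(target=square, args=([1, 2, 3, 4], q))
    p.start()
    for _ in range(4):
        print q.get()
    p.join()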

Threading

Pros

  • Lightweight - low memory footprint
  • Shared memory - makes access to state from another context easier
  • Allows you to easily make responsive UIs
  • CPython C extension modules that properly release the GIL will run in parallel
  • Great option for I/O-bound applications

Cons

  • CPython is subject to the GIL: because of the Global Interpreter Lock, a single Python process cannot run threads in parallel (i.e. it cannot utilize multiple cores)
  • Not interruptible/killable (see the Event-based shutdown sketch after this list)
  • If not following a command queue/message pump model (using the Queue module), then manual use of synchronization primitives becomes a necessity (decisions are needed about the granularity of locking)
  • Code is usually harder to understand and to get right - the potential for race conditions increases dramatically
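For the "not interruptible/killable" point above, the usual workaround is cooperative shutdown. This sketch (the worker loop and sleep interval are illustrative assumptions) uses a threading.Event as a stop flag the worker checks, since there is no way to kill a thread from outside:

# coding: utf-8
# Threads cannot be killed from outside; a threading.Event lets the
# worker notice a stop request and exit cleanly on its own.
import threading
import time

stop = threading.Event()

def worker():
    while not stop.is_set():
        # ... do a small unit of work, then check the flag again ...
        time.sleep(0.1)
    print 'worker exiting cleanly'

t = threading.Thread(target=worker)
t.start()
time.sleep(1)
stop.set()      # ask the thread to stop; there is no t.kill()
t.join()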

References:

  1. What exactly is the GIL? See: Python's hardest problem
  2. multiprocessing-vs-threading-python