Python multiprocessing: shared memory between processes

Source: Internet · Editor: 程序博客网 · Published: 2024/05/16 10:32

1. The problem:

Someone in a chat group posted the following code and asked: why does the list print empty at the end?

from multiprocessing import Process, Manager

import os

 

manager = Manager()

vip_list = []

#vip_list = manager.list()

 

def testFunc(cc):

    vip_list.append(cc)

    print 'process id:', os.getpid()

 

if __name__ == '__main__':

    threads = []

 

    for ll in range(10):

        t = Process(target=testFunc, args=(ll,))

        t.daemon = True

        threads.append(t)

 

    for i in range(len(threads)):

        threads[i].start()

 

    for j in range(len(threads)):

        threads[j].join()

 

    print "------------------------"

    print 'process id:', os.getpid()

    print vip_list

If you understand Python's threading model, the GIL, and how threads and processes actually work, the question above is not hard to answer. If you don't, no problem: just run the code and you will see what is going on.

python aa.py

process id: 632
process id: 635
process id: 637
process id: 633
process id: 636
process id: 634
process id: 639
process id: 638
process id: 641
process id: 640
------------------------
process id: 619
[]

Each child process gets its own copy of `vip_list`, so appends never reach the parent. Uncomment the `vip_list = manager.list()` line and you will see:

process id: 32074
process id: 32073
process id: 32072
process id: 32078
process id: 32076
process id: 32071
process id: 32077
process id: 32079
process id: 32075
process id: 32080
------------------------
process id: 32066
[3, 2, 1, 7, 5, 0, 6, 8, 4, 9]

2. Several ways to share variables between Python processes:

(1) Shared memory:

Data can be stored in a shared memory map using Value or Array. For example:

http://docs.python.org/2/library/multiprocessing.html#sharing-state-between-processes

from multiprocessing import Process, Value, Array

 

def f(n, a):

    n.value = 3.1415927

    for i in range(len(a)):

        a[i] = -a[i]

 

if __name__ == '__main__':

    num = Value('d', 0.0)

    arr = Array('i', range(10))

 

    p = Process(target=f, args=(num, arr))

    p.start()

    p.join()

 

    print num.value

    print arr[:]

Result:

3.1415927

[0, -1, -2, -3, -4, -5, -6, -7, -8, -9]

(2) Server process:

A manager object returned by Manager() controls a server process which holds Python objects and allows other processes to manipulate them using proxies.
A manager returned by Manager() will support types list, dict, Namespace, Lock, RLock, Semaphore, BoundedSemaphore, Condition, Event, Queue, Value and Array.
See the example at the beginning of this article.

http://docs.python.org/2/library/multiprocessing.html#managers

3. Multiprocessing has more pitfalls than that: data synchronization

Consider a simple piece of code: a basic counter:

from multiprocessing import Process, Manager

import os

 

manager = Manager()

sum = manager.Value('tmp', 0)

 

def testFunc(cc):

    sum.value += cc

 

if __name__ == '__main__':

    threads = []

 

    for ll in range(100):

        t = Process(target=testFunc, args=(1,))

        t.daemon = True

        threads.append(t)

 

    for i in range(len(threads)):

        threads[i].start()

 

    for j in range(len(threads)):

        threads[j].join()

 

    print "------------------------"

    print 'process id:', os.getpid()

    print sum.value

Result:

------------------------

process id: 17378

97

You may ask: WTF? In fact, this problem already existed in the multithreading era; it just tragically repeats itself with multiple processes. `sum.value += cc` is a read-modify-write, so two processes can read the same old value and one increment gets lost. The answer is the same: a Lock!

from multiprocessing import Process, Manager, Lock

import os

 

lock = Lock()

manager = Manager()

sum = manager.Value('tmp', 0)

 

 

def testFunc(cc, lock):

    with lock:

        sum.value += cc

 

 

if __name__ == '__main__':

    threads = []

 

    for ll in range(100):

        t = Process(target=testFunc, args=(1, lock))

        t.daemon = True

        threads.append(t)

 

    for i in range(len(threads)):

        threads[i].start()

 

    for j in range(len(threads)):

        threads[j].join()

 

    print "------------------------"

    print 'process id:', os.getpid()

    print sum.value

How does this code perform? Run it yourself, or increase the loop count and see...

Here is another shared-variable example: a script that runs an arbitrary command across a cluster in parallel and collects the results.

#!/usr/bin/env python

# coding=utf-8

import sys

 

reload(sys)

sys.setdefaultencoding('utf-8')

 

import rpyc

from pyUtil import *

from multiprocessing import Pool as ProcessPool

from multiprocessing import Manager

 

hostDict = {

    '192.168.1.10': 11111,

}

 

manager = Manager()

localResultDict = manager.dict()

 

 

def rpc_client(host_port_cmd):

    host = host_port_cmd[0]

    port = host_port_cmd[1]

    cmd = host_port_cmd[2]

    c = rpyc.connect(host, port)

    result = c.root.exposed_execCmd(cmd)

    localResultDict[host] = result

    c.close()

 

 

def exec_cmd(cmd_str):

    host_port_list = []

    for (host, port) in hostDict.items():

        host_port_list.append((host, port, cmd_str))

 

    pool = ProcessPool(len(hostDict))

    results = pool.map(rpc_client, host_port_list)

    pool.close()

    pool.join()

    for ip in localResultDict.keys():

        print ip + ":\t" + localResultDict[ip]

 

if __name__ == "__main__":

 

    if len(sys.argv) == 2 and sys.argv[1] != "-h":

        print "======================"

        print "    Your command is:\t" + sys.argv[1]

        print "======================"

        cmd_str = sys.argv[1]

    else:

        print "Cmd Error!"

        sys.exit(1)

 

    exec_cmd(cmd_str)

Note that when iterating over a manager.dict() you must go through its .keys() method; iterating the proxy directly raises an exception:

Traceback (most recent call last):

  File "test.py", line 83, in <module>
    exec_cmd(cmd_str)
  File "test.py", line 57, in exec_cmd
    for ip in localResultDict:
  File "<string>", line 2, in __getitem__
  File "/opt/soft/python-2.7.10/lib/python2.7/multiprocessing/managers.py", line 774, in _callmethod
    raise convert_to_error(kind, result)
KeyError: 0

4. Final advice:

Note that usually sharing data between processes may not be the best choice, because of all the synchronization issues; an approach involving actors exchanging messages is usually seen as a better choice. See also the Python documentation: As mentioned above, when doing concurrent programming it is usually best to avoid using shared state as far as possible. This is particularly true when using multiple processes. However, if you really do need to use some shared data then multiprocessing provides a couple of ways of doing so.

 
