Learning Python multiprocessing

Source: Internet | Editor: 程序博客网 | Date: 2024/05/27 16:41

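When a workload is CPU-bound, running it in multiple processes is worth considering.
For example, I wrote a txt file that stores the numbers 1 through 999999 in order, written three times in a loop. Counting how many times 999999 appears in it is computation-heavy, in other words a CPU-bound job, so let's try multiprocessing on it.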
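The test file described above can be produced with a short script. This is only a minimal sketch, not the author's original generator: the function name `write_number_file`, the one-number-per-line layout, and the `limit` and `repeats` parameters are assumptions made for illustration. Only the file name 00001.txt comes from the scripts in this post.

```python
import os
import tempfile


def write_number_file(path, limit=999999, repeats=3):
    """Write the numbers 1..limit, one per line, `repeats` times over."""
    with open(path, "w") as f:
        for _ in range(repeats):
            for n in range(1, limit + 1):
                f.write("%d\n" % n)


if __name__ == "__main__":
    # Write into a temporary directory so the sketch leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "00001.txt")
        write_number_file(path)
        print(os.path.getsize(path), "bytes written")
```

With this assumed layout, only the line holding 999999 itself contains the search string, so a file written with `repeats=3` has three matching lines.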

#!/usr/bin/env python3
# -*- coding:utf-8 -*-
import time
import multiprocessing
import os


def get_count_number(file_path):
    # Count the lines of file_path that contain '999999' and print the total.
    count_num = 0
    print('the process pid is %s and the parent pid is %s : ' % (os.getpid(), os.getppid()))
    with open(file_path) as f:
        lines = f.readlines()
        for one_line in lines:
            if '999999' in one_line:
                # print(one_line)
                count_num += 1
    print(count_num)


if __name__ == '__main__':
    # Sequential run: process the files one after another.
    start_time = time.time()
    txt_list = ['00001.txt', '00002.txt', '00003.txt']
    for file_path in txt_list:
        get_count_number(file_path)
    end_time = time.time()
    print('##########')
    print('the total time to run is: ', end_time - start_time)

    # Parallel run: one process per file.
    multi_start_time = time.time()
    process_list = []
    for each_file in txt_list:
        each_process = multiprocessing.Process(target=get_count_number, args=(each_file,))
        process_list.append(each_process)
    # Start every process first, then wait for all of them to finish.
    for each_p in process_list:
        each_p.start()
    for each_p in process_list:
        each_p.join()
    multi_end_time = time.time()
    print('##########')
    print('multi processing time is: ', multi_end_time - multi_start_time)

With the processes started, the output is as follows:

the process pid is 79873 and the parent pid is 78757 :
3
the process pid is 79873 and the parent pid is 78757 :
3
the process pid is 79873 and the parent pid is 78757 :
3
##########
the total time to run is:  1.18405485153
the process pid is 79874 and the parent pid is 79873 :
the process pid is 79875 and the parent pid is 79873 :
the process pid is 79876 and the parent pid is 79873 :
3
3
3
##########
multi processing time is:  0.489647865295

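As we can see, reading the 3 files one after another took 1.184 seconds, while the multiprocess version took 0.49 seconds in total.
When there are many files, running more processes concurrently is not always better. There is an optimal number of concurrent processes, and it depends on the hardware of the machine running the program. This is where the process pool comes in: it fixes how many processes run concurrently at any given moment.
In the next example I read 6 files with the pool size set to 3.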

#!/usr/bin/env python3
# -*- coding:utf-8 -*-
import time
import multiprocessing
import os


def get_count_number(file_path):
    # Count the lines of file_path that contain '999999' and print the total.
    count_num = 0
    print('the process pid is %s and the parent pid is %s : ' % (os.getpid(), os.getppid()))
    with open(file_path) as f:
        lines = f.readlines()
        for one_line in lines:
            if '999999' in one_line:
                # print(one_line)
                count_num += 1
    print(count_num)


if __name__ == '__main__':
    # Sequential run: process the files one after another.
    start_time = time.time()
    txt_list = ['00001.txt', '00002.txt', '00003.txt', '00004.txt', '00005.txt', '00006.txt']
    for file_path in txt_list:
        get_count_number(file_path)
    end_time = time.time()
    print('##########')
    print('the total time to run is: ', end_time - start_time)

    # Parallel run: at most 3 worker processes share the 6 files.
    multi_start_time = time.time()
    pool = multiprocessing.Pool(processes=3)
    for each_file in txt_list:
        pool.apply_async(func=get_count_number, args=(each_file,))
    pool.close()  # no more tasks will be submitted
    pool.join()   # wait for all workers to finish
    multi_end_time = time.time()
    print('##########')
    print('multi processing time is: ', multi_end_time - multi_start_time)

The output is as follows:

the process pid is 79882 and the parent pid is 78757 :
3
the process pid is 79882 and the parent pid is 78757 :
3
the process pid is 79882 and the parent pid is 78757 :
3
the process pid is 79882 and the parent pid is 78757 :
3
the process pid is 79882 and the parent pid is 78757 :
3
the process pid is 79882 and the parent pid is 78757 :
3
##########
the total time to run is:  2.45317697525
the process pid is 79883 and the parent pid is 79882 :
the process pid is 79884 and the parent pid is 79882 :
the process pid is 79885 and the parent pid is 79882 :
3
3
3
the process pid is 79885 and the parent pid is 79882 :
the process pid is 79884 and the parent pid is 79882 :
the process pid is 79883 and the parent pid is 79882 :
3
3
3
##########
multi processing time is:  1.25623202324

As we can see, reading the 6 files sequentially took 2.45 seconds, while running them concurrently through the process pool took 1.26 seconds.
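In the scripts above each worker only prints its count, so the parent process never receives the numbers. If the counts are needed in the parent, the worker can return them, and `Pool.map` collects the return values in input order. The following is a minimal sketch under assumptions: `count_matches`, `count_in_files`, and the small temporary sample files are hypothetical and stand in for the original function and the 00001.txt to 00006.txt files.

```python
import multiprocessing
import os
import tempfile


def count_matches(file_path, needle="999999"):
    """Return how many lines of file_path contain needle."""
    count = 0
    with open(file_path) as f:
        for line in f:  # iterate lazily instead of calling readlines()
            if needle in line:
                count += 1
    return count


def count_in_files(paths, processes=3):
    """Count matching lines per file in a pool; results keep input order."""
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(count_matches, paths)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        paths = []
        for i in range(6):
            path = os.path.join(tmp_dir, "%05d.txt" % (i + 1))
            with open(path, "w") as f:
                # Sample file number i+1 holds i matching lines
                # plus one line that does not match.
                f.write("999999\n" * i + "123\n")
            paths.append(path)
        print(count_in_files(paths))
```

`Pool.map` blocks until every task has finished, so no separate `close()` and `join()` calls are needed before reading the results here.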

When the number of processes is set to 2, i.e. pool = multiprocessing.Pool(processes=2),
the output is as follows:

the process pid is 79893 and the parent pid is 78757 :
3
the process pid is 79893 and the parent pid is 78757 :
3
the process pid is 79893 and the parent pid is 78757 :
3
the process pid is 79893 and the parent pid is 78757 :
3
the process pid is 79893 and the parent pid is 78757 :
3
the process pid is 79893 and the parent pid is 78757 :
3
##########
the total time to run is:  2.41508388519
the process pid is 79894 and the parent pid is 79893 :
the process pid is 79895 and the parent pid is 79893 :
3
3
the process pid is 79894 and the parent pid is 79893 :
the process pid is 79895 and the parent pid is 79893 :
3
3
the process pid is 79895 and the parent pid is 79893 :
the process pid is 79894 and the parent pid is 79893 :
3
3
##########
multi processing time is:  1.55285310745

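As we can see, reading the 6 files with a pool of 2 took 1.55 seconds, somewhat longer than with a pool of 3. That is understandable: 3 people working at the same time finish a bit faster than 2. However, a larger number of concurrent processes is not always better. When reading 500 files, for example, a pool of 100 processes is not necessarily faster than a pool of 50.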
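One way to look for a reasonable pool size on a given machine is to time the same job at several sizes, using `os.cpu_count()` as a reference point. This is a rough sketch: the CPU-bound stand-in task `busy_work`, the helper `time_pool`, and the candidate sizes are all assumptions for illustration, and the timings will differ from machine to machine.

```python
import multiprocessing
import os
import time


def busy_work(n):
    """CPU-bound stand-in task: sum of the squares below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def time_pool(processes, tasks):
    """Run busy_work over tasks in a pool; return (seconds, results)."""
    start = time.time()
    with multiprocessing.Pool(processes=processes) as pool:
        results = pool.map(busy_work, tasks)
    return time.time() - start, results


if __name__ == "__main__":
    tasks = [200000] * 24
    cores = os.cpu_count() or 1  # cpu_count() may return None
    print("logical CPUs:", cores)
    # Try a few pool sizes below, at, and above the CPU count.
    for size in sorted({1, 2, cores, cores * 2}):
        seconds, _ = time_pool(size, tasks)
        print("pool size %d: %.3f s" % (size, seconds))
```

For CPU-bound work the gains typically flatten out once the pool size reaches the number of CPUs, since extra processes then mostly add startup and scheduling overhead.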
Summary: for CPU-bound (compute-bound) workloads, running the work in multiple processes is worth considering.
