Multithreaded Crawlers (Speeding Up Your Crawler)

Source: Internet · Published by: 招行信用卡网络盗刷 · Editor: 程序博客网 · Date: 2024/06/06 19:00

Chapter 7: Speeding Up Your Crawler

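7.1.1 Concurrency and Parallelism
Understand the concepts of concurrency and parallelism (an operating-systems topic). Concurrency means several tasks make progress during overlapping periods of time, possibly by taking turns on a single CPU core. Parallelism means several tasks actually execute at the same instant on different cores.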
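Concurrency does not require multiple cores. The following is a minimal sketch in which a single thread interleaves two tasks by running one step of each in turn. This is concurrency without any parallelism. The `task` and `round_robin` helpers are hypothetical and exist only for illustration.

```python
from collections import deque

def task(name, steps):
    """A task that hands control back to the scheduler after every step."""
    for i in range(steps):
        yield f"{name}-{i}"

def round_robin(tasks):
    """Interleave generator tasks on ONE thread: run one step of each task in turn."""
    log = []
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            log.append(next(current))  # run one step of this task
            ready.append(current)      # not finished yet: back of the line
        except StopIteration:
            pass                       # task finished: drop it
    return log

log = round_robin([task("A", 3), task("B", 2)])
print(log)  # ['A-0', 'B-0', 'A-1', 'B-1', 'A-2']
```

Both tasks are "in progress" at the same time, yet only one step ever runs at any given instant. To run them truly in parallel, they would need separate cores, for example in separate processes.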

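7.1.2 Synchronous and Asynchronous
Understand the concepts of synchronous and asynchronous execution (an operating-systems topic). In a synchronous call, the caller waits for the operation to finish before moving on. In an asynchronous call, the caller starts the operation, continues with other work, and collects the result (or is notified) later.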
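Here is a minimal sketch of the difference, using the standard `concurrent.futures` module. The `fetch` function is a hypothetical stand-in for a slow request, and a thread pool is only one way to get asynchronous behaviour (`asyncio` is another). The synchronous call blocks for the whole 0.5 s. By contrast, `submit()` returns a `Future` almost immediately, and the caller blocks only when it asks for the result.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(x):
    """Hypothetical stand-in for a slow I/O call such as an HTTP request."""
    time.sleep(0.5)
    return x * 2

# Synchronous: the caller waits until fetch() returns.
start = time.perf_counter()
sync_result = fetch(1)
sync_wait = time.perf_counter() - start  # about 0.5 s

# Asynchronous: submit() hands the call to a worker thread and returns a Future at once.
with ThreadPoolExecutor(max_workers=1) as pool:
    start = time.perf_counter()
    future = pool.submit(fetch, 2)
    submit_wait = time.perf_counter() - start  # close to 0 s
    # ... the caller is free to do other work here ...
    async_result = future.result()  # block only when the value is actually needed

print(sync_result, async_result)  # 2 4
```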

7.2 Multithreaded Crawlers

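GIL (Global Interpreter Lock): Python is a scripting language that is executed by an interpreter, unlike a compiled language. In CPython, the GIL allows only one thread to execute Python bytecode at a time. As a result, multiple threads cannot run Python code in parallel on multiple cores.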

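A crawler, however, spends most of its time on I/O between the local machine and the server: sending requests and waiting for responses. A thread that is blocked on I/O releases the GIL, so other threads can run in the meantime. That is why multithreading still speeds up a crawler.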
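The following minimal sketch shows why the GIL does not stop threads from helping a crawler. It uses `time.sleep()` as a stand-in for waiting on a server, so no real requests are made. The `fake_download` function and the example.com URLs are assumptions for illustration. Run one after another, five 0.3 s waits take about 1.5 s; with each in its own thread, they take about 0.3 s.

```python
import threading
import time

def fake_download(i, url, results):
    """Hypothetical stand-in for a network request. time.sleep() blocks like
    real I/O and releases the GIL while waiting."""
    time.sleep(0.3)
    results[i] = url  # each thread writes only its own slot

urls = [f"https://example.com/page/{i}" for i in range(5)]  # hypothetical URLs

# Sequential: each "download" starts only after the previous one has finished.
start = time.perf_counter()
seq_results = [None] * len(urls)
for i, url in enumerate(urls):
    fake_download(i, url, seq_results)
seq_time = time.perf_counter() - start  # roughly 5 * 0.3 = 1.5 s

# Multithreaded: the five waits overlap.
start = time.perf_counter()
thr_results = [None] * len(urls)
threads = [threading.Thread(target=fake_download, args=(i, url, thr_results))
           for i, url in enumerate(urls)]
for t in threads:
    t.start()
for t in threads:
    t.join()
thr_time = time.perf_counter() - start  # roughly 0.3 s

print(f"sequential: {seq_time:.2f} s, threaded: {thr_time:.2f} s")
```

For CPU-bound work, such as heavy parsing, the same threads would give little or no speedup in CPython because of the GIL.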

7.2.2 Learning Python Multithreading
(1) Function style: call the start_new_thread() function of the _thread module to spawn a new thread.

(2) Class-wrapper style: use the threading module and define a thread class that inherits from threading.Thread.

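```python
import _thread
import time

# Thread function: print the thread name and the current time 3 times, pausing `delay` seconds each time
def print_time(threadName, delay):
    count = 0
    while count < 3:
        time.sleep(delay)
        count += 1
        print("%s: %s" % (threadName, time.ctime(time.time())))

try:
    _thread.start_new_thread(print_time, ("Thread-1", 1))
    _thread.start_new_thread(print_time, ("Thread-2", 2))
except Exception as e:
    print("Error: unable to start thread:", e)

print("Main Finished")

# Keep the main thread alive. When it exits, the process ends and, on most systems,
# the child threads are killed before they finish. Thread-2 needs about 3 * 2 = 6 s.
time.sleep(7)
```

The `_thread.start_new_thread()` function spawns a new thread. Its syntax is:

`_thread.start_new_thread(function, args[, kwargs])`

- `function` is the thread function (`print_time` in the example above).
- `args` holds the arguments passed to that function and must be a tuple (`("Thread-1", 1)` above).
- The last parameter, `kwargs`, is an optional dict of keyword arguments.

`_thread` provides low-level, primitive threads, and its functionality is quite limited compared with the `threading` module. The `threading` module instead provides the `Thread` class for working with threads, which has the following methods:

- run(): the method representing the thread's activity.
- start(): starts the thread's activity by arranging for run() to be called in a new thread.
- join([timeout]): waits until the thread terminates. It blocks the calling thread until the thread whose join() method was called finishes, or until the optional timeout (in seconds) expires.
- isAlive(): returns whether the thread is alive. This was removed in Python 3.9; use is_alive() instead.
- getName(): returns the thread's name. This is deprecated since Python 3.10; read the name attribute instead.
- setName(): sets the thread's name. This is deprecated since Python 3.10; assign to the name attribute instead.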
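The `Thread` methods above can be tried out with a short sketch that uses their current names (`is_alive()` and the `name` attribute). The `worker` function and the 0.5 s and 0.1 s timings are arbitrary choices for illustration. Note that `join(timeout)` returns even if the thread is still running, so you need to check `is_alive()` afterwards to know whether it actually finished.

```python
import threading
import time

def worker(delay):
    """Hypothetical worker that just sleeps for `delay` seconds."""
    time.sleep(delay)

t = threading.Thread(target=worker, args=(0.5,), name="Crawler-1")
name = t.name                       # modern replacement for getName()
alive_before = t.is_alive()         # False: start() has not been called yet
t.start()                           # run() now executes worker(0.5) in a new thread
t.join(timeout=0.1)                 # wait at most 0.1 s
alive_after_timeout = t.is_alive()  # True: the worker is still sleeping
t.join()                            # block until the thread terminates
alive_at_end = t.is_alive()         # False

print(name, alive_before, alive_after_timeout, alive_at_end)  # Crawler-1 False True False
```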

```python
import threading
import time

class myThread(threading.Thread):
    def __init__(self, name, delay):
        threading.Thread.__init__(self)  # initialize the base Thread first
        self.name = name                 # Thread.name is a built-in property; this sets the thread's name
        self.delay = delay

    def run(self):
        # run() is what executes in the new thread once start() is called
        print("Starting " + self.name)
        self.print_time(self.name, self.delay)
        print("Exiting " + self.name)

    def print_time(self, threadName, delay):
        counter = 0
        while counter < 3:
            time.sleep(delay)
            print(threadName, time.ctime())
            counter += 1

threads = []

# Create new threads
thread1 = myThread("Thread-1", 1)
thread2 = myThread("Thread-2", 2)

# Start the new threads
thread1.start()
thread2.start()

# Add the threads to the thread list
threads.append(thread1)
threads.append(thread2)

# Wait for all threads to finish
for t in threads:
    t.join()

print("Exiting Main Thread")
```
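In a real crawler, the threads usually share a list of URLs to fetch. A common pattern, sketched here rather than taken from the text above, combines the class-wrapper style with `queue.Queue`, which is thread-safe. A fixed number of worker threads keep taking URLs from the queue until each one receives a sentinel value. The `fetch` function is a hypothetical stand-in that sleeps instead of downloading (in practice it might call something like `requests.get(url).text`), and the URLs are placeholders.

```python
import queue
import threading
import time

def fetch(url):
    """Hypothetical stand-in for downloading a page; sleeps instead of using the network."""
    time.sleep(0.1)
    return f"<html>{url}</html>"

class CrawlerThread(threading.Thread):
    """Worker thread: takes URLs from a shared queue until it receives the sentinel None."""

    def __init__(self, name, url_queue, results, lock):
        super().__init__(name=name)
        self.url_queue = url_queue
        self.results = results
        self.lock = lock

    def run(self):
        while True:
            url = self.url_queue.get()   # blocks until an item is available
            if url is None:              # sentinel: no more work for this thread
                break
            page = fetch(url)
            with self.lock:              # guard the shared results dict
                self.results[url] = (self.name, len(page))

url_queue = queue.Queue()
results = {}
lock = threading.Lock()
urls = [f"https://example.com/page/{i}" for i in range(10)]  # placeholder URLs

workers = [CrawlerThread(f"Crawler-{i}", url_queue, results, lock) for i in range(3)]
for w in workers:
    w.start()
for url in urls:
    url_queue.put(url)
for _ in workers:
    url_queue.put(None)  # one sentinel per worker so every thread stops
for w in workers:
    w.join()

print(len(results), "pages fetched")  # 10 pages fetched
```

With three workers, the ten simulated 0.1 s downloads should finish in roughly 0.4 s instead of about 1 s. Putting the sentinels after all the URLs works because the queue is FIFO: each thread only sees `None` once the real work has been used up.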
