Handling HTTP errors when batch crawling with urllib2

Source: Internet | Published by: 网络测试工程师 | Edited by: 程序博客网 | Time: 2024/06/06 04:34

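When crawling pages in bulk, some pages in the middle of the range may not exist, or may redirect to another page. If you fetch pages by iterating over an id, urllib2.urlopen raises an exception on such a page and the program stops. With only one or two bad pages that is tolerable, but when it happens often you need Python's error-handling mechanism. As in other languages, catch the error with try and handle it with except.
For example: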

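    import urllib2

    for i in range(1, 2910):
        try:
            # Zero-pad the id to six digits, e.g. 1 -> '000001'
            code = str(i).zfill(6)
            req = urllib2.urlopen('http://data.eastmoney.com/stockdata/' + code + '.html')
            # getcode() returns an int, so compare with 404, not '404'.
            # urlopen normally raises HTTPError for a 404, so this check rarely fires.
            if req.getcode() == 404:
                continue
            buf = req.read()
        except urllib2.HTTPError:
            # blabla...... (handler elided in the original; here the page is skipped)
            continue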
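The same pattern can be sketched for Python 3, where urllib2 was split into urllib.request and urllib.error. This is only a minimal sketch under stated assumptions: the names build_url, fetch_pages and the opener parameter are hypothetical and do not come from the original post. It records which ids failed instead of silently skipping them, and it also catches URLError, which covers network-level failures that carry no HTTP status code.

```python
import urllib.error
import urllib.request


def build_url(stock_id):
    # Zero-pad the numeric id to six digits, as the loop above does.
    return 'http://data.eastmoney.com/stockdata/' + str(stock_id).zfill(6) + '.html'


def fetch_pages(ids, opener=urllib.request.urlopen):
    """Fetch each page, recording failures instead of stopping.

    `opener` is a hypothetical injection point so the loop can be
    exercised without network access.
    """
    pages = {}
    failed = {}
    for stock_id in ids:
        url = build_url(stock_id)
        try:
            response = opener(url)
            pages[stock_id] = response.read()
        except urllib.error.HTTPError as err:
            # 4xx/5xx responses (e.g. 404) land here; note the code and move on.
            failed[stock_id] = err.code
        except urllib.error.URLError as err:
            # Network-level failures (DNS, refused connection) have a reason, not a code.
            failed[stock_id] = str(err.reason)
    return pages, failed
```

HTTPError is a subclass of URLError, so it has to be caught first. A call such as `pages, failed = fetch_pages(range(1, 2910))` would then return the fetched bodies together with a dictionary of the ids that could not be retrieved.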
