Setting the User-Agent in a Python 3 crawler

Source: Internet  Posted: 手机远程ubuntu  Editor: 程序博客网  Date: 2024/05/19 05:05

While scraping a web page, the URL was definitely correct, yet fetching it with Python returned a 404 Not Found error.

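A search online showed that the request needs a User-Agent header. By default urllib identifies itself as Python-urllib/3.x, and some sites reject requests that carry that value.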
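To see the difference between the default and a custom User-Agent without touching the network, a minimal sketch follows. The helper name `build_request` is hypothetical and only for illustration; the URL is the one used in this post. It only inspects the headers urllib would send and does not fetch anything.

```python
import urllib.request


def build_request(url, user_agent="Mozilla/5.0"):
    """Hypothetical helper: return a Request carrying a browser-like User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# The value urllib sends when no User-Agent is set, e.g. "Python-urllib/3.11".
default_agent = dict(urllib.request.build_opener().addheaders)["User-agent"]

req = build_request("http://www.stats.gov.cn/tjsj/tjbz/tjyqhdmhcxhfdm/2016/11.html")
# Request stores header names via str.capitalize(), so the key is "User-agent".
custom_agent = req.get_header("User-agent")

print("default:", default_agent)
print("custom: ", custom_agent)
```

Passing the header on the `Request` object affects only that one request, which keeps the change local to the page that needs it.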

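# -*- coding:utf-8 -*-
import urllib.request
import re

# install proxy
# url ="http://www.cnblogs.com/GuoYaxiang/p/6232831.html"
url = "http://www.stats.gov.cn/tjsj/tjbz/tjyqhdmhcxhfdm/2016/11.html"
# Send a browser-like User-Agent instead of urllib's default
req = urllib.request.Request(url,headers={'User-agent': 'Mozilla/5.0'})
html = urllib.request.urlopen(req).read()
# The page is GBK-encoded; strip newlines and tabs so the regex can match across lines
html = html.decode("gbk").replace('\n','').replace('\t','')
# print(html)
# Capture the text between each "citytr" and the next "html"
pat = re.findall('citytr(.*?)html',html)
print(pat)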
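If many pages need the same header, one possible alternative (not what the code above does) is to set the User-Agent once on an opener and install it globally. This is a sketch under that assumption; it only configures urllib and sends no request.

```python
import urllib.request

# Build an opener whose default headers replace urllib's "Python-urllib/x.y".
opener = urllib.request.build_opener()
opener.addheaders = [("User-Agent", "Mozilla/5.0")]

# After install_opener, plain urllib.request.urlopen(url) calls go through this opener.
urllib.request.install_opener(opener)

print(opener.addheaders)
```

With the opener installed, later calls such as `urllib.request.urlopen(url)` no longer need a `Request` object with explicit headers.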