Python Learning (3): Disguising Requests as a Browser


The first method is simple and direct, but it is hard to extend with extra functionality.

import urllib.request

url = 'http://www.baidu.com/'
req = urllib.request.Request(url, headers = {
    'Connection': 'Keep-Alive',
    'Accept': 'text/html, application/xhtml+xml, */*',
    'Accept-Language': 'en-US,en;q=0.8,zh-Hans-CN;q=0.5,zh-Hans;q=0.3',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko'
})
oper = urllib.request.urlopen(req)
data = oper.read()
print(data.decode())
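One detail worth knowing: when a Request is built, urllib stores header names normalized with str.capitalize(), so a lookup through get_header must use that normalized form ('User-agent', not 'User-Agent'). A minimal offline check, using a shortened placeholder User-Agent string:

```python
import urllib.request

ua = 'Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko'
req = urllib.request.Request('http://www.baidu.com/', headers={'User-Agent': ua})

# Request.add_header() stores each key as key.capitalize(), so the
# stored name is 'User-agent'; get_header() expects that same form.
print(req.get_header('User-agent') == ua)   # True
print(req.has_header('User-Agent'))         # False: the name was normalized
```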
The second method uses build_opener to create a custom opener. Its advantage is that functionality can easily be added; for example, the code below extends it with automatic cookie handling.

import urllib.request
import http.cookiejar

# head: dict of headers
def makeMyOpener(head = {
    'Connection': 'Keep-Alive',
    'Accept': 'text/html, application/xhtml+xml, */*',
    'Accept-Language': 'en-US,en;q=0.8,zh-Hans-CN;q=0.5,zh-Hans;q=0.3',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko'
}):
    cj = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
    header = []
    for key, value in head.items():
        elem = (key, value)
        header.append(elem)
    opener.addheaders = header
    return opener

oper = makeMyOpener()
uop = oper.open('http://www.baidu.com/', timeout = 1000)
data = uop.read()
print(data.decode())
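An opener built this way really does carry the cookie jar with it: build_opener registers the HTTPCookieProcessor on the opener's handler list, and the processor keeps a reference to the jar you passed in, which is why cookies persist across repeated opener.open() calls. A quick offline sketch (with a shortened placeholder User-Agent) that verifies the wiring without touching the network:

```python
import http.cookiejar
import urllib.request

cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
opener.addheaders = [('User-Agent', 'Mozilla/5.0')]

# build_opener keeps every handler on opener.handlers; find ours and
# confirm it shares the CookieJar we created above.
cookie_handlers = [h for h in opener.handlers
                   if isinstance(h, urllib.request.HTTPCookieProcessor)]
print(cookie_handlers[0].cookiejar is cj)   # True
```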

The last approach is the most convenient to reuse: you can copy it into your own code as-is.

import urllib.request

headers = ("User-Agent",
           "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36")
opener = urllib.request.build_opener()
opener.addheaders = [headers]
urllib.request.install_opener(opener)
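Replacing addheaders matters because a fresh opener announces itself as Python by default, and install_opener swaps the module-level default so every later urllib.request.urlopen() call in the process uses the custom opener. A minimal sketch (the network call is left commented out; the exact default version string varies by Python release):

```python
import urllib.request

opener = urllib.request.build_opener()
# By default the opener identifies as Python, which some sites reject:
print(opener.addheaders)   # e.g. [('User-agent', 'Python-urllib/3.x')]

ua = ("User-Agent",
      "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36")
opener.addheaders = [ua]
urllib.request.install_opener(opener)

# From here on, plain urlopen() sends the browser User-Agent:
# data = urllib.request.urlopen('http://www.baidu.com/').read()
```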




