Python Study Notes 55: Web Scraping (Hiding the Crawler)

1. To disguise the request so it does not look like it comes from a Python script, the User-Agent header can be set in two ways:

Method 1: build a dict and pass it to Request as the headers argument:

head = {}
head['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36'
req = urllib.request.Request(url, data, head)
Method 2: after the Request object has been created, modify it with add_header():

req = urllib.request.Request(url, data)
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36')
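
Either method attaches the header before the request is sent. As a quick check (the echo URL http://httpbin.org/user-agent below is my own test choice, not part of the original note), you can print the header locally and then let the server report what it received:

import urllib.request

# Test sketch: httpbin.org/user-agent echoes back the User-Agent it sees
url = 'http://httpbin.org/user-agent'
ua = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36'

req = urllib.request.Request(url, headers={'User-Agent': ua})
print(req.get_header('User-agent'))          # urllib stores header names capitalized

with urllib.request.urlopen(req) as response:
    print(response.read().decode('utf-8'))   # the server's view of the header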


2. For better hiding, add a delay between requests or use a proxy.

1. Delayed access:

import urllib.request
import urllib.parse
import json
import time

while True:
    content = input('Enter the text to translate (type "q!" to quit): ')
    if content == 'q!':
        break
    url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule&smartresult=ugc&sessionFrom=dict2.top'
    data = {}
    data['type'] = 'AUTO'
    data['i'] = content
    data['doctype'] = 'json'
    data['xmlVersion'] = '1.8'
    data['keyfrom'] = 'fanyi.web'
    data['ue'] = 'UTF-8'
    data['action'] = 'FY_BY_CLICKBUTTON'
    data['typoResult'] = 'true'
    data = urllib.parse.urlencode(data).encode('utf-8')

    '''
    Two ways to hide that the request comes from a Python program.
    Method 1: build a dict and pass it to Request as the headers argument:
    head = {}
    head['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36'
    req = urllib.request.Request(url, data, head)
    '''
    # Method 2: modify the Request after it is created, via add_header()
    req = urllib.request.Request(url, data)
    req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36')

    response = urllib.request.urlopen(req)
    html = response.read().decode('utf-8')
    target = json.loads(html)
    target = target['translateResult'][0][0]['tgt']
    print("Translation: %s" % target)
    time.sleep(5)   # wait 5 seconds before the next request
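
A fixed time.sleep(5) already slows the loop down, but the regular 5-second rhythm is itself easy to spot. A small variation (the 3–8 second range and the helper name polite_pause are my own choices, not from the original note) is to randomize the pause:

import random
import time

def polite_pause(min_seconds=3, max_seconds=8):
    # Sleep for a random interval so the request timing looks less mechanical
    delay = random.uniform(min_seconds, max_seconds)
    time.sleep(delay)
    return delay

# e.g. replace time.sleep(5) at the end of the loop with polite_pause()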
2. Proxy:

import urllib.request
import random

url = 'http://www.whatismyip.com.tw'

# Grab a few free IPs from a proxy-list site
iplist = ['171.13.37.210:808', '192.129.229.223:9001', '61.237.131.59:80', '222.94.144.86:808']

proxy_support = urllib.request.ProxyHandler({'http': random.choice(iplist)})
opener = urllib.request.build_opener(proxy_support)
opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36')]
urllib.request.install_opener(opener)

response = urllib.request.urlopen(url)
html = response.read().decode('utf-8')
print(html)
Test result: it sometimes works and sometimes fails, which is normal for free proxies.
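
Since free proxies go offline all the time, one way to cope (a sketch only; the try_proxies helper and the 10-second timeout are my own assumptions, not from the original note) is to walk through the list and fall back to the next address when one fails:

import random
import urllib.request
import urllib.error

def try_proxies(url, iplist, timeout=10):
    # Try each proxy in random order; return the first page that loads, else None
    ua = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36'
    for ip in random.sample(iplist, len(iplist)):
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({'http': ip}))
        opener.addheaders = [('User-Agent', ua)]
        try:
            with opener.open(url, timeout=timeout) as response:
                return response.read().decode('utf-8')
        except (urllib.error.URLError, OSError) as error:
            print('proxy %s failed: %s' % (ip, error))
    return None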

