Basic Usage of Python Web Crawlers
Source: Internet  Editor: 程序博客网  Time: 2024/04/29 03:24
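1. Import the urllib library.
2. Send the request.
3. Read the returned content.
4. Set the encoding. (read() returns bytes, shown with a b'' prefix; decode them as UTF-8 to get a str.)
5. Print it.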
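import urllib.request  # 1. import the urllib library

response = urllib.request.urlopen("http://www.baidu.com")  # 2. send the request
html = response.read()        # 3. read the returned content (bytes)
html = html.decode("utf-8")   # 4. decode the bytes as UTF-8
print(html)                   # 5. print it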
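Step 4 hard-codes UTF-8, which produces garbled text or a UnicodeDecodeError on pages served in another encoding such as GBK. A minimal sketch, assuming the server declares a charset in its Content-Type header, reads that charset from the response and falls back to UTF-8 when none is given. The helper name fetch_text, the timeout value and the UTF-8 fallback are my own assumptions, not part of the original code.

```python
import urllib.request


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Fetch url and decode the body with the charset the server declares.

    Falls back to UTF-8 when the response names no charset (an assumption).
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()  # the body as bytes
        # Parsed from the Content-Type header, e.g. "text/html; charset=gbk"
        charset = response.headers.get_content_charset() or "utf-8"
    # errors="replace" keeps one bad byte from aborting the whole page
    return raw.decode(charset, errors="replace")
```

Usage: `print(fetch_text("http://www.baidu.com"))`. The `with` block closes the connection as soon as the body has been read.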
2. Download an image and save it locally
import urllib.request

# **** this is the first way ****
# response = urllib.request.urlopen("https://img6.bdstatic.com/img/image/smallpic/weiju112.jpg")

# **** this is the second way ****
req = urllib.request.Request("https://img6.bdstatic.com/img/image/smallpic/weiju112.jpg")
response = urllib.request.urlopen(req)
cat_img = response.read()  # raw image bytes
with open('aaaabbbbcccc.jpg', 'wb') as f:  # 'wb': write in binary mode
    f.write(cat_img)

3. Youdao Translate
import urllib.request
import urllib.parse
import json

content = input("Please input the content that you will translate:")
url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule&smartresult=ugc&sessionFrom=https://www.baidu.com/link'
data = {}
data['action'] = 'FY_BY_CLICKBUTTON'
data['doctype'] = 'json'
data['i'] = content
data['keyfrom'] = 'fanyi.web'
data['type'] = 'auto'
data['typoResult'] = 'true'
data['ue'] = 'UTF-8'
data['xmlVersion'] = '1.8'
data = urllib.parse.urlencode(data).encode("utf-8")  # form body; passing it makes urlopen send a POST
response = urllib.request.urlopen(url, data)
html = response.read().decode('utf-8')
res = json.loads(html)  # res is a dict
print("The result:%s" % (res['translateResult'][0][0]['tgt']))

4. Youdao Translate with request headers (1) (create a header dictionary and pass it as the headers argument)
import urllib.request
import urllib.parse
import json

content = input("Please input the content that you will translate:")
url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule&smartresult=ugc&sessionFrom=https://www.baidu.com/link'
head = {}  # request headers: imitate a browser's User-Agent so the request looks like a normal browser visit
head['User-Agent'] = "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0"
data = {}
data['action'] = 'FY_BY_CLICKBUTTON'
data['doctype'] = 'json'
data['i'] = content
data['keyfrom'] = 'fanyi.web'
data['type'] = 'auto'
data['typoResult'] = 'true'
data['ue'] = 'UTF-8'
data['xmlVersion'] = '1.8'
data = urllib.parse.urlencode(data).encode("utf-8")
# response = urllib.request.urlopen(url, data)
req = urllib.request.Request(url, data, head)  # third argument: the header dictionary
response = urllib.request.urlopen(req)
html = response.read().decode('utf-8')
res = json.loads(html)  # res is a dict
print("The result:%s" % (res['translateResult'][0][0]['tgt']))
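Building the request and parsing the reply can be separated so that each part can be checked without touching the network. A minimal sketch: build_post_request, extract_translation and BROWSER_UA are hypothetical names, the form fields are whatever the caller passes in, and the JSON path is copied from the code above. This sketch does not check whether the Youdao endpoint still accepts this form.

```python
import json
import urllib.parse
import urllib.request

# Browser identity copied from the example above; any realistic string works the same way.
BROWSER_UA = "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0"


def build_post_request(url: str, form: dict, headers: dict) -> urllib.request.Request:
    """Encode form as application/x-www-form-urlencoded and attach headers."""
    body = urllib.parse.urlencode(form).encode("utf-8")
    # Supplying data= makes urllib send a POST instead of a GET.
    return urllib.request.Request(url, data=body, headers=headers)


def extract_translation(raw: bytes) -> str:
    """Return the first translated segment, using the same JSON path as above."""
    res = json.loads(raw.decode("utf-8"))
    return res["translateResult"][0][0]["tgt"]
```

Because urllib stores header names via str.capitalize(), `req.get_header("User-agent")` returns the User-Agent value while `req.get_header("User-Agent")` returns None; `req.get_method()` returns "POST" because a body is attached.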
5. Youdao Translate with request headers (2) (via Request.add_header())
import urllib.request
import urllib.parse
import json

content = input("Please input the content that you will translate:")
url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule&smartresult=ugc&sessionFrom=https://www.baidu.com/link'
# The header dictionary from section 4, disabled here by wrapping it in a string literal:
'''
head = {}  # request headers: imitate a browser's User-Agent so the request looks like a normal browser visit
head['User-Agent'] = "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0"
'''
data = {}
data['action'] = 'FY_BY_CLICKBUTTON'
data['doctype'] = 'json'
data['i'] = content
data['keyfrom'] = 'fanyi.web'
data['type'] = 'auto'
data['typoResult'] = 'true'
data['ue'] = 'UTF-8'
data['xmlVersion'] = '1.8'
data = urllib.parse.urlencode(data).encode("utf-8")
# response = urllib.request.urlopen(url, data)
req = urllib.request.Request(url, data)
req.add_header('User-Agent', "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0")
response = urllib.request.urlopen(req)
html = response.read().decode('utf-8')
res = json.loads(html)  # res is a dict
print("The result:%s" % (res['translateResult'][0][0]['tgt']))

7. Using a proxy
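1. Create the parameter dictionary {'type': 'proxy_ip:port'}, where the key is the protocol (e.g. 'http'), and pass it to ProxyHandler.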
proxy_support = urllib.request.ProxyHandler({'http': 'proxy_ip:port'})
2. Customize and create the opener.
opener=urllib.request.build_opener(proxy_support)
3. Install the opener.
urllib.request.install_opener(opener)
4. Call the opener (after install_opener(), plain urllib.request.urlopen() also goes through it).
opener.open(url)
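The four steps can be wrapped in one helper. A minimal sketch, where make_proxy_opener is a hypothetical name and "1.2.3.4:8080" in the comments is a placeholder, not a working proxy. Mapping both "http" and "https" to the same proxy is an assumption that only works if the proxy supports HTTPS tunnelling (CONNECT).

```python
import urllib.request


def make_proxy_opener(proxy: str, user_agent: str) -> urllib.request.OpenerDirector:
    """Steps 1-2: route requests through proxy ("ip:port") and build an opener."""
    # Step 1: keys are URL schemes, values are the proxy address.
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    # Step 2: build_opener() adds the default handlers but skips its own
    # ProxyHandler because we passed one in.
    opener = urllib.request.build_opener(handler)
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


# Step 3 (optional): make plain urllib.request.urlopen() use this opener:
#   urllib.request.install_opener(make_proxy_opener("1.2.3.4:8080", "Mozilla/5.0"))
# Step 4: or call the opener directly without installing it:
#   html = make_proxy_opener("1.2.3.4:8080", "Mozilla/5.0").open(url, timeout=10).read()
```

Calling `opener.open()` directly keeps the proxy local to that opener, whereas `install_opener()` changes the behaviour of every later `urlopen()` call in the process.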
The code is as follows:
import urllib.request
import random
import time

while True:
    url = 'http://www.whatismyip.com.tw'  # a website that shows the IP address your request comes from
    iplist = ['171.39.32.171:9999', '112.245.170.47:9999', '111.76.129.119:808', '27.206.143.225:9999', '114.138.196.144:9999']  # each entry should be ip:port
    # 1. Create the parameter dictionary {'type': 'proxy_ip:port'}
    proxy_support = urllib.request.ProxyHandler({'http': random.choice(iplist)})
    # proxy_support = urllib.request.ProxyHandler({'http': '123.163.219.132:81'})
    # 2. Customize and create the opener
    opener = urllib.request.build_opener(proxy_support)
    opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 6.3; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0')]
    # 3. Install the opener
    urllib.request.install_opener(opener)
    # 4. Call it; free proxies fail often, so report the error and keep looping
    try:
        res = urllib.request.urlopen(url, timeout=10)
        html = res.read().decode('utf-8')
        print(html)
    except OSError as e:  # URLError and socket timeouts are OSError subclasses
        print('Request through proxy failed:', e)
    time.sleep(5)
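Free proxies such as the entries in iplist tend to stop working quickly. A sketch under that assumption tries the whole pool in random order and returns the first response that succeeds, instead of picking a single random proxy per request. fetch_via_proxy and fetch_with_rotation are hypothetical helpers; the fetch parameter lets you swap in a fake fetcher for offline testing.

```python
import http.client
import random
import urllib.request
from typing import Callable, Dict, Iterable, Optional, Tuple


def fetch_via_proxy(url: str, proxy: str, timeout: float = 10.0) -> str:
    """Fetch url through one HTTP proxy given as "ip:port"."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({"http": proxy}))
    with opener.open(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_with_rotation(
    url: str,
    proxies: Iterable[str],
    fetch: Callable[[str, str], str] = fetch_via_proxy,
    rng: Optional[random.Random] = None,
) -> Tuple[str, str]:
    """Try proxies in random order; return (proxy, body) from the first that works."""
    if rng is None:
        rng = random.Random()
    candidates = list(proxies)
    rng.shuffle(candidates)  # spread load instead of always hitting the first entry
    failures: Dict[str, BaseException] = {}
    for proxy in candidates:
        try:
            return proxy, fetch(url, proxy)
        except (OSError, http.client.HTTPException) as exc:
            # URLError, HTTPError and socket timeouts are all OSError subclasses
            failures[proxy] = exc
    raise RuntimeError(f"all {len(candidates)} proxies failed: {failures}")
```

For example, `proxy, html = fetch_with_rotation('http://www.whatismyip.com.tw', iplist)`; a RuntimeError means every proxy in the list failed.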