Web Scraping Basics 1: urllib.request
Source: Internet · Editor: 程序博客网 · Date: 2024/06/07 04:09
The urllib.request module defines functions and classes which help in opening URLs (mostly HTTP) in a complex world: basic and digest authentication, redirections, cookies and more.
In short, this library provides the functions and classes needed to open URLs, including ones that require authentication, follow redirects, or carry cookies.
1. urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)
(The optional cafile and capath parameters specify trusted CA certificates for HTTPS requests; see ssl.SSLContext.load_verify_locations() for details.)
This function always returns an object which can work as a context manager and has methods such as:

- geturl(): return the URL of the resource retrieved, commonly used to determine whether a redirect was followed
- info(): return the meta-information of the page, such as headers, in the form of an email.message_from_string() instance (see Quick Reference to HTTP Headers)
- getcode(): return the HTTP status code of the response
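A minimal, self-contained sketch of these methods. To avoid depending on network access, it starts a throwaway HTTP server on a loopback port; the server, the port choice, and the "hello" body are all invented for this demo:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# A tiny throwaway server (invented for this demo) so the example
# runs without internet access.
class _Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo output quiet
        pass

server = HTTPServer(("127.0.0.1", 0), _Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_address[1]

# urlopen() returns a response object usable as a context manager.
with urllib.request.urlopen(url, timeout=5) as resp:
    status = resp.getcode()              # HTTP status code
    final_url = resp.geturl()            # equals url here: no redirect happened
    ctype = resp.info()["Content-Type"]  # headers as a message object
    body = resp.read()

server.shutdown()
print(status, final_url, ctype, body)
```

Because geturl() reflects the final URL after any redirects, comparing it with the URL you requested is the usual way to detect that a redirect occurred.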
2. urllib.request.install_opener(opener)
This does not seem essential; the docs themselves note it is optional.
3. urllib.request.build_opener([handler, ...])
Return an OpenerDirector instance. This class is covered below.
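As a minimal sketch of how these two functions fit together (the empty ProxyHandler passed here is an arbitrary handler, chosen only for illustration):

```python
import urllib.request

# build_opener() returns an OpenerDirector preloaded with default handlers;
# any handlers passed as arguments are added on top. The empty ProxyHandler
# here simply disables proxy autodetection and serves as an example handler.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

# install_opener() makes this opener the global one used by urlopen().
# As noted above it is optional: calling opener.open(url) directly works too.
urllib.request.install_opener(opener)
print(type(opener).__name__)  # OpenerDirector
```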
4. OpenerDirector Objects

- OpenerDirector.add_handler(handler)
- OpenerDirector.open(url, data=None[, timeout])
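To illustrate the dispatch mechanism, here is a small offline sketch: add_handler() registers a handler, and open() routes a URL to whichever handler defines a `<scheme>_open()` method. The DummyHandler class and the dummy:// scheme are made up for this example:

```python
import urllib.request

class DummyHandler(urllib.request.BaseHandler):
    # OpenerDirector dispatches by URL scheme: a method named
    # <scheme>_open handles URLs of that scheme.
    def dummy_open(self, req):
        return "handled " + req.full_url

opener = urllib.request.OpenerDirector()
opener.add_handler(DummyHandler())
result = opener.open("dummy://example")
print(result)  # handled dummy://example
```

Real handlers such as HTTPHandler work the same way, except their `http_open()` performs an actual network request and returns a response object.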
5. HTTPResponse Objects
This is the object returned by urlopen() above; the main thing to look at is its methods.
HTTPResponse.read([amt]): Reads and returns the response body, or up to the next amt bytes.
HTTPResponse.readinto(b): Reads up to the next len(b) bytes of the response body into the buffer b. Returns the number of bytes read. (New in version 3.3.)
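A sketch contrasting read() and readinto(), again against a throwaway local server (the 100-byte payload is invented for the demo):

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

PAYLOAD = b"0123456789" * 10  # 100 bytes of demo data

class _Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, *args):
        pass

server = HTTPServer(("127.0.0.1", 0), _Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_address[1]

with urllib.request.urlopen(url) as resp:
    first = resp.read(10)   # read(amt): returns up to amt bytes
    buf = bytearray(10)
    n = resp.readinto(buf)  # fills an existing buffer, returns the byte count
    rest = resp.read()      # read(): everything remaining

server.shutdown()
print(len(first), n, len(rest))
```

readinto() avoids allocating a new bytes object per call, which matters when streaming a large body into a preallocated buffer.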
HTTPResponse.getheader(name, default=None): Return the value of the header name, or default if there is no header matching name. If there is more than one header with the name name, return all of the values joined by ', '. If default is any iterable other than a single string, its elements are similarly returned joined by commas.
HTTPResponse.getheaders(): Return a list of (header, value) tuples.
HTTPResponse.fileno(): Return the fileno of the underlying socket.
HTTPResponse.msg: A http.client.HTTPMessage instance containing the response headers.
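A sketch of the header accessors, once more with a throwaway local server; the duplicated X-Tag header is contrived to show the joining behaviour of getheader():

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class _Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        # Two headers with the same name, to show how getheader joins values.
        self.send_header("X-Tag", "a")
        self.send_header("X-Tag", "b")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass

server = HTTPServer(("127.0.0.1", 0), _Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_address[1]

with urllib.request.urlopen(url) as resp:
    ctype = resp.getheader("Content-Type")        # 'text/plain'
    tag = resp.getheader("X-Tag")                 # duplicates joined: 'a, b'
    missing = resp.getheader("X-Missing", "n/a")  # default when absent
    pairs = resp.getheaders()                     # list of (name, value) tuples
    # Caveat: responses returned by urlopen() have .msg overwritten with the
    # reason phrase (e.g. 'OK') rather than the header message that
    # http.client documents for HTTPResponse.msg.
    reason = resp.msg

server.shutdown()
print(ctype, tag, missing, reason)
```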
A simple read of the Baidu homepage content (the unused requests and lxml imports have been dropped):

```python
import urllib.request

url = "http://www.baidu.com"
data = urllib.request.urlopen(url).read()
data = data.decode('UTF-8')
print(data)
```