Python Basics: urllib

Source: Internet  Editor: 程序博客网  Date: 2024/05/18 01:14

GET: fetching web page content

The request module of urllib makes it easy to fetch the content of a URL: it sends a GET request to the specified page and returns the HTTP response.
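`urlopen` accepts either a URL string or a `urllib.request.Request` object, so you can build the request first and inspect it before anything is sent. The sketch below is a minimal illustration and not part of the original example. `build_get_request` is a hypothetical helper. The URL `https://example.com/search`, the `q` parameter and the User-Agent value are placeholders. Nothing is sent over the network.

```python
from urllib import parse, request


def build_get_request(base_url, params, user_agent):
    """Hypothetical helper: build (but do not send) a GET request.

    The query string is built from `params`, and a User-Agent header is attached.
    """
    # urlencode turns {"q": "python urllib"} into "q=python+urllib"
    query = parse.urlencode(params)
    url = base_url + "?" + query if query else base_url
    # No `data` argument is given, so the request method defaults to GET
    return request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    req = build_get_request("https://example.com/search",
                            {"q": "python urllib"}, "Mozilla/5.0")
    print(req.get_method())  # GET
    print(req.full_url)      # https://example.com/search?q=python+urllib
    # Request stores header names via str.capitalize(), hence "User-agent"
    print(req.get_header("User-agent"))  # Mozilla/5.0
    # Supplying a request body switches the default method to POST
    print(request.Request("https://example.com/", data=b"x=1").get_method())  # POST
```

Passing such a `req` to `request.urlopen(req)` works like calling `urlopen` with a plain URL, except that the extra header and the query string are included.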

Example

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Python basics: urllib
from urllib import request

# urllib provides a set of functions for working with URLs
with request.urlopen('https://www.baidu.com/') as f:
    data = f.read()
    print("status", f.status)
    print("reason", f.reason)
    # Print every response header as name:value
    for k, v in f.getheaders():
        print("%s:%s" % (k, v))
    # The body arrives as bytes and must be decoded
    print("data", data.decode("utf-8"))

Output

D:\PythonProject>python main.py
status 200
reason OK
Accept-Ranges:bytes
Cache-Control:no-cache
Content-Length:227
Content-Type:text/html
Date:Wed, 20 Dec 2017 14:17:53 GMT
Last-Modified:Thu, 07 Dec 2017 06:53:00 GMT
P3p:CP=" OTI DSP COR IVA OUR IND COM "
Pragma:no-cache
Server:BWS/1.1
Set-Cookie:BD_NOT_HTTPS=1; path=/; Max-Age=300
Set-Cookie:BIDUPSID=679581176346342B5F42D3A649A6B51C; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
Set-Cookie:PSTM=1513779473; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
Strict-Transport-Security:max-age=0
X-Ua-Compatible:IE=Edge,chrome=1
Connection:close
data <html>
<head>
        <script>
                location.replace(location.href.replace("https://","http://"));
        </script>
</head>
<body>
        <noscript><meta http-equiv="refresh" content="0;url=http://www.baidu.com/"></noscript>
</body>
</html>
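The output shows that `f.getheaders()` returns every response header, including repeated ones such as `Set-Cookie`. It also shows that the body arrives as raw bytes, which the script decodes as UTF-8.

The sketch below makes a few assumptions and runs offline:

* A hypothetical local server (`PageHandler`) stands in for Baidu.
* `getheader` reads a single header.
* The decoding follows the charset the server declares, read with `f.headers.get_content_charset()`. UTF-8 is used as a fallback if no charset is declared.
* The proxy-free opener only keeps the local request from being routed through a proxy set in the environment.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import request


class PageHandler(BaseHTTPRequestHandler):
    """Hypothetical local handler serving a small UTF-8 HTML page."""

    BODY = "<html><body>你好, urllib</body></html>".encode("utf-8")

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(self.BODY)))
        self.end_headers()
        self.wfile.write(self.BODY)

    def log_message(self, format, *args):
        # Suppress the default per-request log line on stderr
        pass


def fetch(url):
    """Return (status, Content-Type, decoded text) for a GET request to url."""
    # An empty ProxyHandler disables environment proxies for this opener
    opener = request.build_opener(request.ProxyHandler({}))
    with opener.open(url, timeout=5) as f:
        body = f.read()
        # charset= parameter of Content-Type, or None if the server omits it
        charset = f.headers.get_content_charset() or "utf-8"
        return f.status, f.getheader("Content-Type"), body.decode(charset)


def fetch_from_local_server():
    """Start PageHandler on a free local port, fetch one page, then stop."""
    server = HTTPServer(("127.0.0.1", 0), PageHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch("http://127.0.0.1:%d/" % server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, content_type, text = fetch_from_local_server()
    print("status", status)
    print("Content-Type", content_type)
    print("data", text)
```

Reading the charset from the response is more robust than hard-coding `decode("utf-8")` for sites that serve other encodings. For a real site, replace the local URL with the page you want to fetch.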