Learning Python's urllib library

Source: Internet  Editor: 程序博客网  Time: 2024/06/14 12:57

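1.urllib.urlopen()

Opens a URL and returns a file-like object, so the response can be handled much like an ordinary file.

The returned object supports read(), readline(), readlines(), fileno(), close(), info(), getcode(), and geturl().

info() returns the response headers, getcode() returns the HTTP status code, and geturl() returns the URL that was actually retrieved, which differs from the requested one if a redirect was followed.

Sample code: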

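import urllib

response = urllib.urlopen('http://www.baidu.com')

# Common workaround for garbled Chinese output: decode the body as UTF-8,
# then re-encode it as GBK for a console that expects GBK
print response.read().decode("utf8","ignore").encode('gbk',"ignore")
print response.info()    # response headers
print response.geturl()  # final URL, after any redirect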
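
The example above needs network access. As a rough offline sketch of the same file-like interface, the snippet below opens a data: URL, which carries its own body. The DEMO_URL value is made up for illustration, and the import fallback is an assumption added so the snippet can also run on Python 3, where a function of the same name lives in urllib.request.

```python
try:
    from urllib import urlopen            # Python 2, as used in this article
except ImportError:
    from urllib.request import urlopen    # Python 3 (assumption: not covered by the article)

# Hypothetical demo URL: the body is "line one\nline two\n", percent-encoded.
DEMO_URL = 'data:text/plain,line%20one%0Aline%20two%0A'

response = urlopen(DEMO_URL)
first_line = response.readline()                     # up to and including the first newline
rest = response.read()                               # everything that is left
content_type = response.info().get('Content-Type')   # one header from the header object
final_url = response.geturl()                        # URL that was actually opened
response.close()

print(first_line)
print(rest)
print(content_type)
print(final_url)
```

Since no redirect is involved here, geturl() simply gives back the URL that was passed in.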

2.urllib.urlretrieve(url)

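urlretrieve() downloads the resource that url points to (an HTML page, for example) to the local disk. If no filename is given, the data is saved to a temporary file.

urlretrieve() returns a 2-tuple (filename, mime_hdrs): the local file path and the response headers.

Sample code: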

import urllib
response = urllib.urlretrieve('http://www.baidu.com','d:/demo.html')
print '>>>>>>>>>>>>',type(response)  # tuple
print '>>>>>>>>>>>>',response[0]  # local file path
print '>>>>>>>>>>>>',response[1]  # response headers
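
For trying this without a network connection or a d: drive, here is a minimal sketch that saves a data: URL into a temporary directory. DEMO_URL and the demo.txt file name are made up for illustration, and the import fallback is an assumption added so the snippet can also run on Python 3.

```python
import os
import tempfile

try:
    from urllib import urlretrieve            # Python 2, as used in this article
except ImportError:
    from urllib.request import urlretrieve    # Python 3 (assumption: not covered by the article)

DEMO_URL = 'data:text/plain,hello'                       # hypothetical demo URL
target = os.path.join(tempfile.mkdtemp(), 'demo.txt')    # hypothetical target file

# With an explicit filename, the data is written to exactly that path.
filename, headers = urlretrieve(DEMO_URL, target)

with open(filename, 'rb') as f:
    saved = f.read()

print(filename)   # local file path (same as target)
print(saved)      # bytes that were written to disk
```

Unpacking the result into filename and headers tends to read better than indexing response[0] and response[1].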

3.urllib.urlcleanup()

Cleans up the cache, including temporary files, that earlier calls to urlretrieve() may have left behind.
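
A small sketch of what that cleanup looks like in practice: urlretrieve() is called without a filename, so the data lands in a temporary file, and urlcleanup() then removes it. DEMO_URL is a made-up data: URL used to avoid network access, and the import fallback is an assumption added so the snippet can also run on Python 3.

```python
import os

try:
    from urllib import urlretrieve, urlcleanup            # Python 2, as used in this article
except ImportError:
    from urllib.request import urlretrieve, urlcleanup    # Python 3 (assumption)

DEMO_URL = 'data:text/plain,hello'   # hypothetical demo URL

# No filename given, so the data is stored in a temporary file.
temp_path, headers = urlretrieve(DEMO_URL)
existed_before = os.path.exists(temp_path)

# Remove the temporary files left behind by urlretrieve().
urlcleanup()
exists_after = os.path.exists(temp_path)

print(existed_before)
print(exists_after)
```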

4.urllib.quote(string[, safe])

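Replaces special characters in string with %xx escapes. Letters, digits, and the characters '_.-' are never escaped. The optional safe parameter lists additional characters that should not be escaped; its default value is '/'.

urllib.quote_plus(string[, safe])

Like quote(), but also replaces spaces with plus signs, as required when quoting HTML form values to build a query string for a URL. Plus signs in the original string are escaped unless they are included in safe. Unlike quote(), safe does not default to '/'.

urllib.unquote(string)

Replaces each %xx escape with the single character it represents.

urllib.unquote_plus(string)

Like unquote(), but also replaces plus signs with spaces, as required when unquoting HTML form values.

These functions are needed because a URL may contain only ASCII characters.

Sample code: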

import urllib
print urllib.quote('http://www.baidu.com')  # %xx escaping --> http%3A//www.baidu.com
print urllib.quote_plus('http://www.baidu.com')  # %xx escaping --> http%3A%2F%2Fwww.baidu.com
print urllib.unquote('http%3A//www.baidu.com')  # --> http://www.baidu.com
print urllib.unquote_plus('http%3A%2F%2Fwww.baidu.com')  # --> http://www.baidu.com
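
The sample above does not exercise the safe parameter or the space and plus-sign handling, so here is a small sketch that does. The raw string is a made-up path, and the import fallback is an assumption added so the snippet can also run on Python 3, where these functions live in urllib.parse.

```python
try:
    from urllib import quote, quote_plus, unquote, unquote_plus        # Python 2
except ImportError:
    from urllib.parse import quote, quote_plus, unquote, unquote_plus  # Python 3 (assumption)

raw = '/docs/my file+v1.txt'   # hypothetical path with a space and a plus sign

path_quoted = quote(raw)              # '/' is kept because safe defaults to '/'
strict_quoted = quote(raw, safe='')   # empty safe: '/' is escaped as well
form_quoted = quote_plus(raw)         # space becomes '+', literal '+' becomes %2B

print(path_quoted)     # /docs/my%20file%2Bv1.txt
print(strict_quoted)   # %2Fdocs%2Fmy%20file%2Bv1.txt
print(form_quoted)     # %2Fdocs%2Fmy+file%2Bv1.txt

# Each quoting function is undone by its matching unquoting function.
print(unquote(path_quoted) == raw)        # True
print(unquote_plus(form_quoted) == raw)   # True

# unquote() leaves plus signs alone, so it is the wrong inverse for quote_plus().
print(unquote(form_quoted))               # /docs/my+file+v1.txt
```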

5.urllib.urlencode(query)

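Converts a mapping of key-value pairs into a URL query string: each pair is written as key=value, and the pairs are joined with '&'.

Sample code: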

import urllib
params = urllib.urlencode({'k':'xiaomin','passwd':'root'})
print params
# Passing the encoded string as the second argument sends it as POST data
response = urllib.urlopen('http://www.baidu.com',params)
print response.geturl()
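
As a sketch of what urlencode() does with values that need escaping, the snippet below encodes a list of two-element tuples (which keeps the pairs in a fixed order) and appends the result to a URL as a GET query string. The field values are made up for illustration, and the import fallback is an assumption added so the snippet can also run on Python 3.

```python
try:
    from urllib import urlencode         # Python 2, as used in this article
except ImportError:
    from urllib.parse import urlencode   # Python 3 (assumption: not covered by the article)

# Hypothetical form fields; the values contain a space, '&' and '='.
fields = [('k', 'xiao min'), ('passwd', 'r&t=1')]

query = urlencode(fields)
print(query)   # k=xiao+min&passwd=r%26t%3D1

# For a GET request the encoded string goes after '?' in the URL
# instead of being passed as the data argument.
full_url = 'http://www.baidu.com/?' + query
print(full_url)
```

Keys and values are escaped in the same way as by quote_plus(), which is why the space shows up as '+'.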
