The difference between content and text in Python's requests

Source: Internet | Editor: 程序博客网 | Time: 2024/06/07 04:03

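The difference between content and text in requests
In general, text can come out garbled because it is decoded with an encoding that may be guessed wrongly, while content is the raw bytes and is not decoded at all, so it is not garbled.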

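I kept wondering what the difference between content and text in requests is, since judging by the printed output there is no difference at all.

So how do the two differ? Let's look at the source: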

    @property
    def text(self):
        """Content of the response, in unicode.

        If Response.encoding is None, encoding will be guessed using
        ``chardet``.

        The encoding of the response content is determined based solely on HTTP
        headers, following RFC 2616 to the letter. If you can take advantage of
        non-HTTP knowledge to make a better guess at the encoding, you should
        set ``r.encoding`` appropriately before accessing this property.
        """
        # The full implementation is omitted here.

    @property
    def content(self):
        """Content of the response, in bytes."""
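To make the docstring's advice about r.encoding concrete, here is a minimal sketch that uses only the standard library. SketchResponse is a hypothetical stand-in, not the real requests.Response: it keeps the raw bytes in content and decodes them on each access to text. The UTF-8 fallback is an assumption that replaces the library's chardet-based guess.

```python
class SketchResponse:
    """Hypothetical stand-in for a response object (not the requests class)."""

    def __init__(self, content, encoding=None):
        self.content = content      # raw body, always bytes
        self.encoding = encoding    # charset used when decoding

    @property
    def text(self):
        # Assumption: fall back to UTF-8 instead of guessing with chardet.
        return self.content.decode(self.encoding or "utf-8", errors="replace")


# Pretend the server sent GBK bytes but the charset was taken to be ISO-8859-1.
body = "中文".encode("gbk")
resp = SketchResponse(body, encoding="iso-8859-1")

garbled = resp.text        # decoded with the wrong charset: mojibake
resp.encoding = "gbk"      # apply outside knowledge before reading text again
fixed = resp.text

print(repr(garbled))
print(fixed)
print(resp.content == body)
```

The bytes in content never change; only the decoded view in text depends on the encoding, which is why a wrong charset garbles text but leaves content intact.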

resp.text returns Unicode data.
resp.content returns bytes data.
In other words, if you want text, use r.text.
If you want an image or a file, use r.content.

# For example, to request an image URL and open the image, use resp.content:
>>> from PIL import Image
>>> from io import BytesIO
>>> i = Image.open(BytesIO(resp.content))
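Under Python 3, resp.content is bytes, so the in-memory file wrapper has to be io.BytesIO; io.StringIO accepts only str. The following sketch checks this without a network call or PIL. The fake_content value is a made-up stand-in for resp.content.

```python
from io import BytesIO, StringIO

# Assumption: stands in for resp.content; these are the 8 PNG signature bytes.
fake_content = b"\x89PNG\r\n\x1a\n"

# BytesIO wraps bytes in a file-like object that binary consumers can read.
buffer = BytesIO(fake_content)
header = buffer.read(8)

# StringIO only accepts str, so handing it bytes fails on Python 3.
try:
    StringIO(fake_content)
    stringio_accepts_bytes = True
except TypeError:
    stringio_accepts_bytes = False

print(header == fake_content)
print(stringio_accepts_bytes)
```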

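The same distinction matters when passing a page to lxml's etree.HTML raises an error. Commenters in that discussion offered a workaround: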

html = bytes(bytearray(html, encoding='utf-8'))
selector = etree.HTML(html)

This first converts the page source into a byte array, then converts the byte array into a bytes object, which works around the bug.
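As a quick check, the two-step conversion produces the same bytes as calling str.encode directly under Python 3. The html string below is a made-up sample, not taken from any real page.

```python
# Assumption: a made-up snippet standing in for the page source from r.text.
html = '<?xml version="1.0" encoding="utf-8"?><html><body>你好</body></html>'

# The workaround: str -> bytearray -> bytes.
via_bytearray = bytes(bytearray(html, encoding="utf-8"))

# The direct spelling of the same conversion.
via_encode = html.encode("utf-8")

print(type(via_bytearray).__name__)
print(via_bytearray == via_encode)
```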

However, others argue that this is not a bug, which is why it has never been fixed. The error occurs because I fetched the page source with r.text:

html = requests.get('xxxxxx',cookies=cookies).text

Whereas if r.content is used instead:

html = requests.get('xxxxxx',cookies=cookies).content

no error is raised.

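So what is the difference between r.text and r.content? Reading the requests source shows that r.text returns Unicode data, while r.content returns bytes. In other words, r.content already hands you bytes, much as if a conversion like

html = bytes(bytearray(html, encoding='utf-8'))

had already been applied, so no extra step is needed.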
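One caveat can be checked with a short sketch: r.content holds the raw bytes in whatever encoding the server used, so it matches the UTF-8 conversion byte for byte only when the page itself was UTF-8. The GBK body and the variable names below are assumptions for illustration.

```python
# Assumption: pretend the server sent this body encoded as GBK.
raw_body = "编码".encode("gbk")          # what r.content would hold

# What r.text would hold once the right charset is used for decoding.
decoded = raw_body.decode("gbk")

# The conversion from the workaround, applied to the decoded text.
reencoded = bytes(bytearray(decoded, encoding="utf-8"))

# Compare the byte sequences, then the text they decode to.
print(reencoded == raw_body)
print(reencoded.decode("utf-8") == decoded)
```

Both are bytes objects carrying the same text, but the byte sequences differ when the source encoding is not UTF-8.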

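Even CentOS has recently announced that it is dropping Python 2. Encoding problems really have wasted a lot of my time, so I'll move to Python 3 when I get a chance.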

0 0