Fixing garbled output when Python urllib2 receives a response with Content-Encoding: gzip

Source: Internet | Editor: 程序博客网 | Time: 2024/06/14 11:06

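Open Chrome's developer tools and inspect the page's headers. If the response headers contain Content-Encoding: gzip, the body is gzip-compressed. urllib2 does not decompress it automatically: read() returns the raw compressed bytes, which turn into garbage when decoded as text.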
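The following Python 3 sketch (an illustration added here, not part of the original post) reproduces the symptom offline. It gzip-compresses some UTF-8 text the way a server would before sending Content-Encoding: gzip, then shows that the bytes start with the gzip magic number and only become readable text again after decompression.

```python
import gzip

# Simulate a response body that a server sent with Content-Encoding: gzip.
original_text = "你好，世界"
body = gzip.compress(original_text.encode("utf-8"))

# gzip data always begins with the two magic bytes 0x1f 0x8b.
print(body[:2] == b"\x1f\x8b")  # True

# Decoding the compressed bytes directly as UTF-8 yields replacement
# characters (U+FFFD): this is the "garbled output" symptom.
garbled = body.decode("utf-8", "replace")
print("\ufffd" in garbled)  # True

# Decompressing first recovers the original text.
recovered = gzip.decompress(body).decode("utf-8")
print(recovered == original_text)  # True
```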

Decompress the body with the gzip module before decoding it, as follows:

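    import gzip
    import urllib2
    from StringIO import StringIO

    yresponse = urllib2.urlopen(url)  # url: the address to fetch
    rspheaders = yresponse.info()
    yread = yresponse.read()

    # Header lookups on the object returned by info() are case-insensitive,
    # so one check covers both "Content-Encoding" and "content-encoding".
    if rspheaders.get('Content-Encoding', '').lower() == 'gzip':
        # Wrap the compressed bytes in a file-like object and decompress them.
        ygz = gzip.GzipFile(fileobj=StringIO(yread))
        yread = ygz.read()
        ygz.close()

    # Both branches of the original decoded the same way, so decode once here.
    # 'ignore' on encode() avoids a UnicodeEncodeError for characters that
    # GB2312 cannot represent.
    ystr = yread.decode('utf8', 'ignore').encode('GB2312', 'ignore')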

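In the code, url is the address to request. Choose the encodings passed to decode() and encode() to match the actual page and your output environment.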
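urllib2 exists only in Python 2. As a hedged sketch of the same approach in Python 3 (an adaptation, not the original author's code), the logic can be moved into a helper, here given the hypothetical name decode_body, that takes the response headers and raw bytes, decompresses when Content-Encoding is gzip, and decodes the text. urllib.request.urlopen and gzip.decompress are standard-library calls; the utf-8 default charset is an assumption about the target page, and fetch_text is a hypothetical wrapper.

```python
import gzip
import urllib.request
from email.message import Message


def decode_body(headers: Message, raw: bytes, charset: str = "utf-8") -> str:
    """Decompress a gzip-encoded body if needed, then decode it to str.

    `headers` is the object from response.headers (or response.info());
    its lookups are case-insensitive, so one check covers both spellings.
    """
    content_encoding = (headers.get("Content-Encoding") or "").strip().lower()
    if content_encoding in ("gzip", "x-gzip"):
        raw = gzip.decompress(raw)
    # As in the original, silently drop bytes that do not decode.
    return raw.decode(charset, "ignore")


def fetch_text(url: str) -> str:
    """Hypothetical convenience wrapper: fetch url and return decoded text."""
    with urllib.request.urlopen(url) as resp:
        return decode_body(resp.headers, resp.read())
```

Keeping decode_body separate from the network call makes it easy to test with hand-built headers. Note that http.client, which urllib.request uses, sends Accept-Encoding: identity by default, so a gzip body usually means the request copied browser headers that asked for gzip, or the server ignored the header.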
