Python 3: Fixing the encoding error UnicodeEncodeError: 'gbk' codec can't encode character '\xa9' in position
Source: Internet. Editor: 程序博客网. Time: 2024/05/17 03:31
Original post: http://www.aobosir.com/blog/2016/12/08/python3-UnicodeEncodeError-gbk-codec-can’t-encode-character-xa9/
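Development environment
- Third-party Python libraries: lxml, Twisted, pywin32, scrapy
- Python version: python-3.5.0-amd64
- PyCharm version: pycharm-professional-2016.1.4
- Operating system: Windows 10 64-bit
If you have not set up the development environment yet, see this blog post.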
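When writing a crawler with Scrapy, crawling certain Chinese websites and then printing the fetched page source in the DOS terminal can trigger all kinds of encoding errors. Today I ran into yet another one, and below I record the error together with its fix.
Target URL to crawl: http://blog.csdn.net/github_35160620/article/details/53353672
The code that triggers the error: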
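    def next(self, response):
        # Decode the raw response bytes as UTF-8, dropping undecodable bytes
        body_data = response.body.decode('utf-8', 'ignore')
        # Printing to a GBK console fails here if the text contains '\xa9'
        print(body_data)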
To run it, change to the directory of the spider project and execute:
    scrapy crawl <spider name>
In the debug output you will see an encoding error:
    print(body_data)
    UnicodeEncodeError: 'gbk' codec can't encode character '\xa9' in position 6732: illegal multibyte sequence
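Looking it up, the character written as u'\xa9' (Unicode code point U+00A9) is the copyright sign ©. print() encodes its output with the terminal's encoding, which is GBK here, and GBK has no mapping for this character, hence the error.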
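The failure can be reproduced without Scrapy or a terminal. The following is a minimal sketch rather than part of the original post: it asks the standard library for the character's name and then tries to encode it with both UTF-8 and GBK. The variable names are chosen only for illustration.

```python
import unicodedata

ch = '\xa9'

# Look up the official Unicode name of the character.
name = unicodedata.name(ch)
print(name)

# UTF-8 can represent the character (as two bytes) ...
utf8_bytes = ch.encode('utf-8')

# ... but GBK has no mapping for it, so encoding raises the same error
# that print() produced in the GBK terminal.
try:
    ch.encode('gbk')
    gbk_ok = True
except UnicodeEncodeError as exc:
    gbk_ok = False
    print(exc)
```

This suggests the page was decoded correctly and that the problem only appears at output time, when the text has to be converted to the terminal's encoding.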
One way to fix this error:
Change the code above to:
    def next(self, response):
        # Decode as before, then strip the copyright sign that GBK cannot encode
        body_data = response.body.decode('utf-8', 'ignore').replace(u'\xa9', u'')
        print(body_data)
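Now run the program with scrapy crawl <spider name> --nolog and the encoding error above no longer appears. The source of the crawled page is printed successfully.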
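Removing '\xa9' handles this one page, but another page may contain a different character that GBK cannot encode (for example '\xa0'). As a possible generalization, not the method used in the post, the text can be round-tripped through the terminal's encoding with an error handler so that any unencodable character is replaced or dropped. The helper name `console_safe` is hypothetical, and falling back to UTF-8 when the stdout encoding is unknown is an assumption of this sketch.

```python
import sys


def console_safe(text, encoding=None, errors='replace'):
    """Return text with characters the target encoding cannot represent
    replaced ('replace') or dropped ('ignore')."""
    # Assumption: use the stdout encoding when none is given, else UTF-8.
    enc = encoding or getattr(sys.stdout, 'encoding', None) or 'utf-8'
    return text.encode(enc, errors=errors).decode(enc)


sample = 'Copyright \xa9 2016 \u4e2d\u6587'

# What a GBK terminal would receive: the copyright sign becomes '?'.
gbk_view = console_safe(sample, encoding='gbk')

# With errors='ignore' the character is dropped, like replace(u'\xa9', u'').
gbk_dropped = console_safe(sample, encoding='gbk', errors='ignore')

# Safe to print whatever encoding this stdout actually uses.
print(console_safe(sample))
```

In the spider, this would amount to calling print(console_safe(body_data)) instead of print(body_data). On Python 3.7 and later, sys.stdout.reconfigure(errors='replace') has a similar effect for all subsequent prints, but that method is not available on the Python 3.5 used in this post.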
Please visit: http://www.aobosir.com/