Solutions for "'latin-1' codec can't encode character"

Source: Internet | Published by: 阿里云域名交易平台 | Editor: 程序博客网 | Date: 2024/05/22 10:30

While parsing a string and writing it to the database, the following error appeared:
'latin-1' codec can't encode character u'\u017e' in position 11: ordinal not in range(256)

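Some investigation showed the cause: the encoding used by the database connection (Latin-1) differs from the encoding of the data source, and the string contains a character that Latin-1 cannot represent. Latin-1 only covers code points 0 to 255, while u'\u017e' is code point 382.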
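
As a quick illustration of the cause, the following sketch reproduces the same failure without any database. It is written for Python 3, whereas the transcripts in this article come from Python 2, and the helper name fits_latin1 is made up for this example.

```python
def fits_latin1(text):
    """Return True if every character of text can be encoded as Latin-1."""
    try:
        text.encode('latin-1')
        return True
    except UnicodeEncodeError:
        # Raised when a character has a code point above 255.
        return False


sample = 'Bo\u017eidar'          # contains U+017E, the character from the error message
print(ord('\u017e'))             # 382, outside Latin-1's 0..255 range
print(fits_latin1(sample))       # False
print(fits_latin1('caf\u00e9'))  # True: U+00E9 is 233, inside the range
```

A check like this can be run on incoming data to find out which values would trigger the error before they reach the database.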

There are two ways to fix this: preprocess the string before writing it, or set the character set on the database connection.

1. Preprocess the string

>>> u = u'hello\u2013world'
>>> u.encode('latin-1','replace') # replace it with a question mark
'hello?world'
>>> u.encode('latin-1','ignore') # ignore it
'helloworld'
Or substitute a specific replacement, depending on your needs:
>>> u.replace(u'\u2013','-').encode('latin-1')
'hello-world'
If you aren't required to output Latin-1, then UTF-8 is a common and preferred choice. It is recommended by the W3C and nicely encodes all Unicode code points:
>>> u.encode('utf-8')
'hello\xe2\x80\x93world'
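
The steps above can be combined into one helper. This is a minimal sketch for Python 3, not a definitive implementation: the REPLACEMENTS table and the function name to_latin1 are hypothetical, and the set of characters worth mapping depends on your data. Known characters are substituted first, and anything else that Latin-1 cannot hold falls back to the chosen error handler.

```python
# Hypothetical table of typographic characters and their Latin-1 stand-ins.
REPLACEMENTS = {
    '\u2013': '-',   # en dash
    '\u2014': '-',   # em dash
    '\u2018': "'",   # left single quotation mark
    '\u2019': "'",   # right single quotation mark
    '\u201c': '"',   # left double quotation mark
    '\u201d': '"',   # right double quotation mark
}


def to_latin1(text, errors='replace'):
    """Substitute known characters, then encode to Latin-1 bytes.

    errors is passed to str.encode and may be any standard handler,
    such as 'replace', 'ignore' or 'xmlcharrefreplace'.
    """
    table = str.maketrans(REPLACEMENTS)
    return text.translate(table).encode('latin-1', errors)


print(to_latin1('hello\u2013world'))                # b'hello-world'
print(to_latin1('\u017eaba'))                       # b'?aba'
print(to_latin1('\u017eaba', 'xmlcharrefreplace'))  # b'&#382;aba'
```

The 'xmlcharrefreplace' handler keeps the information recoverable if the text is later rendered as HTML or XML, while 'replace' and 'ignore' lose it.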

2. Configure the database connection

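# Assumes db is a MySQLdb connection and dbc is a cursor obtained from db.cursor()
db.set_character_set('utf8')                       # tell the driver to use UTF-8 for this connection
dbc.execute('SET NAMES utf8;')                     # client, connection and result character sets
dbc.execute('SET CHARACTER SET utf8;')
dbc.execute('SET character_set_connection=utf8;')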
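
An alternative to issuing SET statements after connecting is to request the character set when the connection is opened. The sketch below assumes the MySQLdb (mysqlclient) driver, whose connect() accepts charset and use_unicode keyword arguments. The function names, host, credentials and database name are placeholders invented for this example. It also assumes a server that supports utf8mb4 (MySQL 5.5.3 or later), chosen here because MySQL's utf8 stores at most three bytes per character.

```python
def utf8_connect_kwargs(host, user, passwd, db):
    """Build keyword arguments for MySQLdb.connect with a UTF-8 connection."""
    return {
        'host': host,
        'user': user,
        'passwd': passwd,
        'db': db,
        'charset': 'utf8mb4',   # assumption: server supports utf8mb4
        'use_unicode': True,    # return text columns as Unicode strings
    }


def open_connection(**kwargs):
    """Open the connection; MySQLdb is imported lazily so the helper
    above can be used and tested without the driver installed."""
    import MySQLdb
    return MySQLdb.connect(**kwargs)


kwargs = utf8_connect_kwargs('localhost', 'app_user', 'secret', 'app_db')
print(kwargs['charset'])
# conn = open_connection(**kwargs)   # requires a reachable MySQL server
```

Note that the connection character set only controls how text travels between client and server. The table columns themselves also need a character set that can store the data.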

0 0