Python Character Encoding Explained

Source: Internet  Editor: 程序博客网  Time: 2024/04/29 19:13

1. Default Encoding

Default encoding of the Python 2 interpreter: ascii

```python
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
```
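For contrast, here is the same check on Python 3. This is a sketch that assumes a Python 3 interpreter; the article's own examples are Python 2. On Python 3 the default is UTF-8, and `str.encode()` and `bytes.decode()` also default to UTF-8 when no encoding is passed.

```python
import sys

# Python 3's interpreter-wide default encoding is UTF-8, not ASCII.
default = sys.getdefaultencoding()
print(default)  # utf-8

# encode()/decode() without an argument also use UTF-8.
raw = "你好".encode()
print(raw)  # b'\xe4\xbd\xa0\xe5\xa5\xbd'
print(raw.decode() == "你好")  # True
```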

Default encoding of Python 2 source files: ascii

```python
# test.py
print "你好"  # SyntaxError: non-ASCII character, no encoding declared
```

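Because the source contains non-ASCII characters, Python cannot decode it. To fix this, add an encoding declaration on the first or second line. The declaration tells the interpreter which encoding the source file uses.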

```python
# test.py
# coding=utf-8
print "你好"  # works
```
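PEP 263 calls this declaration an encoding cookie. The sketch below uses Python 3, which is an assumption since the article targets Python 2. The standard library's `tokenize.detect_encoding` reports which encoding the interpreter would apply to a source file. The second half shows `compile()` reading GBK bytes correctly because they declare GBK. `declared_encoding` is a hypothetical helper name.

```python
import io
import tokenize

def declared_encoding(source: bytes) -> str:
    """Return the encoding Python would use for this source: the PEP 263 cookie, or the default."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding

print(declared_encoding(b"# test.py\n# coding=utf-8\nprint('hi')\n"))  # utf-8
print(declared_encoding(b"# -*- coding: gbk -*-\nx = 1\n"))            # gbk
print(declared_encoding(b"x = 1\n"))  # utf-8 (Python 3's default; Python 2 assumed ascii)

# GBK-encoded source compiles correctly because it declares its encoding.
gbk_source = '# coding=gbk\ns = "你好"\n'.encode("gbk")
namespace = {}
exec(compile(gbk_source, "<gbk-demo>", "exec"), namespace)
print(namespace["s"] == "你好")  # True
```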

2. Transcoding

str and unicode are both subclasses of basestring.

```
       basestring
        /      \
       / decode \
    str <-------> unicode
         encode
```
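Python 3 keeps these two roles but renames them. Its `str` holds text and `bytes` holds encoded data, and there is no shared `basestring` base class. Below is a minimal Python 3 sketch of the decode/encode arrows in the diagram.

```python
text = "你好"                # str: a sequence of Unicode code points
data = text.encode("utf-8")  # encode: str -> bytes
back = data.decode("utf-8")  # decode: bytes -> str

print(type(text).__name__, type(data).__name__)  # str bytes
print(back == text)                              # True

# No shared string base class: both derive directly from object.
print(str.__bases__ == (object,), bytes.__bases__ == (object,))  # True True
```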

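unicode: shown as \uxxxx escapes (code points)
str: shown as \x.. byte escapes. Whether those bytes are UTF-8 or GBK depends on the OS: typically UTF-8 on Unix, GBK on Chinese Windows.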
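The following Python 3 sketch illustrates the two notations; it is an assumed example, not code from the article. `ascii()` shows a character's code point as a `\uxxxx` escape. The character's encoded forms print as `\x..` byte escapes, and those bytes differ between UTF-8 and GBK. `locale.getpreferredencoding(False)` reports the OS-dependent encoding mentioned above. Its value varies by machine, so no output is shown for it.

```python
import locale

ch = "你"
print(ascii(ch))           # '\u4f60'  (the code point as a \uxxxx escape)
print(ch.encode("utf-8"))  # b'\xe4\xbd\xa0'  (UTF-8 bytes as \x.. escapes)
print(ch.encode("gbk"))    # b'\xc4\xe3'  (GBK bytes for the same character)

# The encoding the OS/locale prefers for text; the value varies by machine.
print(locale.getpreferredencoding(False))
```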

unicode is the real string type; str is actually a sequence of bytes (a byte array):

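```python
# coding=utf-8
str1 = '这是str'
for ch in str1:
    print ch
# prints one byte at a time: garbled bytes for 这是, then s t r

str1 = u'这是unicode'
for ch in str1:
    print ch
# prints one character at a time: 这 是 u n i c o d e
```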
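Here is the same contrast in Python 3, as a sketch in which Python 3's `bytes` plays the role of Python 2's `str`. Iterating text yields characters, while iterating encoded bytes yields integers. The lengths differ because each Chinese character takes three bytes in UTF-8.

```python
s = "这是str"
b = s.encode("utf-8")

print(len(s), len(b))  # 5 9
print(ascii(list(s)))  # ['\u8fd9', '\u662f', 's', 't', 'r']
print(list(b)[:3])     # [232, 191, 153]  (the three UTF-8 bytes of 这, as ints)
```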

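Without an explicit encoding, conversions between the two types use the default encoding (ascii):

```python
str(s) == s.encode('ascii')      # s is a unicode object
unicode(s) == s.decode('ascii')  # s is a str object
```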
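Python 3 no longer converts between the two types implicitly. The ascii step that `str()` and `unicode()` perform in Python 2 can still be reproduced explicitly. The hedged Python 3 sketch below shows both failures. It also shows an error handler that avoids the exception at the cost of losing data.

```python
text = "测试"

# Explicit equivalent of Python 2's str(u'测试'): encode with the ascii codec.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    encode_error = str(exc)
print(encode_error)
# 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

# Explicit equivalent of Python 2's unicode() on a UTF-8 str: decode with ascii.
try:
    text.encode("utf-8").decode("ascii")
except UnicodeDecodeError as exc:
    decode_error = exc.reason
print(decode_error)  # ordinal not in range(128)

# The "replace" error handler swaps each unencodable character for '?'.
print(text.encode("ascii", errors="replace"))  # b'??'
```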

With unicode as the intermediate form, a str can be converted from one encoding to another.

```python
# coding=utf-8
s = '这是utf8'
print s                               # garbled on a GBK console (e.g. Chinese Windows)
print s.decode('utf8').encode('gbk')  # displays correctly on a GBK console
```
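Below is a minimal Python 3 sketch of the same pivot, assuming the input is UTF-8 bytes. `transcode` is a hypothetical helper, not a library function.

```python
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Convert bytes from encoding src to encoding dst, pivoting through str (Unicode)."""
    return data.decode(src).encode(dst)

utf8_bytes = "这是utf8".encode("utf-8")
gbk_bytes = transcode(utf8_bytes, "utf-8", "gbk")

print(len(utf8_bytes), len(gbk_bytes))        # 10 8
print(gbk_bytes.decode("gbk") == "这是utf8")  # True
```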

3. File Reading and Writing

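Files opened with the built-in open are read and written as str, that is, raw bytes. If the file holds text in some encoding, decode the data before using it as text.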

```python
with open('test.txt', 'r') as fp:
    data = fp.read()
    print type(data)  # <type 'str'>
```
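Below is a Python 3 sketch of read-then-decode. It assumes the file is GBK-encoded; the temporary file and its contents are made up for the demo. Binary mode (`'rb'`) returns raw bytes, which is the closest Python 3 counterpart to the Python 2 str shown above.

```python
import os
import tempfile

# Throwaway demo file holding GBK-encoded text.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "wb") as fp:
    fp.write("你好".encode("gbk"))

# Binary mode returns the raw bytes, untouched.
with open(path, "rb") as fp:
    data = fp.read()
print(type(data).__name__, data)  # bytes b'\xc4\xe3\xba\xc3'

# Decode with the file's actual encoding before treating it as text.
text = data.decode("gbk")
print(text == "你好")  # True
```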

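To write text containing multibyte characters to a file opened with open, first encode the string into a str with the desired encoding. If you write a unicode string directly, Python 2 encodes it implicitly with the default ascii codec, which raises the error below. A `# coding` declaration in the source does not change this; it only tells the interpreter how to read the source file.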

```python
# coding=utf-8
with open('test.txt', 'w') as fp:
    fp.write(u'测试')
```

Error: UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1…
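In Python 3, the equivalent of encode-then-write is to encode explicitly and write in binary mode. In that mode a str is rejected rather than implicitly encoded. The sketch below uses a throwaway temporary file with a hypothetical path.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")  # throwaway demo file

with open(path, "wb") as fp:
    # A binary file accepts only bytes, so unencoded text is rejected outright.
    try:
        fp.write("测试")
    except TypeError as exc:
        rejected = exc
    # Encode explicitly, then write the resulting bytes.
    fp.write("测试".encode("utf-8"))

with open(path, "rb") as fp:
    written = fp.read()
print(rejected)  # a bytes-like object is required, not 'str'
print(written)   # b'\xe6\xb5\x8b\xe8\xaf\x95'
```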

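The codecs module also provides its own open function, which opens a text file with a specified encoding. Content read from such a file is already a unicode string. Likewise, a file opened for writing with an encoding accepts unicode strings directly.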

```python
# coding=utf-8
import codecs

with codecs.open('test.txt', 'r', encoding='utf8') as fp:
    data = fp.read()
    print type(data)  # <type 'unicode'>

with codecs.open('test.txt', 'w', encoding='utf8') as fp:
    fp.write(u'测试')
```
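On Python 3, the built-in open accepts an encoding argument and covers what codecs.open does here. Below is a sketch, again with a throwaway temporary file at a hypothetical path.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")  # throwaway demo file

# Text mode with an explicit encoding: write str, the file receives UTF-8 bytes.
with open(path, "w", encoding="utf-8") as fp:
    fp.write("测试")

# Reading in text mode with the same encoding decodes back to str.
with open(path, "r", encoding="utf-8") as fp:
    data = fp.read()
print(type(data).__name__, data == "测试")  # str True

# The bytes actually on disk.
with open(path, "rb") as fp:
    raw = fp.read()
print(raw)  # b'\xe6\xb5\x8b\xe8\xaf\x95'
```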

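In Python 3, str is a Unicode string by default and raw bytes are a separate bytes type, so the implicit-conversion problems above do not arise.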

0 0

Python 字符编码详解

一 默认编码

二 转码

三 文件读写

一默认编码

二转码

三文件读写