Python encodings

Source: the Internet. Editor: 程序博客网. Date: 2024/06/09 16:30

Checking the encoding:

import chardet
print chardet.detect(raw_bytes)  # raw_bytes: a byte string (str) whose encoding you want to guess
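chardet is a third-party package. For the narrow question in this post (does a UTF-16 file start with a BOM or not?), a standard-library check of the first two bytes may be enough. The Python 3 sketch below is a swapped-in technique, not chardet's API, and the function name `sniff_utf16_bom` is hypothetical. It does not tell UTF-32-LE apart, whose BOM also begins with `\xff\xfe`.

```python
import codecs
from typing import Optional

def sniff_utf16_bom(raw: bytes) -> Optional[str]:
    """Hypothetical helper: return the UTF-16 codec implied by a leading BOM, or None."""
    if raw.startswith(codecs.BOM_UTF16_LE):  # b'\xff\xfe'
        return "utf-16-le"
    if raw.startswith(codecs.BOM_UTF16_BE):  # b'\xfe\xff'
        return "utf-16-be"
    return None  # no BOM; the bytes may still be BOM-less UTF-16

print(sniff_utf16_bom("时间".encode("utf-16-le")))                        # None
print(sniff_utf16_bom(codecs.BOM_UTF16_LE + "时间".encode("utf-16-le")))  # utf-16-le
```

If a codec name comes back, the content after the first two bytes can be decoded with it.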

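When writing with mode='a', encoding='utf-16', the marker '\xff\xfe' (the UTF-16 byte order mark, or BOM, in little-endian order) is written in front of the content.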

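When writing with (mode='wb', encoding='utf-16'), '\xff\xfe' is not written in front of each piece of content: it appears only once, at the very beginning of the newly created file.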

When writing with (mode='a', encoding='utf-16-le'), no marker is written at all.

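When the file is read back, any marker that the decoder does not consume as a BOM (for example, one in the middle of the file) comes out as the character u'\ufeff'.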
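The append behaviour described above can be reproduced in Python 3 by simulation: every fresh 'utf-16' encoding starts with a BOM, so appending one freshly encoded chunk per session leaves a marker in the middle of the file. This sketch assumes that model (it writes pre-encoded bytes in 'ab' mode rather than using the codecs writer the original code presumably used), and the helper name `append_per_session` is hypothetical.

```python
import codecs
import os
import tempfile

def append_per_session(path, chunks):
    """Hypothetical helper: one append session per chunk, each encoded afresh as 'utf-16'."""
    for chunk in chunks:
        with open(path, "ab") as f:
            f.write(chunk.encode("utf-16"))  # a fresh 'utf-16' encode always starts with a BOM

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "update_history.txt")
    append_per_session(path, ["时间", "日期"])
    with open(path, "rb") as f:
        raw = f.read()

print(raw.count(codecs.BOM_UTF16))  # 2: one BOM per append session
print(repr(raw.decode("utf-16")))   # '时间\ufeff日期': only the first BOM is consumed
```

`codecs.BOM_UTF16` is the BOM in the machine's native byte order, which is the one the 'utf-16' codec emits.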

# -*- coding: utf-8 -*-
# Python 2; the byte values in the comments are from a little-endian machine.

# chardet.detect() results recorded for the two files:
update_txt_encoding = {'confidence': 1.0, 'encoding': 'UTF-16LE'}  # file written with encoding utf-16
update_write_encoding_le = {'confidence': 1.0, 'encoding': 'ascii'}  # file written with encoding utf-16-le

print "utf-16" + "*" * 40
utf_16_encoding = u'时间'.encode('utf-16')
print [utf_16_encoding]  # ['\xff\xfe\xf6e\xf4\x95']
print [utf_16_encoding.decode('utf-16')]  # [u'\u65f6\u95f4']
print [utf_16_encoding.decode('utf-16-le')]  # [u'\ufeff\u65f6\u95f4']
print utf_16_encoding.decode("utf-16") == utf_16_encoding.decode("utf-16-le")  # False

print "utf-16-le" + "*" * 40
utf_16_le_encoding = u"时间".encode('utf-16-le')
print [utf_16_le_encoding]  # ['\xf6e\xf4\x95']
print [utf_16_le_encoding.decode("utf-16")]  # [u'\u65f6\u95f4']
print [utf_16_le_encoding.decode("utf-16-le")]  # [u'\u65f6\u95f4']
print utf_16_le_encoding.decode("utf-16") == utf_16_le_encoding.decode("utf-16-le")  # True

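1. When update.txt is copied into update_history.txt, every word in update.txt is iterated over and written with mode='a', encoding='utf-16'. If the argument to write() is unicode, it has to be encoded, and the 'utf-16' codec puts the '\xff\xfe' marker in front of the encoded content, so every new append session adds another marker.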

Example: [u'时间'.encode('utf-16')] ==> ['\xff\xfe\xf6e\xf4\x95']

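2. When update.txt is generated by code, it is written with mode='wb', encoding='utf-16'. Writing with 'wb' does not put an extra '\xff\xfe' in front of the content: the file is created in a single session, so the marker appears only once, at the very beginning.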
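Why one writing session yields only one marker can be illustrated with the standard library's incremental UTF-16 encoder, which here stands in for a single open file handle: it emits the BOM before the first chunk and plain code units afterwards. A Python 3 sketch; the helper name `encode_in_one_session` is hypothetical.

```python
import codecs

def encode_in_one_session(chunks):
    """Hypothetical helper: encode all chunks with one encoder, like one open file handle."""
    encoder = codecs.getincrementalencoder("utf-16")()
    return b"".join(encoder.encode(chunk) for chunk in chunks)

data = encode_in_one_session(["时间", "日期"])
print(data.count(codecs.BOM_UTF16))  # 1: BOM only at the very start
print(data.decode("utf-16"))         # 时间日期
```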

Question: how can a file that keeps being appended to avoid accumulating '\xff\xfe' markers?

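Solution: first check whether the file exists. If it does not, create it with (mode='wb', encoding='utf-16'); if it does, append with (mode='a', encoding='utf-16-le'). If the file does not exist yet and you create it with (mode='a', encoding='utf-16-le'), it gets no BOM, is detected as ascii (see the chardet result above), and reads back as garbled text.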
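The check-then-write approach can be sketched in Python 3 as follows. Assumptions: the helper name `append_words` and the one-word-per-line layout are hypothetical, and instead of encoding new files with 'utf-16' (whose byte order follows the machine), the sketch writes the little-endian BOM explicitly so the first chunk always matches the 'utf-16-le' chunks appended later.

```python
import codecs
import os
import tempfile

def append_words(path, words):
    """Hypothetical helper: BOM only when the file is created, plain UTF-16-LE afterwards."""
    payload = "".join(word + "\n" for word in words).encode("utf-16-le")
    if not os.path.exists(path):
        # New file: put the little-endian BOM b'\xff\xfe' in front, once.
        payload = codecs.BOM_UTF16_LE + payload
    # 'ab' appends to an existing file and creates it if it is missing.
    with open(path, "ab") as f:
        f.write(payload)

# Usage: two separate append sessions on the same file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "update_history.txt")
    append_words(path, ["时间"])
    append_words(path, ["日期"])
    with open(path, "rb") as f:
        raw = f.read()

print(raw.count(codecs.BOM_UTF16_LE))  # 1
print(raw.decode("utf-16").split())    # ['时间', '日期']
```

In Python 3, the built-in open(path, 'a', encoding='utf-16') should already avoid the stray marker on seekable files, because the text layer skips the BOM when appending to a non-empty file. The explicit-bytes version keeps the behaviour visible and independent of that detail.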

Supplement: reading and writing files

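When a file is opened with the built-in open(), read() returns a str (a byte string), which must then be decode()d with the encoding it was actually written in. When calling write(): if the argument is unicode, encode() it with the encoding you want in the file; if it is a str in some other encoding, first decode() it with that str's own encoding to get unicode, then encode() the result with the target encoding. If unicode is passed straight to write(), Python 2 implicitly encodes it with the default encoding (sys.getdefaultencoding(), normally 'ascii'), not with the encoding declared at the top of the source file, so non-ASCII text raises UnicodeEncodeError.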
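In Python 3 the same rule is expressed with bytes and str, and the implicit encoding step is gone. The sketch below converts GBK bytes to UTF-8 by decoding with the source encoding and then encoding with the target one, and shows that a binary stream rejects str outright. The helper name `transcode` and the choice of GBK are illustrative assumptions.

```python
import io

def transcode(raw: bytes, src_encoding: str, dst_encoding: str) -> bytes:
    """Hypothetical helper: decode with the bytes' real encoding, then encode for the target."""
    text = raw.decode(src_encoding)   # bytes -> str (Python 3's equivalent of unicode)
    return text.encode(dst_encoding)  # str -> bytes in the encoding wanted on disk

gbk_bytes = "时间".encode("gbk")
utf8_bytes = transcode(gbk_bytes, "gbk", "utf-8")
print(utf8_bytes == "时间".encode("utf-8"))  # True

# Python 3 never encodes implicitly: a binary stream refuses str.
buf = io.BytesIO()
try:
    buf.write("时间")
    implicit_encoding_allowed = True
except TypeError:
    implicit_encoding_allowed = False
print(implicit_encoding_allowed)  # False
```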

0 0