【python】string and bytes in Python

Source: Internet  Editor: 程序博客网  Date: 2024/06/05 08:31

Python 3 never converts between string (str) and bytes implicitly.
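A minimal sketch of what "no implicit conversion" looks like in practice. The variable names here are only illustrative: mixing the two types raises a TypeError, comparing them is simply False, and you have to call encode() or decode() yourself.

```python
text = '哈哈'                   # str: Unicode text
data = text.encode('utf-8')     # bytes: the UTF-8 encoding of that text

# Concatenating str and bytes is refused instead of being converted silently.
try:
    combined = text + data
except TypeError as exc:
    mixing_error = exc
else:
    mixing_error = None

# Equality does not convert either: a str never equals a bytes object.
same = (text == data)

# The conversion has to be spelled out, in one direction or the other.
joined_text = text + data.decode('utf-8')
joined_bytes = text.encode('utf-8') + data

print(type(mixing_error).__name__, same, joined_text, len(joined_bytes))
```

Whether to join as str or as bytes depends on what the next step expects: text processing wants str, while files opened in binary mode and sockets want bytes.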

First, this article is worth reading:

http://www.51testing.com/html/63/524463-817888.html

Keep this diagram in mind:


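The part inside the green box is something Python 3 does not distinguish. In other words, to Python 3, unicode simply is text (characters), and encoding and decoding always happen between unicode and some other encoding.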
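The point above can be sketched in a few lines: encode() goes from str to bytes, decode() goes from bytes back to str, and moving between two encodings always passes through str. The `transcode` helper is a hypothetical name used for illustration, not something from the standard library.

```python
text = '哈哈'

# encode: str (Unicode) -> bytes in a specific encoding
utf8_bytes = text.encode('utf-8')
gbk_bytes = text.encode('gbk')

# decode: bytes in a specific encoding -> str (Unicode)
assert utf8_bytes.decode('utf-8') == text
assert gbk_bytes.decode('gbk') == text


def transcode(data, source_encoding, target_encoding):
    """Hypothetical helper: re-encode bytes by going through str."""
    return data.decode(source_encoding).encode(target_encoding)


# Same text, different byte lengths depending on the encoding.
print(len(text), len(utf8_bytes), len(gbk_bytes))
print(transcode(gbk_bytes, 'gbk', 'utf-8') == utf8_bytes)
```

There is no direct GBK-to-UTF-8 step here; the str in the middle is what makes the two byte sequences comparable at all.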

Next, the test code:

# encode: unicode (str) --> other codec (bytes)
# decode: other codec (bytes) --> unicode (str)
# unicode is the intermediate form: in Python 3 every str is unicode text

# The string '哈哈' as unicode text
str_obj = '哈哈'
uni_obj = u'哈哈'  # the u prefix is accepted but redundant in Python 3; unicode: \u54c8\u54c8
utf8_obj = uni_obj.encode('UTF-8')  # utf8 code: b'\xe5\x93\x88\xe5\x93\x88'
gbk_obj = uni_obj.encode('gbk')

# Works in Python 2.x only
'''
if isinstance(str_obj, unicode):
    print('str_obj is a unicode string')
if isinstance(uni_obj, unicode):
    print('uni_obj is a unicode string')
'''
print("type of '哈哈': " + str(type(str_obj)))
print("type of u'哈哈': " + str(type(uni_obj)))
print("type after encoding to utf8: " + str(type(utf8_obj)))
print("type after encoding to gbk: " + str(type(gbk_obj)))
print()

# This prints: 哈哈, the unicode string itself
print('print unicode as str:' + uni_obj)
#print('print unicode of uni_obj:'+bytes('\xe5\x93\x88\xe5\x93\x88'))
print()
print()

# This prints: b'\xe5\x93\x88\xe5\x93\x88'. Anything that is not unicode text is shown as bytes.
# str() does not decode bytes into text; it only returns the b'...' representation.
print('print utf8 as bytes:' + str(utf8_obj))
print('print utf8 to str(decoded by utf8):' + str(utf8_obj.decode('utf-8')))
# Decoding UTF-8 bytes as GBK raises no error here, but the result is garbled text.
print('print utf8 to str(decoded by gbk):' + str(utf8_obj.decode('gbk')))
print()

print('print gbk as bytes:' + str(gbk_obj))
# Using utf-8 to decode the gbk bytes raises UnicodeDecodeError
#print('print gbk to str(decoded by utf8):'+str(gbk_obj.decode('utf-8')))
print('print gbk to str(decoded by utf8):' + 'Error! UTF-8 encodes each of these characters in three bytes and GBK in two, so the GBK bytes are not valid UTF-8')
print('print gbk to str(decoded by gbk):' + str(gbk_obj.decode('gbk')))
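The two mismatch cases in the test code behave differently, and a rough sketch of how one might deal with them follows. Decoding GBK bytes as UTF-8 raises UnicodeDecodeError, which can be caught or softened with the errors='replace' argument of bytes.decode(). Decoding UTF-8 bytes as GBK raises nothing for this input and silently yields the wrong text. The helper `decode_first_match` and its candidate order are assumptions made for illustration only.

```python
text = '哈哈'
utf8_bytes = text.encode('utf-8')
gbk_bytes = text.encode('gbk')

# Case 1: GBK bytes read as UTF-8 fail loudly.
try:
    gbk_bytes.decode('utf-8')
except UnicodeDecodeError as exc:
    failure = exc
else:
    failure = None

# errors='replace' swaps undecodable bytes for U+FFFD instead of raising.
lossy = gbk_bytes.decode('utf-8', errors='replace')

# Case 2: UTF-8 bytes read as GBK do not fail here, they just come out garbled.
garbled = utf8_bytes.decode('gbk')


def decode_first_match(data, candidates=('utf-8', 'gbk')):
    """Hypothetical helper: return (text, codec) for the first codec that decodes cleanly.

    UTF-8 is tried first because it is the stricter of the two; trying GBK
    first would accept many UTF-8 inputs and return garbled text.
    """
    for codec in candidates:
        try:
            return data.decode(codec), codec
        except UnicodeDecodeError:
            continue
    raise ValueError('none of the candidate codecs could decode the data')


print(type(failure).__name__)
print(garbled == text)
print(decode_first_match(gbk_bytes))
print(decode_first_match(utf8_bytes))
```

Guessing like this is only a fallback. When the encoding is known from a file header, a protocol field or a convention, passing it explicitly is the more reliable choice.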