How to convert Unicode escapes to Chinese in Python 3, and fix GBK mojibake like ÖÐ¹úÉÙÊýÃñ×åÌØÉ«´åÕ¯

Source: Internet · Posted by: mynba2k18网络维护中 · Editor: 程序博客网 · Date: 2024/06/06 04:02

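How it works:

If ***type(text) is bytes***, call `text.decode('unicode_escape')`.

*decode applies to bytes.*

If ***type(text) is str***, call `text.encode('latin1').decode('unicode_escape')`.

*encode applies to str:* the string is first turned back into bytes, and then the escape sequences are decoded. This works because escape sequences such as \u4e2d are plain ASCII, so latin1 maps each character to exactly one byte.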
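Here is a minimal sketch of both cases. The helper name `unescape` and the sample strings are hypothetical. The str branch only works when every character in the string fits in latin-1, for example a string of plain ASCII escape sequences. A string that already contains real Chinese characters makes `encode('latin1')` raise `UnicodeEncodeError`.

```python
def unescape(text):
    """Turn literal \\uXXXX escape sequences into real characters (hypothetical helper)."""
    if isinstance(text, bytes):
        # bytes: the unicode_escape codec decodes the escapes directly
        return text.decode('unicode_escape')
    # str: convert to bytes via latin-1 first; this assumes every character
    # is in the latin-1 range, e.g. pure ASCII escape sequences
    return text.encode('latin1').decode('unicode_escape')


raw_bytes = b'\\u4e2d\\u56fd'  # bytes holding the ASCII text \u4e2d\u56fd
raw_str = '\\u4e2d\\u56fd'     # the same escapes held in a str

print(unescape(raw_bytes))  # 中国
print(unescape(raw_str))    # 中国
```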

1. Example:

```python
#coding=utf-8
import requests
from bs4 import BeautifulSoup


def qiushibaike():
    content = requests.get('http://baike.baidu.com/city/api/citylemmalist?type=0&cityId=360&offset=1&limit=60').content
    soup = BeautifulSoup(content, 'html.parser')
    # Before conversion: Chinese text shows up as literal \uXXXX escape sequences
    print(soup.prettify())
    # soup.prettify() returns a str, so encode it to bytes with latin-1 first,
    # then decode the escape sequences with unicode_escape
    new = soup.prettify().encode('latin-1').decode('unicode_escape')
    # After conversion: the escapes are now real Chinese characters
    print(new)


if __name__ == '__main__':
    qiushibaike()
```
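This endpoint appears to return JSON, so another option is to parse the response as JSON. JSON parsing decodes \uXXXX escapes on its own. Unlike the latin-1 trick, it also copes with text that already contains non-ASCII characters. The sketch below uses a hypothetical payload, and the field names `lemmaTitle` and `city` are made up. On a live response from requests, `response.json()` does the same parsing.

```python
import json

# Hypothetical payload shaped like a JSON API response; the field names
# "lemmaTitle" and "city" are made up for illustration. The first value holds
# \uXXXX escapes, the second holds Chinese characters directly.
payload = '{"lemmaTitle": "\\u4e2d\\u56fd", "city": "村寨"}'

data = json.loads(payload)  # JSON parsing turns \uXXXX escapes into characters
print(data["lemmaTitle"])   # 中国
print(data["city"])         # 村寨

# Writing it back out: ensure_ascii=False keeps the Chinese readable
print(json.dumps(data, ensure_ascii=False))  # {"lemmaTitle": "中国", "city": "村寨"}
```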

2. Comparison of results:

[Image: the output before and after conversion]

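Also, a site whose pages are GBK-encoded may come out garbled when scraped with Python 3. The output looks like this (it should read 中国少数民族特色村寨):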

ÖÐ¹úÉÙÊýÃñ×åÌØÉ«´åÕ¯[6]
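The garbling can be reproduced offline. It is what you get when GBK-encoded bytes are decoded as latin-1 (ISO-8859-1). This minimal sketch uses only the string from the sample above:

```python
original = '中国少数民族特色村寨'

# Simulate the bug: GBK bytes wrongly decoded as latin-1 (ISO-8859-1)
garbled = original.encode('gbk').decode('latin-1')
print(garbled)  # ÖÐ¹úÉÙÊýÃñ×åÌØÉ«´åÕ¯

# Repair: latin-1 maps every byte to one character, so encoding with latin-1
# recovers the original bytes, which can then be decoded as GBK
fixed = garbled.encode('latin-1').decode('gbk')
print(fixed)    # 中国少数民族特色村寨
```

The repair simply reverses the bug. No bytes are lost along the way, because latin-1 assigns a character to all 256 byte values.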

Example:

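```python
#coding=utf-8
import requests

# 6 pages in total; the first page's URL has no page number
for i in range(6):
    if i == 0:
        url = 'http://www.tcmap.com.cn/list/zhongguoshaoshuminzutesecunzhai.html'
    else:
        url = 'http://www.tcmap.com.cn/list/zhongguoshaoshuminzutesecunzhai' + str(i) + '.html'
    response = requests.get(url)
    print(type(response))
    # response.text was decoded with the wrong charset (latin-1), producing mojibake;
    # re-encode with latin-1 to recover the raw bytes, then decode them as GBK
    html = response.text.encode('latin-1').decode('gbk')
    print(html)
```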
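The latin-1 round trip probably works because requests falls back to ISO-8859-1 for `response.text` when a text response declares no charset. A more direct fix is to tell requests the right encoding. You can set `response.encoding = 'gbk'` before reading `response.text`, or you can decode `response.content` yourself. The sketch below shows the second form offline. `raw` stands in for `response.content`, and the helper name `decode_page` is hypothetical.

```python
def decode_page(raw: bytes, encoding: str = 'gbk') -> str:
    """Decode raw page bytes with an explicitly chosen charset (hypothetical helper)."""
    return raw.decode(encoding)


# `raw` stands in for response.content; here it is built locally for the demo
raw = '中国少数民族特色村寨'.encode('gbk')
print(decode_page(raw))  # 中国少数民族特色村寨
```

If a page uses characters outside GBK, passing `'gb18030'` (a superset of GBK) is a safer choice.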
