Corpus Processing: Converting Full-Width Characters to Half-Width

Source: Internet  Category: Data integration methods  Editor: 程序博客网  Date: 2024/06/06 10:06

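This script converts the full-width characters in a text file (the full-width space U+3000 and the full-width forms U+FF01 to U+FF5E) to their half-width ASCII equivalents: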

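# -*- coding: utf-8 -*-
# Python 3 version: files are opened in text mode with an explicit UTF-8 encoding,
# so no manual decode/encode calls are needed and chr() replaces Python 2's unichr().

def strQ2B(inputFilePath, outputFilePath):
    """Copy inputFilePath to outputFilePath, converting full-width characters to half-width."""
    with open(inputFilePath, encoding='utf-8') as inputFile, \
         open(outputFilePath, 'w', encoding='utf-8') as outputFile:
        for line in inputFile:
            rchars = []
            for uchar in line:
                inside_code = ord(uchar)
                if inside_code == 12288:                        # full-width space (U+3000): convert directly to ASCII space
                    inside_code = 32
                elif 65281 <= inside_code <= 65374:             # other full-width chars (U+FF01..U+FF5E): subtract the fixed offset 65248 (0xFEE0)
                    inside_code -= 65248
                rchars.append(chr(inside_code))
            outputFile.write(''.join(rchars))


if __name__ == "__main__":
    inputFilePath = "../1.txt"
    outputFilePath = "../2.txt"
    strQ2B(inputFilePath, outputFilePath)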