Android: Converting Chinese Characters to Pinyin with Polyphonic Character Recognition

Source: Internet  Editor: 程序博客网  Time: 2024/03/29 14:14

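  • Background
    While sorting place names by their first pinyin letter, I ran into a bug: 长沙 (Changsha) was converted to "zhangsha" and 重庆 (Chongqing) to "zhong qing". Both names were therefore filed under Z instead of C, and the sort order was wrong.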
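To make the effect on ordering concrete, here is a minimal sketch. The class `PinyinSortDemo`, the method `sortByPinyin`, and the sample city list are hypothetical and are not taken from the original project; the pinyin strings are hard-coded rather than produced by a converter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PinyinSortDemo {
    /** Returns a copy of names sorted by the pinyin string registered for each name. */
    static List<String> sortByPinyin(List<String> names, Map<String, String> pinyin) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparing(pinyin::get));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> cities = Arrays.asList("长沙", "重庆", "大连", "武汉");

        // Hard-coded sample readings (assumption: illustrative data only).
        Map<String, String> naive = new HashMap<>();
        naive.put("长沙", "zhangsha");   // wrong reading of 长
        naive.put("重庆", "zhongqing");  // wrong reading of 重
        naive.put("大连", "dalian");
        naive.put("武汉", "wuhan");

        Map<String, String> fixed = new HashMap<>(naive);
        fixed.put("长沙", "changsha");
        fixed.put("重庆", "chongqing");

        System.out.println(sortByPinyin(cities, naive)); // [大连, 武汉, 长沙, 重庆]
        System.out.println(sortByPinyin(cities, fixed)); // [长沙, 重庆, 大连, 武汉]
    }
}
```

With the wrong readings, 长沙 and 重庆 sink to the end of the list; with the correct ones they move to the front.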
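  • Pinyin library and polyphone dictionary
    1. A dictionary of words that contain polyphonic characters (多音字)
    2. A pinyin table keyed by each character's binary code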
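  • Key code
    1. First, each character to be converted is encoded as "gb2312". A Chinese character normally occupies two bytes in this encoding. If the result is a single byte, that byte's value is returned directly; if it is two bytes, an offset is computed from them. The code is as follows: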
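    /**
     * Converts one character to the signed GB2312-based code used as the key
     * of the pinyin table. Despite the method name, the result is not an
     * ASCII code for Chinese characters.
     *
     * @param chs a string containing a single character
     * @return the byte value for a one-byte character, or
     *         256 * high + low - 65536 for a two-byte character; 0 on error
     */
    private int getChsAscii(String chs) {
        int asc = 0;
        try {
            byte[] bytes = chs.getBytes("gb2312");
            if (bytes == null || bytes.length > 2 || bytes.length <= 0) {
                throw new RuntimeException("illegal resource string");
            }
            if (bytes.length == 1) {
                asc = bytes[0];
            }
            if (bytes.length == 2) {
                // Java bytes are signed, so mask them to get the unsigned values.
                int highByte = bytes[0] & 0xFF;
                int lowByte = bytes[1] & 0xFF;
                asc = (256 * highByte + lowByte) - 256 * 256;
            }
        } catch (Exception e) {
            // This also catches the RuntimeException above; the method then returns 0.
            System.out.println("ERROR:ChineseSpelling.class-getChsAscii(String chs)" + e);
        }
        return asc;
    }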
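As a worked example of the offset arithmetic, the sketch below applies the same formula to raw bytes. The class `Gb2312CodeDemo` and the method `toCode` are hypothetical names for illustration. The bytes are written as literals so the example does not depend on the GB2312 charset being installed in the runtime.

```java
class Gb2312CodeDemo {
    /** Combines the GB2312 bytes of one character into the signed lookup code. */
    static int toCode(byte[] bytes) {
        if (bytes.length == 1) {
            return bytes[0];                     // single-byte (ASCII) character
        }
        if (bytes.length != 2) {
            throw new IllegalArgumentException("expected 1 or 2 bytes");
        }
        int high = bytes[0] & 0xFF;              // undo Java's signed byte
        int low = bytes[1] & 0xFF;
        return ((high << 8) | low) - 0x10000;    // same as 256*high + low - 65536
    }

    public static void main(String[] args) {
        // 0xB0 0xA1 is the GB2312 encoding of 啊.
        byte[] a = {(byte) 0xB0, (byte) 0xA1};
        System.out.println(toCode(a));                         // -20319
        System.out.println(toCode(new byte[] {(byte) 'A'}));   // 65
    }
}
```

Here 0xB0A1 is 45217, and 45217 - 65536 = -20319, so two-byte characters always map to negative codes while ASCII characters keep their positive byte values.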
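    2. The pinyin obtained for each single character is then checked against the polyphone HashMap, using the neighboring characters as context. The code is as follows: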
    public String getSellingWithPolyphone(String chs) {
        // Load the polyphone dictionary on first use.
        if (polyphoneMap == null || polyphoneMap.isEmpty()) {
            polyphoneMap = initDictionary();
        }
        String key, value, resultPy = null;
        buffer = new StringBuilder();
        for (int i = 0; i < chs.length(); i++) {
            key = chs.substring(i, i + 1);
            if (key.getBytes().length >= 2) {
                value = (String) convert(key);
                if (value == null) {
                    value = "unknown";
                }
            } else {
                value = key;
            }
            resultPy = value;

            String left = null;     // take one extra character to the left
            if (i >= 1 && i + 1 <= chs.length()) {
                left = chs.substring(i - 1, i + 1);
                if (polyphoneMap.containsKey(value) && polyphoneMap.get(value).contains(left)) {
                    resultPy = value;
                }
            }

            String right = null;    // take one extra character to the right, e.g. [长]沙
            if (i <= chs.length() - 2) {
                right = chs.substring(i, i + 2);
                if (polyphoneMap.containsKey(right)) {
                    resultPy = polyphoneMap.get(right);
                }
            }

            String middle = null;   // take one extra character on each side, e.g. 龙[爪]槐
            if (i >= 1 && i + 2 <= chs.length()) {
                middle = chs.substring(i - 1, i + 2);
                if (polyphoneMap.containsKey(value) && polyphoneMap.get(value).contains(middle)) {
                    resultPy = value;
                }
            }

            String left3 = null;    // take two extra characters to the left, e.g. 芈月[传], 列车长
            if (i >= 2 && i + 1 <= chs.length()) {
                left3 = chs.substring(i - 2, i + 1);
                if (polyphoneMap.containsKey(value) && polyphoneMap.get(value).contains(left3)) {
                    resultPy = value;
                }
            }

            String right3 = null;   // take two extra characters to the right, e.g. [长]孙无忌
            if (i <= chs.length() - 3) {
                right3 = chs.substring(i, i + 3);
                if (polyphoneMap.containsKey(value) && polyphoneMap.get(value).contains(right3)) {
                    resultPy = value;
                }
            }
            buffer.append(resultPy);
        }
        return buffer.toString();
    }
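The idea of letting surrounding characters decide the reading can also be sketched in a simpler form. The example below is not the method above: it swaps in a plain forward longest-match over a phrase table, and the class `PhraseFirstPinyin`, its two maps, and the sample entries are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class PhraseFirstPinyin {
    // Hypothetical sample data; a real app would load these from its dictionaries.
    static final Map<String, String> PHRASES = new HashMap<>();
    static final Map<Character, String> SINGLE = new HashMap<>();
    static final int MAX_PHRASE_LEN = 3;

    static {
        PHRASES.put("长沙", "changsha");
        PHRASES.put("重庆", "chongqing");
        SINGLE.put('长', "zhang");
        SINGLE.put('沙', "sha");
        SINGLE.put('重', "zhong");
        SINGLE.put('庆', "qing");
        SINGLE.put('市', "shi");
    }

    static String convert(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            boolean matched = false;
            // Try the longest phrase starting at i first, then shorter ones.
            for (int len = Math.min(MAX_PHRASE_LEN, text.length() - i); len >= 2; len--) {
                String py = PHRASES.get(text.substring(i, i + len));
                if (py != null) {
                    out.append(py);
                    i += len;
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                // Fall back to the default reading; unknown characters pass through.
                char c = text.charAt(i);
                String py = SINGLE.get(c);
                out.append(py != null ? py : String.valueOf(c));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("长沙市")); // changshashi
        System.out.println(convert("重庆"));   // chongqing
        System.out.println(convert("长"));     // zhang
    }
}
```

A phrase hit overrides the per-character default, so 长 reads "chang" inside 长沙 but keeps its default "zhang" when it stands alone.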

    3. Parse the dictionary file in assets into a HashMap:

    public HashMap<String, String> initDictionary() {
        String fileName = "py4j.dic";
        HashMap<String, String> polyphoneMap = new HashMap<String, String>();
        // try-with-resources closes the reader and the wrapped stream automatically.
        try (BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(
                MyApplication.mContext.getResources().getAssets().open(fileName), "UTF-8"))) {
            String line = null;
            while ((line = bufferedReader.readLine()) != null) {
                String[] arr = line.split(PINYIN_SEPARATOR);
                // Skip lines without a word list instead of failing on arr[1].
                if (arr.length > 1 && isNotEmpty(arr[1])) {
                    String[] dyzs = arr[1].split(WORD_SEPARATOR);
                    for (String dyz : dyzs) {
                        if (isNotEmpty(dyz)) {
                            // Key: the word; value: the pinyin given on this line.
                            polyphoneMap.put(dyz.trim(), arr[0]);
                        }
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return polyphoneMap;
    }
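The per-line parsing can be tried without an Android context. This is a minimal sketch under stated assumptions: the article does not show the values of `PINYIN_SEPARATOR` and `WORD_SEPARATOR`, so "#" and "/" are guesses, and the sample lines are made up rather than copied from py4j.dic. The class `DictionaryLineParser` and the method `parseLine` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class DictionaryLineParser {
    // Assumed values; the article does not show the real constants.
    static final String PINYIN_SEPARATOR = "#";
    static final String WORD_SEPARATOR = "/";

    /** Adds every word on one line to the map: key = word, value = the line's pinyin. */
    static void parseLine(String line, Map<String, String> target) {
        String[] parts = line.split(PINYIN_SEPARATOR, 2);
        if (parts.length < 2) {
            return;                              // malformed line: no separator
        }
        String pinyin = parts[0].trim();
        for (String word : parts[1].split(WORD_SEPARATOR)) {
            String trimmed = word.trim();
            if (!trimmed.isEmpty()) {
                target.put(trimmed, pinyin);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        parseLine("chang#长沙/长江", map);        // made-up sample line
        parseLine("zhang#队长/ 校长", map);       // made-up sample line
        parseLine("line without separator", map); // ignored
        System.out.println(map.get("长沙"));      // chang
        System.out.println(map.get("校长"));      // zhang
        System.out.println(map.size());           // 4
    }
}
```

Keying the map by word gives a single hash lookup for a candidate substring such as 长沙.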
  • GitHub source download
    https://github.com/loveburce/ChinesePolyphone.git
