How to Use the Java Version of Zhang Huaping's Word Segmentation Software (NLPIR)

Source: Internet  Editor: 程序博客网  Time: 2024/06/05 05:38
package code;

import java.io.UnsupportedEncodingException;

import com.sun.jna.Library;
import com.sun.jna.Native;

public class NlpirTest {

    // JNA interface CLibrary, extending com.sun.jna.Library; it maps the
    // functions exported by the native NLPIR library.
    public interface CLibrary extends Library {
        // Define and initialize the static instance of the interface.
        CLibrary Instance = (CLibrary) Native.loadLibrary(
                "D:\\NLPIR\\bin\\ICTCLAS2013\\x64\\NLPIR", CLibrary.class);

        public int NLPIR_Init(String sDataPath, int encoding, String sLicenceCode);

        public String NLPIR_ParagraphProcess(String sSrc, int bPOSTagged);

        public String NLPIR_GetKeyWords(String sLine, int nMaxKeyLimit, boolean bWeightOut);

        public String NLPIR_GetFileKeyWords(String sLine, int nMaxKeyLimit, boolean bWeightOut);

        public int NLPIR_AddUserWord(String sWord); // added by qp, 2008.11.10

        public int NLPIR_DelUsrWord(String sWord); // added by qp, 2008.11.10

        public String NLPIR_GetLastErrorMsg();

        public void NLPIR_Exit();
    }

    // Re-decode a string from one encoding to another; returns null if an
    // encoding name is not supported.
    public static String transString(String aidString, String ori_encoding, String new_encoding) {
        try {
            return new String(aidString.getBytes(ori_encoding), new_encoding);
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // Folder that contains the Data directory.
        String argu = "D:\\NLPIR";
        // Encoding flag passed to NLPIR_Init: 0 = GBK, 1 = UTF-8.
        int charset_type = 1;

        int init_flag = CLibrary.Instance.NLPIR_Init(argu, charset_type, "0");
        String nativeBytes = null;
        if (0 == init_flag) {
            nativeBytes = CLibrary.Instance.NLPIR_GetLastErrorMsg();
            System.err.println("Initialization failed! Reason: " + nativeBytes);
            return;
        }

        String sInput = "据悉,质检总局已将最新有关情况再次通报美方,要求美方加强对输华玉米的产地来源、运输及仓储等环节的管控措施,有效避免输华玉米被未经我国农业部安全评估并批准的转基因品系污染。";

        try {
            // Segment the text with part-of-speech tagging enabled (1).
            nativeBytes = CLibrary.Instance.NLPIR_ParagraphProcess(sInput, 1);
            System.out.println("Segmentation result: " + nativeBytes);

            // Add two user-dictionary words (word followed by its POS tag) and segment again.
            CLibrary.Instance.NLPIR_AddUserWord("要求美方加强对输 n");
            CLibrary.Instance.NLPIR_AddUserWord("华玉米的产地来源 n");
            nativeBytes = CLibrary.Instance.NLPIR_ParagraphProcess(sInput, 1);
            System.out.println("Segmentation result after adding user words: " + nativeBytes);

            // Delete one of the user words and segment once more.
            CLibrary.Instance.NLPIR_DelUsrWord("要求美方加强对输");
            nativeBytes = CLibrary.Instance.NLPIR_ParagraphProcess(sInput, 1);
            System.out.println("Segmentation result after deleting a user word: " + nativeBytes);

            // Extract up to 10 keywords from the string, then from a file.
            String nativeByte = CLibrary.Instance.NLPIR_GetKeyWords(sInput, 10, false);
            System.out.println("Keyword extraction result: " + nativeByte);

            nativeByte = CLibrary.Instance.NLPIR_GetFileKeyWords(
                    "D:\\NLPIR\\feedback\\huawei\\5341\\5341\\产经广场\\2012\\5\\16766.txt", 10, false);
            System.out.println("Keyword extraction result: " + nativeByte);
        } catch (Exception ex) {
            ex.printStackTrace();
        } finally {
            // Release the native resources even if a call above fails.
            CLibrary.Instance.NLPIR_Exit();
        }
    }
}
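The program above only prints the raw string returned by NLPIR_ParagraphProcess. As a rough sketch of how that string could be post-processed, the helper below splits it into word/tag pairs. It assumes the tagged output has the form "word/tag word/tag ..." separated by whitespace; the class name TaggedOutputParser, the Token type, and the sample string in main are hypothetical and not part of NLPIR or its sample project.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits POS-tagged output of the assumed form "word/tag word/tag ..." into tokens. */
final class TaggedOutputParser {

    /** One segmented word with its part-of-speech tag (tag is "" when absent). */
    static final class Token {
        final String word;
        final String tag;

        Token(String word, String tag) {
            this.word = word;
            this.tag = tag;
        }
    }

    static List<Token> parse(String tagged) {
        List<Token> tokens = new ArrayList<>();
        if (tagged == null) {
            return tokens;
        }
        for (String item : tagged.trim().split("\\s+")) {
            if (item.isEmpty()) {
                continue;
            }
            // Use the last slash so that a word which itself contains '/' stays intact.
            int slash = item.lastIndexOf('/');
            if (slash <= 0) {
                tokens.add(new Token(item, ""));
            } else {
                tokens.add(new Token(item.substring(0, slash), item.substring(slash + 1)));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical sample in the assumed format, not real NLPIR output.
        String sample = "据悉/v 质检/vn 总局/n";
        for (Token t : parse(sample)) {
            System.out.println(t.word + "\t" + t.tag);
        }
    }
}
```

In NlpirTest, the string stored in nativeBytes after a NLPIR_ParagraphProcess call could be passed to TaggedOutputParser.parse in the same way.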

1. First, go to http://ictclas.nlpir.org/downloads and download Professor Zhang Huaping's word segmentation software.
2. Then open MyEclipse and import the JnaTest_NLPIR project from the sample folder.

3. Add the required NLPIR.dll file to the project.

4. Create a folder named file in the project, then copy the Data folder into it.


5. Change the path that points to NLPIR.dll.


Change the library path passed to Native.loadLibrary (the part selected in the original screenshot) to "NLPIR".


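Change the data path passed to NLPIR_Init (the part selected in the original screenshot) to the location of the file folder in your project (e.g. D:\\vc6.0rj6.0.8168.2.1429168101\FenCI\file).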
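Instead of hard-coding an absolute path, the location of the file folder can be derived from the project root. The sketch below is one way to do that and to fail early when the Data folder is missing. The class name NlpirPaths and the method dataParent are hypothetical, and it assumes the program is started with the project root as its working directory, which is typically the default for an Eclipse run configuration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Locates the folder to hand to NLPIR_Init, i.e. the folder that contains Data. */
final class NlpirPaths {

    /** Returns the absolute path of projectRoot/file, checking that it contains Data. */
    static String dataParent(Path projectRoot) {
        Path parent = projectRoot.resolve("file").toAbsolutePath().normalize();
        if (!Files.isDirectory(parent.resolve("Data"))) {
            throw new IllegalStateException("Data folder not found under " + parent);
        }
        return parent.toString();
    }

    public static void main(String[] args) {
        // Assumption: the working directory ("user.dir") is the project root.
        Path root = Paths.get(System.getProperty("user.dir"));
        try {
            System.out.println("Data path for NLPIR_Init: " + dataParent(root));
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

The returned string could then replace the hard-coded value of argu in NlpirTest.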



Note: if the licence has expired, download a new NLPIR.user file from https://github.com/NLPIR-team/NLPIR and use it to replace the NLPIR.user file in the Data folder.
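Replacing the licence file is a plain file copy, so it can also be scripted. The following is a minimal sketch, assuming the new NLPIR.user has already been downloaded by hand; the class LicenceUpdater, the method replaceLicence, and the NLPIR.user.bak backup name are hypothetical choices for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Copies a freshly downloaded NLPIR.user into the Data folder, keeping a backup. */
final class LicenceUpdater {

    static Path replaceLicence(Path downloaded, Path dataDir) {
        Path target = dataDir.resolve("NLPIR.user");
        try {
            // Keep the old licence next to the new one in case the copy must be undone.
            if (Files.exists(target)) {
                Files.copy(target, dataDir.resolve("NLPIR.user.bak"),
                        StandardCopyOption.REPLACE_EXISTING);
            }
            return Files.copy(downloaded, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: LicenceUpdater <downloaded NLPIR.user> <Data folder>");
            return;
        }
        System.out.println("Updated " + replaceLicence(Paths.get(args[0]), Paths.get(args[1])));
    }
}
```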








