Using Stanford NLP for Chinese (中文)
Source: Internet. Editor: 程序博客网. Time: 2024/05/29 19:33
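The Stanford NLP tools include several tools for processing Chinese. Two of them are covered here: the word segmenter and the parser. For details, see: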
http://nlp.stanford.edu/software/parser-faq.shtml#o
1. Word segmentation: Chinese segmenter
Download: http://nlp.stanford.edu/software/
Stanford Chinese Word Segmenter: a Java implementation of a CRF-based Chinese word segmenter.
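The package is fairly large and needs a lot of memory at run time, so when running it from Eclipse you have to raise the JVM's maximum heap size:
Run -> Arguments -> VM arguments -> -Xmx800m (maximum heap size of 800 MB)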
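To confirm that the VM argument was actually picked up by the run configuration, the program can ask the JVM for its heap limit. This is a minimal sketch; the class name HeapCheck is made up for illustration, and the reported figure can differ slightly from the -Xmx value depending on the JVM.

```java
// Minimal sketch: report the maximum heap the running JVM will use,
// to check that a setting such as -Xmx800m took effect.
class HeapCheck {

    /** Maximum heap size in megabytes, as reported by the running JVM. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMb() + " MB");
    }
}
```

If the printed value is far below 800, the VM argument was probably entered under program arguments rather than VM arguments.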
Demo code (modified, not tested):
    import java.util.Properties;
    import edu.stanford.nlp.ie.crf.CRFClassifier;

    Properties props = new Properties();
    props.setProperty("sighanCorporaDict", "data");
    // props.setProperty("NormalizationTable", "data/norm.simp.utf8");
    // props.setProperty("normTableEncoding", "UTF-8");
    // below is needed because CTBSegDocumentIteratorFactory accesses it
    props.setProperty("serDictionary", "data/dict-chris6.ser.gz");
    //props.setProperty("testFile", args[0]);
    props.setProperty("inputEncoding", "UTF-8");
    props.setProperty("sighanPostProcessing", "true");

    CRFClassifier classifier = new CRFClassifier(props);
    classifier.loadClassifierNoExceptions("data/ctb.gz", props);
    // flags must be re-set after data is loaded
    classifier.flags.setProperties(props);
    //classifier.writeAnswers(classifier.test(args[0]));
    //classifier.testAndWriteAnswers(args[0]);

    // Segment a single sentence and print the result
    String result = classifier.testString("我是中国人!");
    System.out.println(result);
2. Stanford Parser
References: http://nlp.stanford.edu/software/parser-faq.shtml#o
http://blog.csdn.net/leeharry/archive/2008/03/06/2153583.aspx
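Depending on which trained model is loaded, the parser can handle either English or Chinese. The input is a sentence that has already been segmented into words; the output is the part-of-speech tags and the sentence's parse tree (plus the dependency relations).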
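Because the parser expects a sentence that is already split into words, the segmenter's string result has to be turned into a word list first. The sketch below assumes the segmenter returns the words separated by whitespace; the class SegmentedText and the method toWords are hypothetical helpers, not part of the Stanford API, and the sample string is written by hand rather than produced by the segmenter.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Minimal sketch: turn whitespace-separated segmenter output into a word list.
class SegmentedText {

    /** Splits a segmented sentence on runs of whitespace; blank input gives an empty list. */
    static List<String> toWords(String segmented) {
        String trimmed = segmented.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        // Hand-written stand-in for what a segmenter might return
        String segmented = "他 和 我 在 学校 里 常 打 桌球 。";
        System.out.println(toWords(segmented));
    }
}
```

The resulting list plays the same role as Arrays.asList(sent) in the parser demos.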
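English demo (included in the downloaded archive):
    import java.util.Arrays;
    import java.util.Collection;
    import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
    import edu.stanford.nlp.trees.*;

    // Load the English model and set parser options
    LexicalizedParser lp = new LexicalizedParser("englishPCFG.ser.gz");
    lp.setOptionFlags(new String[]{"-maxLength", "80", "-retainTmpSubcategories"});

    // Parse a pre-tokenized sentence and print the tree in Penn format
    String[] sent = { "This", "is", "an", "easy", "sentence", "." };
    Tree parse = (Tree) lp.apply(Arrays.asList(sent));
    parse.pennPrint();
    System.out.println();

    // Extract the collapsed typed dependencies from the tree
    TreebankLanguagePack tlp = new PennTreebankLanguagePack();
    GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
    GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
    Collection tdl = gs.typedDependenciesCollapsed();
    System.out.println(tdl);
    System.out.println();

    // Print the tree and the dependencies in one go
    TreePrint tp = new TreePrint("penn,typedDependenciesCollapsed");
    tp.printTree(parse);
The Chinese version differs slightly: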
    import java.util.Arrays;
    import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
    import edu.stanford.nlp.trees.*;
    import edu.stanford.nlp.trees.international.pennchinese.ChineseTreebankLanguagePack;

    //LexicalizedParser lp = new LexicalizedParser("englishPCFG.ser.gz");
    LexicalizedParser lp = new LexicalizedParser("xinhuaFactored.ser.gz");
    //lp.setOptionFlags(new String[]{"-maxLength", "80", "-retainTmpSubcategories"});

    // String[] sent = { "This", "is", "an", "easy", "sentence", "." };
    String[] sent = { "他", "和", "我", "在", "学校", "里", "常", "打", "桌球", "。" };
    String sentence = "他和我在学校里常打台球。";
    Tree parse = (Tree) lp.apply(Arrays.asList(sent));
    //Tree parse = (Tree) lp.apply(sentence);
    parse.pennPrint();
    System.out.println();

    /*
    TreebankLanguagePack tlp = new PennTreebankLanguagePack();
    GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
    GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
    Collection tdl = gs.typedDependenciesCollapsed();
    System.out.println(tdl);
    System.out.println();
    */

    // only for English
    //TreePrint tp = new TreePrint("penn,typedDependenciesCollapsed");
    // for Chinese
    TreePrint tp = new TreePrint("wordsAndTags,penn,typedDependenciesCollapsed", new ChineseTreebankLanguagePack());
    tp.printTree(parse);
Sometimes, however, printing the dependency relations is not enough and we want the parse tree (graph) itself as an object. In that case, use a program like the following:
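    import java.util.Collection;
    import edu.stanford.nlp.trees.*;
    import edu.stanford.nlp.trees.international.pennchinese.ChineseTreebankLanguagePack;

    String[] sent = { "他", "和", "我", "在", "学校", "里", "常", "打", "桌球", "。" };
    // ParserSentence is not shown in this post; it is presumably the author's own
    // wrapper around LexicalizedParser that returns the parse tree for a word array.
    ParserSentence ps = new ParserSentence();
    Tree parse = ps.parserSentence(sent);
    parse.pennPrint();

    // Use the Chinese language pack to build the grammatical structure
    TreebankLanguagePack tlp = new ChineseTreebankLanguagePack();
    GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
    GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
    Collection tdl = gs.typedDependenciesCollapsed();
    System.out.println(tdl);
    System.out.println();

    // Walk the dependencies one at a time
    for (Object o : tdl) {
        // TypedDependency(GrammaticalRelation reln, TreeGraphNode gov, TreeGraphNode dep)
        TypedDependency td = (TypedDependency) o;
        System.out.println(td.toString());
    }
    // The GrammaticalStructure method
    // getGrammaticalRelation(TreeGraphNode gov, TreeGraphNode dep)
    // returns the grammatical relation between two words.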
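When only the printed dependencies are at hand (for example, saved from an earlier run), a similar governor/dependent lookup can be rebuilt from the text. The sketch below assumes each dependency prints as reln(gov-i, dep-j), which is how TypedDependency.toString() output usually looks; the class DependencyLookup, its methods, and the sample lines with their relation labels are made up for illustration and are not actual parser output.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch: index printed dependencies by (governor, dependent).
class DependencyLookup {

    // Assumed printed form: reln(gov-i, dep-j)
    private static final Pattern DEP =
        Pattern.compile("^([^()\\s]+)\\((.+)-(\\d+), (.+)-(\\d+)\\)$");

    /** Builds the map key for a governor/dependent pair with their word positions. */
    static String key(String gov, int govIdx, String dep, int depIdx) {
        return gov + "-" + govIdx + ">" + dep + "-" + depIdx;
    }

    /** Parses printed dependencies; lines that do not match the assumed form are skipped. */
    static Map<String, String> index(String[] printed) {
        Map<String, String> relations = new LinkedHashMap<>();
        for (String line : printed) {
            Matcher m = DEP.matcher(line.trim());
            if (!m.matches()) {
                continue;
            }
            String k = key(m.group(2), Integer.parseInt(m.group(3)),
                           m.group(4), Integer.parseInt(m.group(5)));
            relations.put(k, m.group(1));
        }
        return relations;
    }

    public static void main(String[] args) {
        // Hand-written sample lines, not real parser output
        String[] printed = { "nsubj(打-8, 他-1)", "dobj(打-8, 桌球-9)" };
        Map<String, String> relations = index(printed);
        System.out.println(relations.get(key("打", 8, "桌球", 9)));
    }
}
```

With the live objects available, calling getGrammaticalRelation on the GrammaticalStructure is the more direct route; the text-based lookup is only a fallback.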