How to Use the Stanford Parser

Source: the Internet. Editor: 程序博客网. Time: 2024/05/17 08:53

I. What is the Stanford Parser?

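The Stanford Parser is one of the tools provided by the Stanford NLP Group, and it performs syntactic parsing. It supports English, Chinese, German, French, Arabic, and several other languages.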

The compiled jar files, source code, javadoc, and so on can be downloaded here: http://nlp.stanford.edu/software/lex-parser.shtml#Download

The FAQ is at http://nlp.stanford.edu/software/parser-faq.shtml, and reading through it clears up most basic questions. Of course, you do need to read English, right? Haha.

II. How do you use the Stanford Parser?

First, go to http://nlp.stanford.edu/software/lex-parser.shtml#Download and download the package.

Here I downloaded version 3.3.1, the latest release at the time of writing.

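After unpacking the archive, you get the distribution's files, including the jar files and the demo sources.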

Among them are two demos, which Stanford provides to help users understand how to call the parser. So how do you import them into Eclipse and run them?

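First create a Java project named parser. Then right-click it, open Properties, go to Build Path, and add the two jar files from the distribution. After that you can call the classes in them directly.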
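Before running the demo, it can help to confirm that both jars really are on the build path. The following is a minimal sketch using only the Java standard library. The class name ClasspathCheck and its two helper methods are hypothetical names made up for this illustration; the parser class name and the model path are the ones used in the demo code in this article.

```java
// Sanity check: are the Stanford Parser jars visible on the classpath?
// ClasspathCheck and its helpers are illustrative names, not part of the parser.
class ClasspathCheck {

  /** Returns true if the named class can be located by this class's loader. */
  static boolean hasClass(String className) {
    try {
      // initialize=false: only locate the class, do not run its static initializers
      Class.forName(className, false, ClasspathCheck.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  /** Returns true if the named resource (for example a model file inside a jar) is found. */
  static boolean hasResource(String path) {
    return ClasspathCheck.class.getClassLoader().getResource(path) != null;
  }

  public static void main(String[] args) {
    // Both names below are taken from the demo code in this article.
    System.out.println("parser class found: "
        + hasClass("edu.stanford.nlp.parser.lexparser.LexicalizedParser"));
    System.out.println("English model found: "
        + hasResource("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz"));
  }
}
```

If either line prints false, the corresponding jar is probably missing from the project's build path.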

Then paste in the following code:

import java.io.IOException;
import java.io.StringReader;
import java.util.*;

import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.ling.Label;
import edu.stanford.nlp.ling.Word;
import edu.stanford.nlp.process.DocumentPreprocessor;
import edu.stanford.nlp.process.Tokenizer;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;

class ParserDemo2 {

  /** This example shows a few more ways of providing input to a parser.
   *
   *  Usage: ParserDemo2 [grammar [textFile]]
   */
  public static void main(String[] args) throws IOException {
    // Load the grammar given on the command line, or the English PCFG model by default.
    String grammar = args.length > 0 ? args[0] : "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz";
    String[] options = { "-maxLength", "80", "-retainTmpSubcategories" };
    LexicalizedParser lp = LexicalizedParser.loadModel(grammar, options);
    TreebankLanguagePack tlp = lp.getOp().langpack();
    GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();

    Iterable<List<? extends HasWord>> sentences;
    if (args.length > 1) {
      // Read the sentences from the text file named by the second argument.
      DocumentPreprocessor dp = new DocumentPreprocessor(args[1]);
      List<List<? extends HasWord>> tmp =
        new ArrayList<List<? extends HasWord>>();
      for (List<HasWord> sentence : dp) {
        tmp.add(sentence);
      }
      sentences = tmp;
    } else {
      // Showing tokenization and parsing in code a couple of different ways.
      // 1) A sentence that is already split into words.
      String[] sent = { "This", "is", "an", "easy", "sentence", "." };
      List<HasWord> sentence = new ArrayList<HasWord>();
      for (String word : sent) {
        sentence.add(new Word(word));
      }
      // 2) Raw text. A Java string literal cannot span lines, so it is concatenated.
      String sent2 = ("It has long been known that the rate of oxidative metabolism "
          + "(the process that uses oxygen to convert food into energy) in any animal "
          + "has a profound effect on its living patterns. The high metabolic rate of "
          + "small animals, for example, gives them sustained power and activity per "
          + "unit of weight, but at the cost of requiring constant consumption of food "
          + "and water. Very large animals, with their relatively low metabolic rates, "
          + "can survive well on a sporadic food supply, but can generate little "
          + "metabolic energy per gram of body weight. If only oxidative metabolic rate "
          + "is considered, therefore, one might assume that smaller, more active, "
          + "animals could prey on larger ones, at least if they attacked in groups. "
          + "Perhaps they could if it were not for anaerobic glycolysis, the great "
          + "equalizer.");
      // Use the default tokenizer for this TreebankLanguagePack.
      // Note: tokenize() returns one token list, so the whole passage is
      // handed to the parser as a single sentence.
      Tokenizer<? extends HasWord> toke =
        tlp.getTokenizerFactory().getTokenizer(new StringReader(sent2));
      List<? extends HasWord> sentence2 = toke.tokenize();
      List<List<? extends HasWord>> tmp =
        new ArrayList<List<? extends HasWord>>();
      tmp.add(sentence);
      tmp.add(sentence2);
      sentences = tmp;
    }

    for (List<? extends HasWord> sentence : sentences) {
      // Print the phrase-structure tree in Penn Treebank bracket notation.
      Tree parse = lp.parse(sentence);
      parse.pennPrint();
      System.out.println();

      // Print the collapsed, CC-processed typed dependencies.
      GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
      List<TypedDependency> tdl = gs.typedDependenciesCCprocessed();
      System.out.println(tdl);
      System.out.println();

      System.out.println("The words of the sentence:");
      for (Label lab : parse.yield()) {
        if (lab instanceof CoreLabel) {
          System.out.println(((CoreLabel) lab).toString("{map}"));
        } else {
          System.out.println(lab);
        }
      }
      System.out.println();

      // Print the words together with their part-of-speech tags.
      System.out.println(parse.taggedYield());
      System.out.println();
    }

    // This method turns the String into a single sentence using the
    // default tokenizer for the TreebankLanguagePack.
    String sent3 = "This is one last test!";
    lp.parse(sent3).pennPrint();
  }

  private ParserDemo2() {} // static methods only
}
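The tree that parse.pennPrint() writes uses Penn Treebank bracket notation, where each leaf has the form (TAG word). As a rough aid to reading that format, here is a small standard-library sketch that pulls word/TAG pairs out of such a bracketed string, similar in spirit to what parse.taggedYield() reports. The class PennBracketReader is a hypothetical helper written for this illustration, and the sample tree string is written by hand rather than copied from real parser output. The sketch also assumes that literal parentheses in the text do not appear as raw brackets inside the tree.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper, not part of the Stanford Parser API.
class PennBracketReader {

  /** Collects "word/TAG" pairs from a Penn Treebank style bracketed tree. */
  static List<String> taggedWords(String tree) {
    // Pad brackets with spaces so that a whitespace split yields clean tokens.
    String[] tokens = tree.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
    List<String> result = new ArrayList<String>();
    // A leaf is four tokens in a row: "(", a tag, a word, ")".
    for (int i = 0; i + 3 < tokens.length; i++) {
      if (tokens[i].equals("(")
          && !isBracket(tokens[i + 1])
          && !isBracket(tokens[i + 2])
          && tokens[i + 3].equals(")")) {
        result.add(tokens[i + 2] + "/" + tokens[i + 1]);
      }
    }
    return result;
  }

  private static boolean isBracket(String token) {
    return token.equals("(") || token.equals(")");
  }

  public static void main(String[] args) {
    // Hand-written sample tree for the first demo sentence (an assumption,
    // not captured parser output).
    String tree = "(ROOT (S (NP (DT This)) (VP (VBZ is) "
        + "(NP (DT an) (JJ easy) (NN sentence))) (. .)))";
    System.out.println(taggedWords(tree));
    // prints [This/DT, is/VBZ, an/DT, easy/JJ, sentence/NN, ./.]
  }
}
```

Inside a real program there is no need for this kind of string handling, since the Tree object returned by lp.parse already exposes the words and tags through yield() and taggedYield(), as the demo shows.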

The output of the run is shown below:

0 0