NLP: Stanford Parser using NLTK

Source: Internet | Posted under: Android software development projects | Editor: 程序博客网 | Date: 2024/05/22 07:52

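The official distribution is inconvenient to use: its parameters are not documented in detail, and I could not find good material elsewhere. So I decided to drive the Stanford Constituency Parser and Dependency Parser from Python through NLTK.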

I. Environment and Python installation

OS: Windows 10
JDK: 1.8.0_151
Anaconda 4.4.0 (Python 3.6.1)
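The JDK matters because NLTK's Stanford wrappers launch the parser by calling the `java` executable. A minimal sketch for checking which Java version is installed, assuming `java` is on the PATH; `parse_java_version` and `installed_java_version` are hypothetical helpers, not part of NLTK:

```python
import re
import subprocess


def parse_java_version(text):
    """Extract the quoted version from `java -version` output, e.g. '1.8.0_151'."""
    match = re.search(r'version "([^"]+)"', text)
    return match.group(1) if match else None


def installed_java_version():
    """Run `java -version`; return its version string, or None if java is not found."""
    try:
        proc = subprocess.run(
            ["java", "-version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,  # Python 3.6 compatible (no capture_output/text)
        )
    except FileNotFoundError:
        return None
    # `java -version` prints its banner to stderr, not stdout
    return parse_java_version(proc.stderr)
```

With the JDK listed above, `installed_java_version()` should return `1.8.0_151`; `None` means NLTK will not be able to start the parser either.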

II. Install NLTK

pip install nltk

After installation, start the Python interpreter and enter:

import nltk
nltk.download()

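[Figure: screenshot of the commands above in the Python shell]
A window then pops up (the NLTK Downloader). I don't fully understand it yet; roughly, it lists the resource packages NLTK provides, so I simply downloaded all of them.
[Figure: screenshot of the NLTK Downloader window]
That completes the installation.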

III. Stanford Parser with NLTK

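Below are a few simple demos of the Stanford Parser that run without configuring the CLASSPATH: each script passes the jar locations to NLTK through environment variables.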
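NLTK looks up the parser jar through `STANFORD_PARSER` and the models jar through `STANFORD_MODELS`. Because `./model/...` is a relative path, it resolves against the current working directory, so running a demo from another folder makes NLTK fail to find the jars. A minimal sketch that resolves the paths to absolute ones and reports missing files before NLTK launches Java; `configure_stanford_jars` is a hypothetical helper, and the `./model/` layout is the author's:

```python
import os


def configure_stanford_jars(parser_jar, models_jar):
    """Set STANFORD_PARSER / STANFORD_MODELS to absolute paths.

    Returns the list of jar paths that do not exist, so a wrong working
    directory shows up as a clear message instead of a Java/NLTK error.
    """
    parser_jar = os.path.abspath(parser_jar)
    models_jar = os.path.abspath(models_jar)
    missing = [p for p in (parser_jar, models_jar) if not os.path.isfile(p)]
    os.environ['STANFORD_PARSER'] = parser_jar
    os.environ['STANFORD_MODELS'] = models_jar
    return missing
```

It can replace the two `os.environ[...]` lines in the demos, e.g. `configure_stanford_jars('./model/stanford-parser.jar', './model/stanford-parser-3.8.0-models.jar')`, stopping early if the returned list is non-empty.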

1. Constituency Parser

# -*- coding: utf-8 -*-
import os
from nltk.parse.stanford import StanfordParser

os.environ['STANFORD_PARSER'] = './model/stanford-parser.jar'
os.environ['STANFORD_MODELS'] = './model/stanford-parser-3.8.0-models.jar'

parser = StanfordParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")
sentences = parser.raw_parse("the quick brown fox jumps over the \" lazy \" dog .")

# for line in sentences:
#     for t in line:
#         print(t)

# GUI
for line in sentences:
    for sentence in line:
        sentence.draw()
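Instead of `draw()`, which opens a window, the commented-out loop prints each tree in Penn Treebank bracket notation, such as `(NP (DT the) (JJ quick) (NN fox))`. To show how that nesting encodes constituents, here is a minimal stdlib sketch that reads a bracketed string into nested `(label, children)` tuples. `read_bracketed` and `leaves` are hypothetical helpers for illustration; in real code NLTK's `nltk.Tree.fromstring` does this job. The sample strings in the test are hand-written, not captured parser output.

```python
import re


def read_bracketed(text):
    """Parse '(LABEL child ...)' into nested (label, [children]); leaves are strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())  # nested constituent
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume ')'
        return (label, children)

    return parse_node()


def leaves(node):
    """Collect the words under a node from left to right."""
    _label, children = node
    words = []
    for child in children:
        words.extend(leaves(child) if isinstance(child, tuple) else [child])
    return words
```

Each tuple is one constituent: its label (a phrase or part-of-speech tag) plus the sub-constituents or words it spans, which is exactly what `draw()` renders graphically.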

2. Dependency Parser

# -*- coding: utf-8 -*-
import os
from nltk.parse.stanford import StanfordDependencyParser

os.environ['STANFORD_PARSER'] = './model/stanford-parser.jar'
os.environ['STANFORD_MODELS'] = './model/stanford-parser-3.8.0-models.jar'

parser = StanfordDependencyParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")

# raw_parse() takes an untokenized string and returns an iterator of DependencyGraph objects
sentences = parser.raw_parse("the quick brown fox jumps over the lazy dog")
# for line in sentences:
#     print(line)

# parse() takes an already tokenized list of words
res = list(parser.parse("the quick brown fox jumps over the lazy dog .".split()))
# triples() yields ((head_word, head_tag), relation, (dependent_word, dependent_tag))
for row in res[0].triples():
    print(row)
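Each row printed above is a `((head_word, head_tag), relation, (dependent_word, dependent_tag))` triple. A minimal sketch of regrouping such triples into a head-to-dependents map; `dependents_by_head` is a hypothetical helper, and the sample triples are hand-written in the same shape, not captured parser output:

```python
from collections import defaultdict


def dependents_by_head(triples):
    """Group ((head, tag), rel, (dep, tag)) triples into {head_word: [(rel, dep_word), ...]}."""
    children = defaultdict(list)
    for (head_word, _head_tag), rel, (dep_word, _dep_tag) in triples:
        children[head_word].append((rel, dep_word))
    return dict(children)


# Hand-written triples shaped like DependencyGraph.triples() output (illustrative only).
sample = [
    (("jumps", "VBZ"), "nsubj", ("fox", "NN")),
    (("fox", "NN"), "det", ("the", "DT")),
    (("fox", "NN"), "amod", ("quick", "JJ")),
]
for head, deps in dependents_by_head(sample).items():
    print(head, "->", deps)
```

Keying by word is the simplest choice but merges repeated words (this sentence contains "the" twice); for real output it is safer to key by token position, which is how `DependencyGraph.nodes` is indexed.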

Final version, which dependency-parses every line of a data file and writes the results in CoNLL format:

# -*- coding: utf-8 -*-
import os
from nltk.parse.stanford import StanfordDependencyParser

os.environ['STANFORD_PARSER'] = './model/stanford-parser.jar'
os.environ['STANFORD_MODELS'] = './model/stanford-parser-3.8.0-models.jar'

parser = StanfordDependencyParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")

with open("./data/raw.clean.test", encoding="utf-8") as fin, \
        open("./result/test.txt", "w", encoding="utf-8") as fout:
    for i, line in enumerate(fin, start=1):
        # Each line looks like "sentence ||| ...": keep the part before "|||".
        # strip()/split() drop the trailing "\n" and any empty tokens, so
        # blank lines yield no tokens (line == "" never matched them).
        tokens = line.strip().split("|||")[0].split()
        if tokens:
            # parse() yields one DependencyGraph per sentence; unpack the single result
            graph, = parser.parse(tokens)
            # to_conll(4): one token per line as word, tag, head, relation
            fout.write(graph.to_conll(4))
            fout.write('\n')  # blank line between sentences
            fout.flush()
        print(i)  # progress counter

The final output fits my needs very well:
[Figure: screenshot of the resulting CoNLL-format output]
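To use `./result/test.txt` downstream, it has to be read back. With `to_conll(4)` each token line is `word<TAB>tag<TAB>head<TAB>relation`, where head is the 1-based position of the head token and 0 marks the root, and the script separates sentences with a blank line. A minimal reader sketch under those assumptions; `read_conll4` and `head_words` are hypothetical helpers, and the test uses a hand-written sample rather than real output:

```python
def read_conll4(text):
    """Split 4-column CoNLL text into sentences of (word, tag, head, rel) rows."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:  # a blank line ends the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        word, tag, head, rel = line.split("\t")
        current.append((word, tag, int(head), rel))
    if current:  # file may not end with a blank line
        sentences.append(current)
    return sentences


def head_words(sentence):
    """Pair each word with the word it depends on ('ROOT' when head == 0)."""
    return [(word, sentence[head - 1][0] if head else "ROOT", rel)
            for word, _tag, head, rel in sentence]
```

For example, `read_conll4(open("./result/test.txt", encoding="utf-8").read())` would return one list of rows per input sentence.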

That's all.
