Configuring and Using deepnlp on Ubuntu 16.04
Source: Internet | Editor: 程序博客网 | Date: 2024/06/05 03:47
This walkthrough mainly follows the deep-nlp README.
DeepNLP consists of the following modules:
NLP Pipeline Modules:
- Word segmentation / tokenization
- Part-of-speech (POS) tagging
- Named entity recognition (NER)
- textsum: automatic summarization with Seq2Seq-Attention models
- textrank: extracts the most important sentences
- textcnn: document classification
- Web API: free TensorFlow-powered web API
- Planned: parsing, automatic summarization
Algorithms (closely following the state of the art):
- Word segmentation: linear-chain CRF (conditional random field), based on the Python CRF++ module
- POS: LSTM/Bi-LSTM network, based on TensorFlow
- NER: LSTM/Bi-LSTM/LSTM-CRF network, based on TensorFlow
- Textsum: Seq2Seq with attention mechanism
- Textcnn: CNN
Pre-trained models:
- Chinese: segmentation, POS, NER (1998 China Daily corpus)
- English: POS (Brown corpus)
- For your specific language, you can easily use the scripts to train models on a corpus of your choice.
Installation
The models require TensorFlow 1.0. Install it with:

```shell
export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.0.1-cp35-cp35m-linux_x86_64.whl
sudo pip install --upgrade $TF_BINARY_URL
```

The models do not work with Python 3. (Note that the wheel above is the cp35 build for Python 3.5; since the rest of this walkthrough runs under Python 2.7, the matching cp27 wheel would be needed instead.)
Then install deepnlp itself:

```shell
sudo pip install deepnlp
```
Usage
Downloading the pre-trained models
The deepnlp package installed via pip does not include the model files, so they must be downloaded separately. Run the following in Python:
```python
import deepnlp
# Download all the modules
deepnlp.download()
# Download only specific modules
deepnlp.download('segment')
deepnlp.download('pos')
deepnlp.download('ner')
deepnlp.download('textsum')
```
Word Segmentation
Run the following Python program:
```python
#coding=utf-8
from __future__ import unicode_literals
from deepnlp import segmenter

text = "我刚刚在浙江卫视看了电视剧老九门,觉得陈伟霆很帅"
segList = segmenter.seg(text)
text_seg = " ".join(segList)

print(text.encode('utf-8'))
print(text_seg.encode('utf-8'))
```
It fails with the following error:
```
Traceback (most recent call last):
  File "test_segment.py", line 4, in <module>
    from deepnlp import segmenter
  File "/usr/local/lib/python2.7/dist-packages/deepnlp/segmenter.py", line 6, in <module>
    import CRFPP
ImportError: No module named CRFPP
```
The segmentation module depends on CRF++ (>= 0.54). Download CRF++ 0.58 from its website, extract it, and run:

```shell
./configure
make
sudo make install
```
Then enter the python folder inside the CRF++ source tree and run:

```shell
python setup.py build
sudo python setup.py install
```
After installation completes, rerun the segmentation program above.
A different error now appears:
```
    import CRFPP
  File "/usr/lib/python2.7/dist-packages/bpython/curtsiesfrontend/repl.py", line 257, in load_module
    module = pkgutil.ImpLoader.load_module(self, name)
  File "/usr/lib/python2.7/pkgutil.py", line 246, in load_module
    mod = imp.load_module(fullname, self.file, self.filename, self.etc)
  File "/usr/local/lib/python2.7/dist-packages/CRFPP.py", line 26, in <module>
    _CRFPP = swig_import_helper()
  File "/usr/local/lib/python2.7/dist-packages/CRFPP.py", line 22, in swig_import_helper
    _mod = imp.load_module('_CRFPP', fp, pathname, description)
ImportError: libcrfpp.so.0: cannot open shared object file: No such file or directory
```
This happens because the CRF++ shared library was installed under /usr/local/lib, where the dynamic loader does not find it by default. Create the missing symlinks:

```shell
sudo ln -s /usr/local/lib/libcrfpp.so.* /usr/lib/
```

(Running sudo ldconfig may also resolve the lookup, since /usr/local/lib is normally listed under /etc/ld.so.conf.d on Ubuntu.)
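As a quick sanity check that the loader can now locate the library, the standard library's ctypes.util.find_library can be used; it consults the same search paths as the dynamic loader on Linux. This check is just an illustration, not part of deepnlp:

```python
# Sanity check: can the dynamic loader locate libcrfpp now?
# ctypes.util.find_library returns the library name/path if found, else None.
import ctypes.util

path = ctypes.util.find_library("crfpp")
if path:
    print("libcrfpp found: %s" % path)
else:
    print("libcrfpp not found - check the symlink or run ldconfig")
```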
Part-of-Speech Tagging
Run the following program:
```python
#coding:utf-8
from __future__ import unicode_literals  # compatible with python3 unicode
from deepnlp import segmenter
from deepnlp import pos_tagger

tagger = pos_tagger.load_model(lang='zh')

# Segmentation
text = "我爱吃北京烤鸭"  # unicode literal, py2 and py3 compatible
words = segmenter.seg(text)
print(" ".join(words).encode('utf-8'))

# POS tagging
tagging = tagger.predict(words)
for (w, t) in tagging:
    pair = w + "/" + t  # renamed from 'str' to avoid shadowing the builtin
    print(pair.encode('utf-8'))

# Results
# 我/r
# 爱/v
# 吃/v
# 北京/ns
# 烤鸭/n
```
The English model is loaded the same way, with lang code 'en':

```python
#coding:utf-8
from __future__ import unicode_literals
import deepnlp
deepnlp.download('pos')  # download the POS pretrained models from github if installed from pip

from deepnlp import pos_tagger
tagger = pos_tagger.load_model(lang='en')  # load the English model

# Tokenization (English text is simply split on spaces)
text = "I want to see a funny movie"
words = text.split(" ")
print(" ".join(words).encode('utf-8'))

# POS tagging
tagging = tagger.predict(words)
for (w, t) in tagging:
    pair = w + "/" + t  # renamed from 'str' to avoid shadowing the builtin
    print(pair.encode('utf-8'))

# Results
# I/nn
# want/vb
# to/to
# see/vb
# a/at
# funny/jj
# movie/nn
```
Named Entity Recognition
Run the following program:
```python
#coding:utf-8
from __future__ import unicode_literals  # compatible with python3 unicode
import deepnlp
deepnlp.download('ner')  # download the NER pretrained models from github if installed from pip

from deepnlp import segmenter
from deepnlp import ner_tagger
tagger = ner_tagger.load_model(lang='zh')

# Segmentation
text = "我爱吃北京烤鸭"
words = segmenter.seg(text)
print(" ".join(words).encode('utf-8'))

# NER tagging
tagging = tagger.predict(words)
for (w, t) in tagging:
    pair = w + "/" + t  # renamed from 'str' to avoid shadowing the builtin
    print(pair.encode('utf-8'))

# Results
# 我/nt
# 爱/nt
# 吃/nt
# 北京/p
# 烤鸭/nt
```
Pipeline
Run the following program:
```python
#coding:utf-8
from __future__ import unicode_literals  # compatible with python3 unicode
import sys, os
import codecs
import deepnlp

# download the required pretrained models from github if installed from pip
deepnlp.download('segment')
deepnlp.download('pos')
deepnlp.download('ner')

from deepnlp import pipeline
p = pipeline.load_model('zh')

# concatenate tuples into one string "w1/t1 w2/t2 ..."
def _concat_tuples(tagging):
    TOKEN_BLANK = " "
    wl = []  # word list
    for (x, y) in tagging:
        wl.append(x + "/" + y)  # unicode
    concat_str = TOKEN_BLANK.join(wl)
    return concat_str

# input file
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
docs = []
fileIn = codecs.open(os.path.join(BASE_DIR, 'docs_test.txt'), 'r', encoding='utf-8')  # renamed from 'file' to avoid shadowing the builtin
for line in fileIn:
    line = line.replace("\n", "").replace("\r", "")
    docs.append(line)

# output file
fileOut = codecs.open(os.path.join(BASE_DIR, 'pipeline_test_results.txt'), 'w', encoding='utf-8')

# analyze function
# @return: list of 3 elements [seg, pos, ner]
text = docs[0]
res = p.analyze(text)
words = p.segment(text)
pos_tagging = p.tag_pos(words)
ner_tagging = p.tag_ner(words)

# print pipeline.analyze() results
fileOut.writelines("pipeline.analyze results:" + "\n")
fileOut.writelines(res[0] + "\n")
fileOut.writelines(res[1] + "\n")
fileOut.writelines(res[2] + "\n")
print(res[0].encode('utf-8'))
print(res[1].encode('utf-8'))
print(res[2].encode('utf-8'))

# print module results
fileOut.writelines("modules results:" + "\n")
fileOut.writelines(" ".join(words) + "\n")
fileOut.writelines(_concat_tuples(pos_tagging) + "\n")
fileOut.writelines(_concat_tuples(ner_tagging) + "\n")
fileOut.close()  # original was missing the call parentheses
```
Automatic Summarization
See https://github.com/rockingdingo/deepnlp/tree/master/deepnlp/textsum or the README in the textsum folder.
Interactive Prediction
```shell
cd ./ckpt
cat headline_large.ckpt-48000.* > headline_large.ckpt-48000.data-00000-of-00001.tar.gz
tar xzvf headline_large.ckpt-48000.data-00000-of-00001.tar.gz
sudo mkdir /mnt/python/pypi/deepnlp/deepnlp/textsum/ckpt
sudo cp * /mnt/python/pypi/deepnlp/deepnlp/textsum/ckpt
cd ..
python predict.py
```
Then interactively enter pre-segmented Chinese news text, with words separated by spaces; the script returns an automatically generated headline.
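For illustration, building such an input line from a segmented word list is a single join. The word list below is hard-coded; in practice it would come from segmenter.seg(text):

```python
# -*- coding: utf-8 -*-
# Sketch: predict.py expects one line of pre-segmented text,
# with words separated by single spaces.
words = [u"我", u"爱", u"吃", u"北京", u"烤鸭"]  # normally from segmenter.seg(text)
line = u" ".join(words)
print(line)
```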
Prediction and ROUGE Score Evaluation
```shell
python predict.py news/test/content-test.txt news/test/title-test.txt news/test/summary.txt
```
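As background, the ROUGE family measures n-gram overlap between a generated summary and a reference. A minimal sketch of ROUGE-1 recall (unigram overlap divided by reference length; deepnlp's evaluation script may compute the score differently):

```python
from collections import Counter

def rouge_1_recall(reference, candidate):
    # Fraction of reference unigrams (with multiplicity) that also
    # appear in the candidate summary.
    ref_counts = Counter(reference)
    cand_counts = Counter(candidate)
    overlap = sum(min(n, cand_counts[w]) for w, n in ref_counts.items())
    return overlap / float(sum(ref_counts.values()))

ref = ["北京", "烤鸭", "涨价"]
cand = ["北京", "烤鸭", "降价"]
print(rouge_1_recall(ref, cand))  # 2 of 3 reference words matched
```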