Chapter 5: Categorizing and Tagging Words

Source: Internet  Published by: 涌金水利软件  Editor: 程序博客网  Time: 2024/04/23 22:16

import nltk

text = nltk.word_tokenize("And now for something completely different")

a = nltk.pos_tag(text)

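After tokenizing, tag each word with its part of speech. CC is a coordinating conjunction, RB an adverb, IN a preposition, NN a noun, and JJ an adjective.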
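To make the tag names concrete, the sketch below pairs each tag with the gloss given above. The tagged list is hard-coded from the output the NLTK book shows for this sentence, so it runs without NLTK installed; a different tagger model may assign different tags. The names `tagged`, `GLOSS`, and `describe` are illustrative assumptions, not part of NLTK.

```python
# Tagger output for the example sentence, hard-coded (an assumption) so the
# sketch runs without NLTK or its tagger model.
tagged = [
    ("And", "CC"), ("now", "RB"), ("for", "IN"),
    ("something", "NN"), ("completely", "RB"), ("different", "JJ"),
]

# Glosses for the Penn Treebank tags mentioned in the text.
GLOSS = {
    "CC": "coordinating conjunction",
    "RB": "adverb",
    "IN": "preposition",
    "NN": "noun",
    "JJ": "adjective",
}

def describe(pairs):
    """Return one 'word/TAG (gloss)' string per tagged token."""
    return ["%s/%s (%s)" % (word, tag, GLOSS.get(tag, "unknown"))
            for word, tag in pairs]

for line in describe(tagged):
    print(line)
```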


You can use

nltk.help.upenn_tagset('RB')

to look up what a tag abbreviation means.


Tag   Meaning               Examples
ADJ   adjective             new, good, high, special, big, local
ADV   adverb                really, already, still, early, now
CNJ   conjunction           and, or, but, if, while, although
DET   determiner            the, a, some, most, every, no
EX    existential           there, there's
FW    foreign word          dolce, ersatz, esprit, quo, maitre
MOD   modal verb            will, can, would, may, must, should
N     noun                  year, home, costs, time, education
NP    proper noun           Alison, Africa, April, Washington
NUM   number                twenty-four, fourth, 1991, 14:24
PRO   pronoun               he, their, her, its, my, I, us
P     preposition           on, of, at, with, by, into, under
TO    the word "to"         to
UH    interjection          ah, bang, ha, whee, hmpf, oops
V     verb                  is, has, get, do, make, see, run
VD    past tense            said, took, told, made, asked
VG    present participle    making, going, playing, working
VN    past participle       given, taken, begun, sung
WH    wh-determiner         who, which, when, what, where, how
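The tagger call earlier returns Penn Treebank tags such as CC and RB, while this table uses a coarser, simplified tagset. As a rough sketch of how the two relate, the hand-written mapping below covers only the five tags seen in the example sentence. It is an illustrative assumption, not NLTK's own mapping, and `PENN_TO_SIMPLE` and `simplify` are hypothetical names.

```python
# Hypothetical, partial mapping from Penn Treebank tags to the simplified tags
# in the table; only the tags from the example sentence are covered.
PENN_TO_SIMPLE = {"CC": "CNJ", "RB": "ADV", "IN": "P", "NN": "N", "JJ": "ADJ"}

def simplify(pairs):
    """Swap each Penn tag for its simplified tag; leave unmapped tags unchanged."""
    return [(word, PENN_TO_SIMPLE.get(tag, tag)) for word, tag in pairs]

# Tagger output for the example sentence, hard-coded (an assumption).
tagged = [
    ("And", "CC"), ("now", "RB"), ("for", "IN"),
    ("something", "NN"), ("completely", "RB"), ("different", "JJ"),
]

print(simplify(tagged))
```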


Build a frequency distribution of the tags:

tag_fd = nltk.FreqDist(tag for (word, tag) in a)
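What the frequency distribution holds can be sketched with the standard library alone. In NLTK 3, FreqDist is a subclass of collections.Counter, so a plain Counter stands in for it here. The hard-coded `tagged` list is an assumption that replaces the real tagger output.

```python
from collections import Counter

# Tagger output for the example sentence, hard-coded (an assumption).
tagged = [
    ("And", "CC"), ("now", "RB"), ("for", "IN"),
    ("something", "NN"), ("completely", "RB"), ("different", "JJ"),
]

# Count each tag, ignoring the word it was attached to.
tag_counts = Counter(tag for (word, tag) in tagged)

# most_common() returns (tag, count) pairs, highest count first.
ranked = tag_counts.most_common()
print(ranked)
```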

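List the tags in tag_fd, most frequent first (in NLTK 3, keys() no longer returns them in frequency order, so most_common() is used instead):

[tag for (tag, count) in tag_fd.most_common()]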

Plot the frequency distribution as a cumulative plot (this requires matplotlib):

tag_fd.plot(cumulative=True)
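The cumulative plot shows, for each tag in frequency order, the running total of tokens covered so far. A minimal sketch of those values without matplotlib follows. The Counter and its hard-coded counts are stand-ins (assumptions) for the real tag_fd.

```python
from collections import Counter
from itertools import accumulate

# Tag counts for the example sentence, hard-coded (an assumption).
tag_counts = Counter({"RB": 2, "CC": 1, "IN": 1, "NN": 1, "JJ": 1})

# Order tags from most to least frequent, then keep a running total of counts.
ranked = tag_counts.most_common()
running = list(accumulate(count for (tag, count) in ranked))

for (tag, _count), total in zip(ranked, running):
    print(tag, total)
```

The last running total equals the number of tagged tokens, which is where the cumulative curve levels off.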

