Stanford CoreNLP: the 23 Chinese phrase categories of the Penn Chinese Treebank
Source: Internet; Published by: 沪江网络课程; Editor: 程序博客网; Time: 2024/05/22 05:33
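Phrase tags (17)

| Tag | English description | Chinese gloss (translated) |
|------|------|------|
| ADJP | Adjective phrase | Adjective phrase, projected from JJ |
| ADVP | Adverbial phrase headed by AD | Adverbial phrase headed by an adverb; functions as an adverbial |
| CLP | Classifier phrase | Classifier (measure word) phrase |
| CP | Clause headed by C (complementizer) | Clause introduced by a complementizer, including relative clauses |
| DNP | Phrase formed by "XP+DEG" | Phrase built from the structure XP+DEG |
| DP | Determiner phrase | Determiner phrase |
| DVP | Phrase formed by "XP+DEV" | Phrase built from the structure XP+DEV |
| FRAG | Fragment | Fragment |
| IP | Inflectional phrase | Simple clause headed by I (INFL or another inflectional element) |
| LCP | Phrase formed by "XP+LC" | Phrase headed by a localizer |
| LST | List marker | List-marker phrase used in explanatory enumerations |
| NP | Noun phrase | Noun phrase |
| PP | Prepositional phrase | Prepositional phrase |
| PRN | Parenthetical | Parenthetical |
| QP | Quantifier phrase | Numeral phrase, built from a numeral plus a measure word |
| UCP | Unidentical coordination phrase | Coordination of unlike categories |
| VP | Verb phrase | Verb phrase |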
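To see the phrase tags in use, the following is a minimal sketch that counts phrase-level labels in a bracketed parse string. It does not use CoreNLP itself: the class name `PhraseLabelCounter`, the method `countPhraseLabels`, and the sample tree are all hypothetical and written only for illustration. It assumes the usual Penn-style bracketing in which a label directly follows each opening parenthesis, and it strips function tags such as `-SBJ` before the lookup.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhraseLabelCounter {
    // The 17 phrase-level tags from the table above.
    static final Set<String> PHRASE_TAGS = new HashSet<>(Arrays.asList(
        "ADJP", "ADVP", "CLP", "CP", "DNP", "DP", "DVP", "FRAG", "IP",
        "LCP", "LST", "NP", "PP", "PRN", "QP", "UCP", "VP"));

    // A label is the token that immediately follows an opening parenthesis.
    private static final Pattern LABEL = Pattern.compile("\\(\\s*([^\\s()]+)");

    // Counts how often each phrase tag occurs in a bracketed parse.
    static Map<String, Integer> countPhraseLabels(String bracketed) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = LABEL.matcher(bracketed);
        while (m.find()) {
            String label = m.group(1);
            // Drop a function tag such as -SBJ or -OBJ, if present.
            int dash = label.indexOf('-');
            String base = dash > 0 ? label.substring(0, dash) : label;
            if (PHRASE_TAGS.contains(base)) {
                counts.merge(base, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hand-written sample tree (an assumption, not parser output).
        String tree = "(ROOT (IP (NP-SBJ (NR 中国)) "
                    + "(VP (ADVP (AD 正在)) (VP (VV 建设) (NP-OBJ (NN 工厂))))))";
        System.out.println(countPhraseLabels(tree));
    }
}
```

POS tags such as NR, AD, VV and NN are ignored because they are not in the phrase tag set; ROOT is skipped for the same reason.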
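Verb compound tags (6)

| Tag | Description | Example |
|------|------|------|
| VCD | Coordinated verb compound | (VCD (VV 投资) (VV 办厂)) |
| VCP | Compound of the form VV+VC (verb + 是) | |
| VNV | A-not-A or A-one-A form (A不A, A一A) | (VNV (VV 能) (AD 不) (VV 能)) |
| VPT | Potential form, V得R or V不R | (VPT (VV 得) (AD 不) (VV 到)) |
| VRD | Verb-resultative compound; the second element is the result of the first | (VRD (VV 呈现) (VV 出)); (VP (VRD (VV 联合) (VV 起来))) |
| VSB | Modifier + head compound; the first element is an intransitive verb, and no adjunct or aspect marker stands between the two elements | (VSB (VV 加速) (VV 建设)); (VP (VSB (VV 仰头) (VV 望去))) |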
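The bracketed examples for the verb compounds can be read mechanically. Below is a small sketch, assuming flat Penn-style bracketing, that pulls out the compound label and the surface word formed by its parts. The class `VerbCompoundDemo` and its methods `label` and `surface` are hypothetical helpers written for this illustration, not part of CoreNLP.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VerbCompoundDemo {
    // First label after the outermost opening parenthesis, e.g. "VCD".
    private static final Pattern ROOT_LABEL =
        Pattern.compile("^\\s*\\(\\s*([^\\s()]+)");
    // A preterminal such as "(VV 投资)": group 1 is the POS tag, group 2 the word.
    private static final Pattern PRETERMINAL =
        Pattern.compile("\\(\\s*([^\\s()]+)\\s+([^\\s()]+)\\s*\\)");

    // Returns the top-level label of a bracketed compound, or null if none is found.
    static String label(String bracketed) {
        Matcher m = ROOT_LABEL.matcher(bracketed);
        return m.find() ? m.group(1) : null;
    }

    // Concatenates the words of all preterminals, left to right.
    static String surface(String bracketed) {
        StringBuilder sb = new StringBuilder();
        Matcher m = PRETERMINAL.matcher(bracketed);
        while (m.find()) {
            sb.append(m.group(2));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] examples = {
            "(VCD (VV 投资) (VV 办厂))",
            "(VNV (VV 能) (AD 不) (VV 能))",
            "(VPT (VV 得) (AD 不) (VV 到))",
            "(VRD (VV 呈现) (VV 出))"
        };
        for (String ex : examples) {
            System.out.println(label(ex) + "\t" + surface(ex));
        }
    }
}
```

Chinese is written without spaces, so the parts are joined directly: the VNV example yields 能不能 and the VPT example yields 得不到.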
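NP
A phrase whose head is a noun. Grammatically it can be understood in two ways: (1) a phrase defined by its syntactic role, for example a chunk acting as the subject or object of a sentence, which can carry an auxiliary function tag such as NP-Sbj or NP-Obj; (2) an entity or attribute in a knowledge base; a chunk of this kind is called a baseNP.
VP
A semantic chunk centred on a verb, together with the elements that modify, restrict, or coordinate with it.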
Source code in CoreNLP
    nonTerminalInfo.put("ROOT", new String[][]{{left, "IP"}});
    nonTerminalInfo.put("PAIR", new String[][]{{left, "IP"}});

    // Major syntactic categories
    nonTerminalInfo.put("ADJP", new String[][]{{left, "JJ", "ADJP"}}); // there is one ADJP unary rewrite to AD but otherwise all have JJ or ADJP
    nonTerminalInfo.put("ADVP", new String[][]{{left, "AD", "CS", "ADVP", "JJ"}}); // CS is a subordinating conjunctor, and there are a couple of ADVP->JJ unary rewrites
    nonTerminalInfo.put("CLP", new String[][]{{right, "M", "CLP"}});
    //nonTerminalInfo.put("CP", new String[][] {{left, "WHNP","IP","CP","VP"}}); // this is complicated; see bracketing guide p. 34. Actually, all WHNP are empty. IP/CP seems to be the best semantic head; syntax would dictate DEC/ADVP. Using IP/CP/VP/M is INCREDIBLY bad for Dep parser - lose 3% absolute.
    nonTerminalInfo.put("CP", new String[][]{{right, "DEC", "WHNP", "WHPP"}, rightExceptPunct}); // the (syntax-oriented) right-first head rule
    // nonTerminalInfo.put("CP", new String[][]{{right, "DEC", "ADVP", "CP", "IP", "VP", "M"}}); // the (syntax-oriented) right-first head rule
    nonTerminalInfo.put("DNP", new String[][]{{right, "DEG", "DEC"}, rightExceptPunct}); // according to tgrep2, first preparation, all DNPs have a DEG daughter
    nonTerminalInfo.put("DP", new String[][]{{left, "DT", "DP"}}); // there's one instance of DP adjunction
    nonTerminalInfo.put("DVP", new String[][]{{right, "DEV", "DEC"}}); // DVP always has DEV under it
    nonTerminalInfo.put("FRAG", new String[][]{{right, "VV", "NN"}, rightExceptPunct}); // FRAG seems only to be used for bits at the beginnings of articles: "Xinwenshe<DATE>" and "(wan)"
    nonTerminalInfo.put("INTJ", new String[][]{{right, "INTJ", "IJ", "SP"}});
    nonTerminalInfo.put("IP", new String[][]{{left, "VP", "IP"}, rightExceptPunct}); // CDM July 2010 following email from Pi-Chuan changed preference to VP over IP: IP can be -SBJ, -OBJ, or -ADV, and shouldn't be head
    nonTerminalInfo.put("LCP", new String[][]{{right, "LC", "LCP"}}); // there's a bit of LCP adjunction
    nonTerminalInfo.put("LST", new String[][]{{right, "CD", "PU"}}); // covers all examples
    nonTerminalInfo.put("NP", new String[][]{{right, "NN", "NR", "NT", "NP", "PN", "CP"}}); // Basic heads are NN/NR/NT/NP; PN is pronoun. Some NPs are nominalized relative clauses without overt nominal material; these are NP->CP unary rewrites. Finally, note that this doesn't give any special treatment of coordination.
    nonTerminalInfo.put("PP", new String[][]{{left, "P", "PP"}}); // in the manual there's an example of VV heading PP but I couldn't find such an example with tgrep2
    // cdm 2006: PRN changed to not choose punctuation. Helped parsing (if not significantly)
    // nonTerminalInfo.put("PRN", new String[][]{{left, "PU"}}); // presumably left/right doesn't matter
    nonTerminalInfo.put("PRN", new String[][]{{left, "NP", "VP", "IP", "QP", "PP", "ADJP", "CLP", "LCP"}, {rightdis, "NN", "NR", "NT", "FW"}});
    // cdm 2006: QP: add OD -- occurs some; occasionally NP, NT, M; parsing performance no-op
    nonTerminalInfo.put("QP", new String[][]{{right, "QP", "CLP", "CD", "OD", "NP", "NT", "M"}}); // there's some QP adjunction
    // add OD?
    nonTerminalInfo.put("UCP", new String[][]{{left, }}); // an alternative would be "PU","CC"
    nonTerminalInfo.put("VP", new String[][]{{left, "VP", "VCD", "VPT", "VV", "VCP", "VA", "VC", "VE", "IP", "VSB", "VCP", "VRD", "VNV"}, leftExceptPunct}); // note that ba and long bei introduce IP-OBJ small clauses; short bei introduces VP
    // add BA, LB, as needed

    // verb compounds
    nonTerminalInfo.put("VCD", new String[][]{{left, "VCD", "VV", "VA", "VC", "VE"}}); // could easily be right instead
    nonTerminalInfo.put("VCP", new String[][]{{left, "VCD", "VV", "VA", "VC", "VE"}}); // not much info from documentation
    nonTerminalInfo.put("VRD", new String[][]{{left, "VCD", "VRD", "VV", "VA", "VC", "VE"}}); // definitely left
    nonTerminalInfo.put("VSB", new String[][]{{right, "VCD", "VSB", "VV", "VA", "VC", "VE"}}); // definitely right, though some examples look questionably classified (na2lai2 zhi1fu4)
    nonTerminalInfo.put("VNV", new String[][]{{left, "VV", "VA", "VC", "VE"}}); // left/right doesn't matter
    nonTerminalInfo.put("VPT", new String[][]{{left, "VV", "VA", "VC", "VE"}}); // activity verb is to the left

    // some POS tags apparently sit where phrases are supposed to be
    nonTerminalInfo.put("CD", new String[][]{{right, "CD"}});
    nonTerminalInfo.put("NN", new String[][]{{right, "NN"}});
    nonTerminalInfo.put("NR", new String[][]{{right, "NR"}});

    // I'm adding these POS tags to do primitive morphology for character-level
    // parsing. It shouldn't affect anything else because heads of preterminals are not
    // generally queried - GMA
    nonTerminalInfo.put("VV", new String[][]{{left}});
    nonTerminalInfo.put("VA", new String[][]{{left}});
    nonTerminalInfo.put("VC", new String[][]{{left}});
    nonTerminalInfo.put("VE", new String[][]{{left}});

    // new for ctb6.
    nonTerminalInfo.put("FLR", new String[][]{rightExceptPunct});

    // new for CTB9
    nonTerminalInfo.put("DFL", new String[][]{rightExceptPunct});
    nonTerminalInfo.put("EMO", new String[][]{leftExceptPunct}); // left/right doesn't matter
    nonTerminalInfo.put("INC", new String[][]{leftExceptPunct});
    nonTerminalInfo.put("INTJ", new String[][]{leftExceptPunct});
    nonTerminalInfo.put("OTH", new String[][]{leftExceptPunct});
    nonTerminalInfo.put("SKIP", new String[][]{leftExceptPunct});
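Each entry above maps a parent category to an ordered list of head rules: a direction followed by child categories in priority order. The sketch below shows one plausible way such a table can be applied to pick the head child. It is a simplified illustration, not CoreNLP's implementation: the class `HeadRuleSketch`, the method `findHead`, and the direction strings ("left", "right", "leftexceptpunct", "rightexceptpunct") are assumptions made for this example, `rightdis` is left out, and where the sketch returns -1 a real head finder would fall back to a default child.

```java
import java.util.Arrays;
import java.util.List;

public class HeadRuleSketch {
    // Returns the index of the head child, or -1 if no rule fires.
    // rules[k][0] is a direction; rules[k][1..] are categories in priority order.
    static int findHead(String[][] rules, List<String> children) {
        int n = children.size();
        for (String[] rule : rules) {
            String dir = rule[0];
            if (dir.equals("left") || dir.equals("right")) {
                // Category priority comes first; direction breaks ties among
                // children that carry the same category.
                for (int c = 1; c < rule.length; c++) {
                    if (dir.equals("left")) {
                        for (int i = 0; i < n; i++) {
                            if (children.get(i).equals(rule[c])) return i;
                        }
                    } else {
                        for (int i = n - 1; i >= 0; i--) {
                            if (children.get(i).equals(rule[c])) return i;
                        }
                    }
                }
            } else if (dir.equals("leftexceptpunct")) {
                // Leftmost child that is not punctuation (PU).
                for (int i = 0; i < n; i++) {
                    if (!children.get(i).equals("PU")) return i;
                }
            } else if (dir.equals("rightexceptpunct")) {
                // Rightmost child that is not punctuation (PU).
                for (int i = n - 1; i >= 0; i--) {
                    if (!children.get(i).equals("PU")) return i;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[][] vpRules = {
            {"left", "VP", "VCD", "VPT", "VV", "VCP", "VA", "VC", "VE",
             "IP", "VSB", "VCP", "VRD", "VNV"},
            {"leftexceptpunct"}
        };
        String[][] npRules = {{"right", "NN", "NR", "NT", "NP", "PN", "CP"}};
        String[][] cpRules = {{"right", "DEC", "WHNP", "WHPP"}, {"rightexceptpunct"}};

        // VV is the only listed category present, so it is chosen.
        System.out.println(findHead(vpRules, Arrays.asList("ADVP", "VV", "NP")));
        // NN outranks NR, and the rightmost NN is chosen.
        System.out.println(findHead(npRules, Arrays.asList("NR", "NN", "NN")));
        // No DEC/WHNP/WHPP child, so the fallback skips the trailing PU.
        System.out.println(findHead(cpRules, Arrays.asList("IP", "PU")));
    }
}
```

The second rule in an entry (for example `rightExceptPunct` for CP) only matters when none of the listed categories is present among the children.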