Stanford CoreNLP: Penn Chinese Treebank phrase category tags (23 in total)

Source: Internet | Published by: Hujiang online courses (沪江网络课程) | Editor: 程序博客网 | Date: 2024/05/22 05:33

Phrase tags (17)

ADJP  Adjective phrase; a projection of JJ (adjective)
ADVP  Adverbial phrase headed by AD (adverb); also covers adverbials
CLP   Classifier phrase (measure-word phrase)
CP    Clause headed by C (complementizer): complement clauses and relative clauses
DNP   Phrase formed by "XP + DEG" (associative 的)
DP    Determiner phrase
DVP   Phrase formed by "XP + DEV" (adverbial 地)
FRAG  Fragment
IP    Inflectional phrase: a simple clause headed by I (INFL or another inflectional element)
LCP   Phrase formed by "XP + LC"; headed by a localizer (LC)
LST   List marker: a phrase that marks items in an explanatory list
NP    Noun phrase
PP    Prepositional phrase
PRN   Parenthetical
QP    Quantifier phrase: a numeral phrase built from numerals and measure words
UCP   Unidentical coordination phrase (coordination of unlike categories)
VP    Verb phrase
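
You may want to use the table in a program, for example to check which phrase labels a parser emitted. One option is to store the 17 tags in a Java enum. This is a minimal sketch. The enum name `CtbPhraseTag` and its `describe` method are hypothetical and are not part of CoreNLP. The descriptions are shortened from the table.

```java
// Hypothetical enum (not part of CoreNLP): the 17 Penn Chinese Treebank phrase tags
// listed above, each with a shortened English description.
enum CtbPhraseTag {
    ADJP("Adjective phrase"),
    ADVP("Adverbial phrase headed by AD"),
    CLP("Classifier phrase"),
    CP("Clause headed by C (complementizer)"),
    DNP("Phrase formed by XP + DEG"),
    DP("Determiner phrase"),
    DVP("Phrase formed by XP + DEV"),
    FRAG("Fragment"),
    IP("Simple clause headed by I (INFL)"),
    LCP("Phrase formed by XP + LC"),
    LST("List marker"),
    NP("Noun phrase"),
    PP("Prepositional phrase"),
    PRN("Parenthetical"),
    QP("Quantifier phrase"),
    UCP("Unidentical coordination phrase"),
    VP("Verb phrase");

    final String description;

    CtbPhraseTag(String description) {
        this.description = description;
    }

    // Returns the description for an exact phrase label such as "NP",
    // or null if the label is not one of the 17 phrase tags (e.g. the POS tag "NN").
    static String describe(String label) {
        for (CtbPhraseTag tag : values()) {
            if (tag.name().equals(label)) {
                return tag.description;
            }
        }
        return null;
    }
}
```

The lookup compares labels exactly. A label that carries a function tag, such as "NP-SBJ", has to be reduced to "NP" first.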

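Verb compounds (6 tags)

VCD  Coordinated verb compound: (VCD (VV 投资) (VV 办厂)), "invest" + "run a factory"
VCP  VV + VC: a verb followed by the copula 是 ("to be")
VNV  A-not-A and A-one-A forms: (VNV (VV 能) (AD 不) (VV 能)), "can or cannot"
VPT  Potential form V得R or V不R: (VPT (VV 得) (AD 不) (VV 到)), "cannot obtain"
VRD  Verb-resultative compound; the second element expresses the result of the first: (VRD (VV 呈现) (VV 出)); (VP (VRD (VV 联合) (VV 起来)))
VSB  Modifier + head compound; the first element is an intransitive verb, and no adjunct or aspect marker separates the two elements: (VSB (VV 加速) (VV 建设)); (VP (VSB (VV 仰头) (VV 望去)))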
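
The bracketed examples above are Penn-style trees. The sketch below parses one such bracketing by recursive descent. You can then inspect a compound's label, its children, and the words it covers. It is a hypothetical `Bracketed` class, not CoreNLP's own `Tree` API. It assumes well-formed input with one tree per string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal tree (not CoreNLP's Tree class) parsed from a Penn-style bracketing.
final class Bracketed {
    final String label;
    final String word;  // non-null only for preterminals such as (VV 能)
    final List<Bracketed> children = new ArrayList<>();

    private Bracketed(String label, String word) {
        this.label = label;
        this.word = word;
    }

    static Bracketed parse(String s) {
        // Pad brackets with spaces so that a whitespace split yields clean tokens.
        String[] tokens = s.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        int[] pos = {0};
        return parseNode(tokens, pos);
    }

    private static Bracketed parseNode(String[] t, int[] pos) {
        expect(t, pos, "(");
        String label = peek(t, pos);
        pos[0]++;
        Bracketed node;
        if (!peek(t, pos).equals("(")) {
            // Preterminal: (TAG word)
            node = new Bracketed(label, peek(t, pos));
            pos[0]++;
        } else {
            // Phrase or compound: one or more bracketed children.
            node = new Bracketed(label, null);
            while (peek(t, pos).equals("(")) {
                node.children.add(parseNode(t, pos));
            }
        }
        expect(t, pos, ")");
        return node;
    }

    private static String peek(String[] t, int[] pos) {
        if (pos[0] >= t.length) {
            throw new IllegalArgumentException("unexpected end of input");
        }
        return t[pos[0]];
    }

    private static void expect(String[] t, int[] pos, String tok) {
        if (!peek(t, pos).equals(tok)) {
            throw new IllegalArgumentException("expected " + tok + " at token " + pos[0]);
        }
        pos[0]++;
    }

    // Concatenates the leaf words, e.g. 能不能 for the VNV example.
    String surface() {
        if (word != null) {
            return word;
        }
        StringBuilder sb = new StringBuilder();
        for (Bracketed c : children) {
            sb.append(c.surface());
        }
        return sb.toString();
    }
}
```

For example, `Bracketed.parse("(VNV (VV 能) (AD 不) (VV 能))")` produces a node labeled VNV with three children (VV, AD, VV), and its `surface()` is "能不能".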

NP

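A phrase whose head is a noun. Grammatically the term has two senses:

(1) A phrase defined by its syntactic function, i.e. a chunk that serves as the subject, object, etc. of the sentence. Such chunks can carry an extra function tag, as in NP-SBJ or NP-OBJ.

(2) An entity or attribute, as in a knowledge base. Chunks of this kind are called baseNPs (non-recursive NPs).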
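
A function tag such as -SBJ or -OBJ is attached to the category with a hyphen, so tools usually strip it before comparing labels. CoreNLP does this through `TreebankLanguagePack.basicCategory`, whose exact rules may differ. The sketch below uses a hypothetical `FunctionTags` helper and assumes that only the hyphen separates annotations. It keeps labels that begin with a hyphen, such as the empty-category tag -NONE-, unchanged.

```java
// Hypothetical helper (not CoreNLP's API): split a label like "NP-SBJ" into
// its basic category ("NP") and its function tag ("SBJ").
final class FunctionTags {
    // Part before the first '-'; labels starting with '-' (e.g. "-NONE-") are returned unchanged.
    static String basicCategory(String label) {
        if (label.startsWith("-")) {
            return label;
        }
        int dash = label.indexOf('-');
        return dash < 0 ? label : label.substring(0, dash);
    }

    // Part between the first and second '-', or "" if there is none.
    // A trailing co-index is ignored: "NP-SBJ-1" yields "SBJ".
    static String functionTag(String label) {
        if (label.startsWith("-")) {
            return "";
        }
        String[] parts = label.split("-");
        return parts.length > 1 ? parts[1] : "";
    }
}
```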

VP

A semantic chunk built around a verb, together with the constituents that modify it, restrict it, or are coordinated with it.

The corresponding head rules in the CoreNLP source (ChineseHeadFinder):

// nonTerminalInfo maps a label to an ordered list of head rules of the form {direction, category, ...};
// left, right, rightdis, leftExceptPunct and rightExceptPunct are defined elsewhere in the class.
nonTerminalInfo.put("ROOT", new String[][]{{left, "IP"}});
nonTerminalInfo.put("PAIR", new String[][]{{left, "IP"}});

// Major syntactic categories
nonTerminalInfo.put("ADJP", new String[][]{{left, "JJ", "ADJP"}}); // there is one ADJP unary rewrite to AD but otherwise all have JJ or ADJP
nonTerminalInfo.put("ADVP", new String[][]{{left, "AD", "CS", "ADVP", "JJ"}}); // CS is a subordinating conjunctor, and there are a couple of ADVP->JJ unary rewrites
nonTerminalInfo.put("CLP", new String[][]{{right, "M", "CLP"}});
//nonTerminalInfo.put("CP", new String[][] {{left, "WHNP", "IP", "CP", "VP"}}); // this is complicated; see bracketing guide p. 34. Actually, all WHNP are empty. IP/CP seems to be the best semantic head; syntax would dictate DEC/ADVP. Using IP/CP/VP/M is INCREDIBLY bad for Dep parser - lose 3% absolute.
nonTerminalInfo.put("CP", new String[][]{{right, "DEC", "WHNP", "WHPP"}, rightExceptPunct}); // the (syntax-oriented) right-first head rule
// nonTerminalInfo.put("CP", new String[][]{{right, "DEC", "ADVP", "CP", "IP", "VP", "M"}}); // the (syntax-oriented) right-first head rule
nonTerminalInfo.put("DNP", new String[][]{{right, "DEG", "DEC"}, rightExceptPunct}); // according to tgrep2, first preparation, all DNPs have a DEG daughter
nonTerminalInfo.put("DP", new String[][]{{left, "DT", "DP"}}); // there's one instance of DP adjunction
nonTerminalInfo.put("DVP", new String[][]{{right, "DEV", "DEC"}}); // DVP always has DEV under it
nonTerminalInfo.put("FRAG", new String[][]{{right, "VV", "NN"}, rightExceptPunct}); // FRAG seems only to be used for bits at the beginnings of articles: "Xinwenshe<DATE>" and "(wan)"
nonTerminalInfo.put("INTJ", new String[][]{{right, "INTJ", "IJ", "SP"}});
nonTerminalInfo.put("IP", new String[][]{{left, "VP", "IP"}, rightExceptPunct}); // CDM July 2010 following email from Pi-Chuan changed preference to VP over IP: IP can be -SBJ, -OBJ, or -ADV, and shouldn't be head
nonTerminalInfo.put("LCP", new String[][]{{right, "LC", "LCP"}}); // there's a bit of LCP adjunction
nonTerminalInfo.put("LST", new String[][]{{right, "CD", "PU"}}); // covers all examples
nonTerminalInfo.put("NP", new String[][]{{right, "NN", "NR", "NT", "NP", "PN", "CP"}}); // Basic heads are NN/NR/NT/NP; PN is pronoun. Some NPs are nominalized relative clauses without overt nominal material; these are NP->CP unary rewrites. Finally, note that this doesn't give any special treatment of coordination.
nonTerminalInfo.put("PP", new String[][]{{left, "P", "PP"}}); // in the manual there's an example of VV heading PP but I couldn't find such an example with tgrep2
// cdm 2006: PRN changed to not choose punctuation. Helped parsing (if not significantly)
// nonTerminalInfo.put("PRN", new String[][]{{left, "PU"}}); // presumably left/right doesn't matter
nonTerminalInfo.put("PRN", new String[][]{{left, "NP", "VP", "IP", "QP", "PP", "ADJP", "CLP", "LCP"}, {rightdis, "NN", "NR", "NT", "FW"}});
// cdm 2006: QP: add OD -- occurs some; occasionally NP, NT, M; parsing performance no-op
nonTerminalInfo.put("QP", new String[][]{{right, "QP", "CLP", "CD", "OD", "NP", "NT", "M"}}); // there's some QP adjunction
// add OD?
nonTerminalInfo.put("UCP", new String[][]{{left, }}); // an alternative would be "PU", "CC"
nonTerminalInfo.put("VP", new String[][]{{left, "VP", "VCD", "VPT", "VV", "VCP", "VA", "VC", "VE", "IP", "VSB", "VCP", "VRD", "VNV"}, leftExceptPunct}); // note that ba and long bei introduce IP-OBJ small clauses; short bei introduces VP
// add BA, LB, as needed

// verb compounds
nonTerminalInfo.put("VCD", new String[][]{{left, "VCD", "VV", "VA", "VC", "VE"}}); // could easily be right instead
nonTerminalInfo.put("VCP", new String[][]{{left, "VCD", "VV", "VA", "VC", "VE"}}); // not much info from documentation
nonTerminalInfo.put("VRD", new String[][]{{left, "VCD", "VRD", "VV", "VA", "VC", "VE"}}); // definitely left
nonTerminalInfo.put("VSB", new String[][]{{right, "VCD", "VSB", "VV", "VA", "VC", "VE"}}); // definitely right, though some examples look questionably classified (na2lai2 zhi1fu4)
nonTerminalInfo.put("VNV", new String[][]{{left, "VV", "VA", "VC", "VE"}}); // left/right doesn't matter
nonTerminalInfo.put("VPT", new String[][]{{left, "VV", "VA", "VC", "VE"}}); // activity verb is to the left

// some POS tags apparently sit where phrases are supposed to be
nonTerminalInfo.put("CD", new String[][]{{right, "CD"}});
nonTerminalInfo.put("NN", new String[][]{{right, "NN"}});
nonTerminalInfo.put("NR", new String[][]{{right, "NR"}});

// I'm adding these POS tags to do primitive morphology for character-level
// parsing.  It shouldn't affect anything else because heads of preterminals are not
// generally queried - GMA
nonTerminalInfo.put("VV", new String[][]{{left}});
nonTerminalInfo.put("VA", new String[][]{{left}});
nonTerminalInfo.put("VC", new String[][]{{left}});
nonTerminalInfo.put("VE", new String[][]{{left}});

// new for ctb6.
nonTerminalInfo.put("FLR", new String[][]{rightExceptPunct});

// new for CTB9
nonTerminalInfo.put("DFL", new String[][]{rightExceptPunct});
nonTerminalInfo.put("EMO", new String[][]{leftExceptPunct}); // left/right doesn't matter
nonTerminalInfo.put("INC", new String[][]{leftExceptPunct});
nonTerminalInfo.put("INTJ", new String[][]{leftExceptPunct}); // this later put replaces the earlier INTJ rule
nonTerminalInfo.put("OTH", new String[][]{leftExceptPunct});
nonTerminalInfo.put("SKIP", new String[][]{leftExceptPunct});
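
Each entry above maps a phrase label to an ordered list of rules of the form {direction, category, ...}. The head child is found by trying the rules in order. The sketch below is a simplified re-implementation of that lookup for a handful of labels. It is not CoreNLP's code, and it rests on several assumptions:

- The class name `MiniHeadFinder` is hypothetical.
- `rightExceptPunct` is assumed to be {"rightexcept", "PU"}.
- Child labels are compared verbatim, with no function-tag stripping.
- When no rule matches, the sketch falls back to the first or last child according to the final rule's direction.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical head finder illustrating Collins-style head rules.
// Not CoreNLP's implementation: labels are compared verbatim and the fallback is simplified.
final class MiniHeadFinder {
    private final Map<String, String[][]> rules = new HashMap<>();

    MiniHeadFinder() {
        String[] rightExceptPunct = {"rightexcept", "PU"};  // assumed contents
        rules.put("IP", new String[][]{{"left", "VP", "IP"}, rightExceptPunct});
        rules.put("DNP", new String[][]{{"right", "DEG", "DEC"}, rightExceptPunct});
        rules.put("LCP", new String[][]{{"right", "LC", "LCP"}});
        rules.put("UCP", new String[][]{{"left"}});
        rules.put("PRN", new String[][]{{"left", "NP", "VP", "IP", "QP", "PP", "ADJP", "CLP", "LCP"},
                                        {"rightdis", "NN", "NR", "NT", "FW"}});
    }

    // Returns the index of the head child among `children` for a node labeled `parent`.
    int headIndex(String parent, String[] children) {
        String[][] how = rules.get(parent);
        if (how == null) {
            throw new IllegalArgumentException("no rule for " + parent);
        }
        for (String[] rule : how) {
            int idx = locate(rule, children);
            if (idx >= 0) {
                return idx;
            }
        }
        // Fallback: leftmost or rightmost child, following the final rule's direction.
        String lastDir = how[how.length - 1][0];
        return lastDir.startsWith("left") ? 0 : children.length - 1;
    }

    private static int locate(String[] rule, String[] kids) {
        switch (rule[0]) {
            case "left":        // category priority first, then scan left to right
                for (int i = 1; i < rule.length; i++)
                    for (int j = 0; j < kids.length; j++)
                        if (kids[j].equals(rule[i])) return j;
                return -1;
            case "right":       // category priority first, then scan right to left
                for (int i = 1; i < rule.length; i++)
                    for (int j = kids.length - 1; j >= 0; j--)
                        if (kids[j].equals(rule[i])) return j;
                return -1;
            case "rightdis":    // position first: rightmost child matching any listed category
                for (int j = kids.length - 1; j >= 0; j--)
                    if (contains(rule, kids[j])) return j;
                return -1;
            case "leftexcept":  // leftmost child NOT in the list
                for (int j = 0; j < kids.length; j++)
                    if (!contains(rule, kids[j])) return j;
                return -1;
            case "rightexcept": // rightmost child NOT in the list
                for (int j = kids.length - 1; j >= 0; j--)
                    if (!contains(rule, kids[j])) return j;
                return -1;
            default:
                throw new IllegalArgumentException("unknown direction " + rule[0]);
        }
    }

    private static boolean contains(String[] rule, String label) {
        for (int i = 1; i < rule.length; i++)
            if (rule[i].equals(label)) return true;
        return false;
    }
}
```

For example, `new MiniHeadFinder().headIndex("IP", new String[]{"NP", "VP", "PU"})` returns 1, the VP.

Under "left" the category order takes precedence, so for IP a VP anywhere beats an earlier IP. Under "rightdis" the position takes precedence. UCP's rule {left} lists no categories, so nothing matches and the fallback picks the leftmost conjunct.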
