Detecting Parts of Speech (POS)

Source: Internet  Published by: 卓有成效的管理者知乎  Editor: 程序博客网  Time: 2024/05/29 18:27

  • POS
  • The tagging process
  • Importance Of POS

POS

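A part of speech (POS) is the grammatical category of a word, such as noun, verb, or adjective. The context in which a word appears is an important factor in determining its part of speech: "book" is a noun in "read the book" but a verb in "book a flight".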

The tagging process

Tagging is the process of assigning a description, called a tag, to a token or a portion of text. POS tagging is the process of assigning a POS tag to each token. Typical tags are noun, verb, and adjective.

Process

  • Tokenizing the text
  • Determining/Identifying possible tags
  • Resolving ambiguous tags
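
The three steps above can be sketched with a toy tagger. This is only an illustration under stated assumptions: the class name ToyRuleTagger, the tiny lexicon, and the two disambiguation rules are all made up for this example and do not come from any library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy illustration only: the lexicon and rules are hypothetical.
class ToyRuleTagger {
    // Dictionary of words and their possible tags (made-up mini lexicon).
    static final Map<String, List<String>> LEXICON = new HashMap<>();
    static {
        LEXICON.put("i", Arrays.asList("PRP"));
        LEXICON.put("the", Arrays.asList("DT"));
        LEXICON.put("a", Arrays.asList("DT"));
        LEXICON.put("read", Arrays.asList("VB"));
        LEXICON.put("flight", Arrays.asList("NN"));
        LEXICON.put("book", Arrays.asList("NN", "VB")); // ambiguous word
    }

    // Step 1: tokenize the text (naive whitespace split).
    static String[] tokenize(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Step 2: look up the possible tags; unknown words default to noun.
    static List<String> possibleTags(String token) {
        return LEXICON.getOrDefault(token.toLowerCase(Locale.ROOT), Arrays.asList("NN"));
    }

    // Step 3: resolve ambiguous tags using the previous tag or the next word.
    static String tag(String text) {
        String[] tokens = tokenize(text);
        String[] tags = new String[tokens.length];
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            List<String> candidates = possibleTags(tokens[i]);
            if (candidates.size() == 1) {
                tags[i] = candidates.get(0);              // unambiguous
            } else if (i > 0 && "DT".equals(tags[i - 1]) && candidates.contains("NN")) {
                tags[i] = "NN";                           // rule: determiner + noun
            } else if (i + 1 < tokens.length
                    && possibleTags(tokens[i + 1]).contains("DT")
                    && candidates.contains("VB")) {
                tags[i] = "VB";                           // rule: verb + determiner
            } else {
                tags[i] = candidates.get(0);              // fall back to first candidate
            }
            out.add(tokens[i] + "/" + tags[i]);
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        System.out.println(tag("I read the book"));
        System.out.println(tag("Book a flight"));
    }
}
```

The ambiguous word "book" receives a different tag in each sentence because the rules look at its neighbors, which is the role context plays in real taggers as well.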

Methods


  • Rule-based: Rule-based taggers use a set of rules and a dictionary of words and their possible tags. The rules are applied when a word has multiple possible tags, and they often use the previous and/or following words to select a tag.
  • Stochastic: Stochastic taggers are either based on the Markov model or are cue-based, using either decision trees or maximum entropy. Markov models are finite state machines where each state has two probability distributions. The objective is to find the optimal sequence of tags for a sentence. Hidden Markov Models (HMM) are also used. In these models, the states (the tags) are not directly visible; only the words they produce are observed.
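
To make the stochastic approach more concrete, here is a minimal sketch of Viterbi decoding, a common way to find the most probable tag sequence under an HMM. It is an assumption-laden toy: the class name ToyHmmTagger, the three-tag set, and every probability below are invented for illustration and are not taken from a trained model or from any library.

```java
import java.util.HashMap;
import java.util.Map;

// Toy HMM: all probabilities are made up for illustration.
class ToyHmmTagger {
    static final String[] TAGS = {"DT", "NN", "VB"};
    // P(first tag)
    static final double[] START = {0.4, 0.2, 0.4};
    // TRANS[prev][cur] = P(cur tag | previous tag)
    static final double[][] TRANS = {
        {0.05, 0.90, 0.05},  // after DT
        {0.20, 0.30, 0.50},  // after NN
        {0.60, 0.30, 0.10},  // after VB
    };
    // word -> {P(word | DT), P(word | NN), P(word | VB)}
    static final Map<String, double[]> EMIT = new HashMap<>();
    static {
        EMIT.put("the", new double[] {0.6, 0.0, 0.0});
        EMIT.put("a", new double[] {0.4, 0.0, 0.0});
        EMIT.put("book", new double[] {0.0, 0.4, 0.7});
        EMIT.put("flight", new double[] {0.0, 0.6, 0.0});
        EMIT.put("read", new double[] {0.0, 0.0, 0.3});
    }

    // Returns the most probable tag sequence for the given lowercase tokens.
    // Words missing from EMIT get zero probability for every tag, so the
    // result is not meaningful for them; a real tagger would smooth instead.
    static String[] viterbi(String[] words) {
        int n = words.length;
        int k = TAGS.length;
        if (n == 0) {
            return new String[0];
        }
        double[][] best = new double[n][k]; // best[t][s]: best path probability ending in tag s
        int[][] back = new int[n][k];       // back[t][s]: previous tag on that best path
        double[] unknown = new double[k];   // all zeros
        for (int t = 0; t < n; t++) {
            double[] emit = EMIT.getOrDefault(words[t], unknown);
            for (int cur = 0; cur < k; cur++) {
                if (t == 0) {
                    best[0][cur] = START[cur] * emit[cur];
                    continue;
                }
                int arg = 0;
                double max = -1.0;
                for (int prev = 0; prev < k; prev++) {
                    double p = best[t - 1][prev] * TRANS[prev][cur];
                    if (p > max) {
                        max = p;
                        arg = prev;
                    }
                }
                best[t][cur] = max * emit[cur];
                back[t][cur] = arg;
            }
        }
        // Pick the best final tag, then follow the back-pointers.
        int last = 0;
        for (int cur = 1; cur < k; cur++) {
            if (best[n - 1][cur] > best[n - 1][last]) {
                last = cur;
            }
        }
        String[] result = new String[n];
        for (int t = n - 1; t >= 0; t--) {
            result[t] = TAGS[last];
            last = back[t][last];
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", viterbi(new String[] {"book", "the", "flight"})));
        System.out.println(String.join(" ", viterbi(new String[] {"the", "book"})));
    }
}
```

Plain probabilities are multiplied here to keep the sketch readable. For long sentences, an implementation would normally sum log probabilities to avoid numeric underflow.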

Importance Of POS

Proper tagging of a sentence can enhance the quality of downstream processing tasks.

Determining the POS, phrases, clauses, and any relationships between them is called parsing.

POS tagging is used for many downstream processes such as question analysis and analyzing the sentiment of text.

Text indexing will frequently use POS data.

Speech processing can use tags to help decide how to pronounce words.

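// OpenNLP
// Assumes `sentence` is a String[] of tokens and getModelDir() is a helper defined elsewhere.
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.util.Sequence;

try (InputStream modelIn = new FileInputStream(
        new File(getModelDir(), "en-pos-maxent.bin"))) {
    POSModel model = new POSModel(modelIn);
    POSTaggerME tagger = new POSTaggerME(model);

    // Tag each token and print the result as word/tag pairs
    String[] tags = tagger.tag(sentence);
    for (int i = 0; i < sentence.length; i++) {
        System.out.print(sentence[i] + "/" + tags[i] + " ");
    }
    System.out.println();

    // Alternative tag sequences for the same sentence
    Sequence[] topSequences = tagger.topKSequences(sentence);
    for (int i = 0; i < topSequences.length; i++) {
        System.out.println(topSequences[i]);
        // Per-token probabilities for this sequence
        double[] probabilities = topSequences[i].getProbs();
    }
} catch (IOException e) {
    e.printStackTrace();
}

// Stanford NLP
// Assumes sentences.txt exists and getModelDir() is the same helper as above.
// FileReader can throw FileNotFoundException, which the enclosing method must handle.
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.List;
import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.ling.Sentence; // named SentenceUtils in newer CoreNLP releases
import edu.stanford.nlp.ling.TaggedWord;
import edu.stanford.nlp.tagger.maxent.MaxentTagger;

MaxentTagger tagger = new MaxentTagger(
        getModelDir() + "/wsj-0-18-bidirectional-distsim.tagger");
List<List<HasWord>> sentences = MaxentTagger.tokenizeText(
        new BufferedReader(new FileReader("sentences.txt")));

// Print each tagged sentence as a list of word/tag pairs
for (List<HasWord> sentence : sentences) {
    List<TaggedWord> taggedSentence = tagger.tagSentence(sentence);
    System.out.println(taggedSentence);
}

// Alternatively, print each tagged sentence as a single string
for (List<HasWord> sentence : sentences) {
    List<TaggedWord> taggedSentence = tagger.tagSentence(sentence);
    System.out.println(Sentence.listToString(taggedSentence, false));
}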