Week 2-4 Morphological similarity: stemming

Source: Internet. Editor: 程序博客网. Time: 2024/05/30 23:20

Morphological similarity asks whether two words are morphologically related.


Stemming

Stemming reduces a word to its basic form, called the stem, by removing various suffixes and endings and sometimes performing additional transformations.

Remark: In practice, prefixes are sometimes preserved (e.g. rescan).

Porter’s methods for stemming

  • a rule-based method
  • described in Porter's paper "An algorithm for suffix stripping"
  • the method is not always accurate

Measure

The measure of a word is an indication of the number of syllables in it.

  • Each sequence of consonants is denoted by C
  • Each sequence of vowels is denoted by V
  • The initial C and the final V are optional

Every word can therefore be written in the form [C](VC)^m[V], and the measure m is the number of times the VC pair is repeated.
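The measure can be computed by collapsing a word into its C/V pattern and counting the VC pairs. The following is a minimal sketch, not a reference implementation; the function names `is_consonant` and `measure` are chosen here for illustration. It assumes the usual convention from Porter's paper that a, e, i, o, u are vowels and that y counts as a vowel only when it follows a consonant.

```python
def is_consonant(word, i):
    """Return True if word[i] is a consonant under Porter's convention."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        # Assumption: y is a consonant at the start of a word or after a
        # vowel, and a vowel when it follows a consonant.
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(word):
    """Count the VC pairs in the collapsed C/V pattern of the word."""
    word = word.lower()
    pattern = []
    for i in range(len(word)):
        kind = "C" if is_consonant(word, i) else "V"
        # Collapse runs of the same kind into a single symbol.
        if not pattern or pattern[-1] != kind:
            pattern.append(kind)
    # After collapsing, C and V alternate, so VC pairs cannot overlap.
    return "".join(pattern).count("VC")


for w in ["tree", "trouble", "troubles"]:
    print(w, measure(w))
```

For example, "tree" collapses to CV (measure 0), "trouble" to CVCV (measure 1), and "troubles" to CVCVC (measure 2).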

Porter’s algorithm

The initial word is checked against a sequence of transformation patterns, in order.

One example pattern: if the word ends with "ation", that suffix is removed and the remaining part is kept (meditation -> medit).

  • whenever the pattern matches, the word is transformed and the algorithm restarts from the beginning of the list of patterns with the transformed word
  • if no pattern matches, the algorithm stops and outputs the most recently transformed version of the word
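The control flow described above (match a pattern, transform the word, restart from the first pattern, stop when nothing matches) can be sketched as follows. This is a toy illustration only: the `RULES` list and the function name `toy_stem` are hypothetical, the list contains just three suffix rules, and it ignores the measure conditions that the real rules depend on, so it is not Porter's actual rule set.

```python
# Illustrative (suffix, replacement) pairs, checked in order.
# Assumption: these three rules stand in for the full pattern list.
RULES = [
    ("ation", ""),
    ("sses", "ss"),
    ("ies", "i"),
]


def toy_stem(word, rules=RULES):
    """Apply the first matching rule, then restart from the top of the list."""
    changed = True
    while changed:
        changed = False
        for suffix, replacement in rules:
            # Require a non-empty stem to remain after removing the suffix.
            if word.endswith(suffix) and len(word) > len(suffix):
                word = word[: len(word) - len(suffix)] + replacement
                changed = True
                break  # restart from the beginning of the pattern list
    # No pattern matched: return the most recently transformed word.
    return word


for w in ["meditation", "caresses", "ponies", "cat"]:
    print(w, "->", toy_stem(w))
```

Every rule in this sketch strictly shortens the word, which guarantees that the loop terminates.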
