Week 6-5 Language Modelling 2


Smoothing

  • If the vocabulary size is V=1M
    • Too many parameters to estimate, even for a unigram model
    • MLE assigns a probability of 0 to unseen events; the problem is even worse for bigrams and trigrams
  • Smoothing (regularization)
    • Reassign some probability mass to some unseen data

How to model novel words?
  • Distribute some of the probability mass to allow novel events

Add-one (Laplace) smoothing

  • Bigrams: $P(w_i \mid w_{i-1}) = \dfrac{c(w_{i-1}, w_i) + 1}{c(w_{i-1}) + V}$
  • Reassigns too much probability mass to unseen data
  • Possible to add $k$ instead of 1 (add-$k$ smoothing); a minimal sketch follows below
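As a rough illustration of the formula above, here is a minimal add-$k$ bigram estimator in Python. The function name `add_k_bigram_prob`, the toy corpus, and estimating $V$ from the corpus itself are all assumptions made for this sketch, not part of the original notes.

```python
from collections import Counter

def add_k_bigram_prob(corpus, k=1.0):
    """Return a function P(w_i | w_{i-1}) with add-k smoothing (k=1 is Laplace)."""
    unigram_counts = Counter(corpus)
    bigram_counts = Counter(zip(corpus, corpus[1:]))
    V = len(unigram_counts)  # assumption: vocabulary size estimated from this corpus

    def prob(prev, word):
        # (c(w_{i-1}, w_i) + k) / (c(w_{i-1}) + k*V)
        return (bigram_counts[(prev, word)] + k) / (unigram_counts[prev] + k * V)

    return prob

# Usage: an unseen bigram still gets a non-zero probability.
corpus = "the cat sat on the mat the cat ate".split()
P = add_k_bigram_prob(corpus, k=1.0)
print(P("the", "cat"))   # seen bigram
print(P("the", "dog"))   # unseen bigram, but probability > 0
```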

Advanced smoothing

  • Good-Turing
    • Try to predict the probabilities of unseen events based on the probabilities of seen events
  • Kneser-Ney
  • Class-based n-grams

Good-Turing


  • Actual count: $c$
  • $N_c$: total number of n-grams that occur exactly $c$ times in the corpus
  • $N$: total number of n-grams in the corpus
  • Revised count: $c^* = (c+1)\dfrac{N_{c+1}}{N_c}$ (see the sketch after this list)
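The revised-count formula can be computed directly from the frequency-of-frequencies table. The following sketch assumes a plain dictionary of n-gram counts as input and simply falls back to the raw count when $N_{c+1}=0$; real implementations smooth the $N_c$ curve instead.

```python
from collections import Counter

def good_turing_counts(ngram_counts):
    """Compute revised Good-Turing counts c* = (c+1) * N_{c+1} / N_c."""
    # N_c: how many distinct n-grams occur exactly c times
    freq_of_freqs = Counter(ngram_counts.values())

    revised = {}
    for ngram, c in ngram_counts.items():
        if freq_of_freqs[c + 1] > 0:
            revised[ngram] = (c + 1) * freq_of_freqs[c + 1] / freq_of_freqs[c]
        else:
            revised[ngram] = c  # assumption: fall back to the raw count
    return revised

# Example with toy bigram counts
corpus = "the cat sat on the mat the cat ate the fish".split()
bigram_counts = Counter(zip(corpus, corpus[1:]))
print(good_turing_counts(bigram_counts))
```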


How do we deal with sparse data?

Backoff

  • Back off to a lower-order n-gram model when the higher-order model is sparse
  • Learn the backoff parameters (weights); a simplified sketch follows below
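The sketch below illustrates the backoff idea only: it uses a single fixed scaling factor `alpha` (an assumption for illustration, in the spirit of "stupid backoff") rather than the properly discounted and normalized weights that Katz backoff learns, so the result is a score rather than a true probability distribution.

```python
from collections import Counter

def backoff_prob(corpus, alpha=0.4):
    """Simplified backoff: trigram if seen, else scaled bigram, else scaled unigram."""
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))
    trigrams = Counter(zip(corpus, corpus[1:], corpus[2:]))
    N = len(corpus)

    def score(w2, w1, w):
        if trigrams[(w2, w1, w)] > 0:
            return trigrams[(w2, w1, w)] / bigrams[(w2, w1)]
        if bigrams[(w1, w)] > 0:
            return alpha * bigrams[(w1, w)] / unigrams[w1]
        return alpha * alpha * unigrams[w] / N  # may be 0 for unknown words

    return score
```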

Interpolation

  • If $P(w_i \mid w_{i-1}, w_{i-2})$ is sparse:
    • Use $\lambda_1 P(w_i \mid w_{i-1}, w_{i-2}) + \lambda_2 P(w_i \mid w_{i-1}) + \lambda_3 P(w_i)$, with $\lambda_1 + \lambda_2 + \lambda_3 = 1$ (see the sketch after this list)
  • See [Chen and Goodman, 1998]
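A minimal sketch of linear interpolation under the assumptions that the three maximum-likelihood estimates are computed from a single corpus and that the lambda values are fixed by hand; in practice they are tuned on held-out data (e.g. by EM), as discussed in Chen and Goodman (1998).

```python
from collections import Counter

def interpolated_prob(corpus, lambdas=(0.6, 0.3, 0.1)):
    """Linear interpolation of trigram, bigram and unigram MLEs (lambdas sum to 1)."""
    l1, l2, l3 = lambdas
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))
    trigrams = Counter(zip(corpus, corpus[1:], corpus[2:]))
    N = len(corpus)

    def prob(w2, w1, w):
        # Guard against unseen contexts so each MLE term is well defined.
        p3 = trigrams[(w2, w1, w)] / bigrams[(w2, w1)] if bigrams[(w2, w1)] else 0.0
        p2 = bigrams[(w1, w)] / unigrams[w1] if unigrams[w1] else 0.0
        p1 = unigrams[w] / N
        return l1 * p3 + l2 * p2 + l3 * p1

    return prob
```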