Week 6-3,4: Language Modelling


Probabilistic language model

  • Assign a probability to a sentence
    • P(S) = P(w_1, w_2, ..., w_n)
  • Different from deterministic methods using CFG
  • The probabilities of all possible sentences must sum to 1

Predicting the next word

P(w_n | w_1, w_2, ..., w_{n-1})

Uses of LM

  • Speech recognition
    • P(recognize speech) > P(wreck a nice beach)
  • Text generation
    • P(three houses) > P(three house)
  • Spelling correction
    • P(my cat eats fish) > P(my xat eats fish)
  • Machine translation
    • P(the blue house) > P(the house blue)
  • OCR

Probability of a sentence

P(S) = P(w_1, w_2, ..., w_n) = P(w_1) P(w_2 | w_1) ... P(w_n | w_1, w_2, ..., w_{n-1})
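
For a concrete case, take the three-word phrase "the blue house" from the machine translation example above; the chain rule expands its probability as

P(the blue house) = P(the) P(blue | the) P(house | the, blue)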

N-gram model

  • Markov assumption: only look at a limited history (see the bigram case spelled out after this list)
    • Unigram
    • Bigram
    • Trigram
  • It is possible to go to higher orders (4-grams, 5-grams, ...)
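
Under the Markov assumption the history is truncated; in the bigram case, for example, each word is conditioned only on the immediately preceding word:

P(w_n | w_1, w_2, ..., w_{n-1}) ≈ P(w_n | w_{n-1})

so the sentence probability becomes

P(S) ≈ P(w_1) P(w_2 | w_1) P(w_3 | w_2) ... P(w_n | w_{n-1})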

N-grams

  • Shakespeare unigrams
    • 29524 types, approx 900k tokens
  • Bigrams
    • 346097 types
  • Sparse data!!

Estimation

Because of data sparseness we cannot estimate the full conditional probabilities P(w_n | w_1, ..., w_{n-1}) directly, so we have to use the Markov assumption.

MLE

The probabilities are estimated from counts in the training data.
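
For a bigram model, for example, the maximum likelihood estimate is just the relative frequency observed in the training data:

P_ML(w_n | w_{n-1}) = count(w_{n-1} w_n) / count(w_{n-1})

The unigram and bigram examples below are instances of this formula.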

Unigram Example

  • The word pizza appears 700 times in a corpus of 1×10^7 words
    P_ML(pizza) = 700 / (1×10^7) = 7×10^-5

Bigram Example

  • The word with appears 1000 times in the corpus
  • The phrase with spinach appears 6 times
    P_ML(spinach | with) = count(with spinach) / count(with) = 6/1000 = 0.006
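
A minimal sketch of how such relative-frequency estimates can be computed from a tokenized corpus; the toy corpus and the bigram_mle helper are illustrative, not part of the lecture:

```python
from collections import Counter

def bigram_mle(tokens):
    """MLE bigram probabilities: P(w2 | w1) = count(w1 w2) / count(w1)."""
    unigram_counts = Counter(tokens)
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    return {
        (w1, w2): c / unigram_counts[w1]
        for (w1, w2), c in bigram_counts.items()
    }

# Toy corpus, purely for illustration
tokens = "my cat eats fish and my cat eats spinach".split()
probs = bigram_mle(tokens)
print(probs[("cat", "eats")])   # 1.0  -- "cat" is always followed by "eats"
print(probs[("eats", "fish")])  # 0.5  -- "eats" is followed by "fish" half the time
```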

The estimates are domain-dependent, and they may not be good for other genres.

N-grams and regular languages

  • N-grams are just one way to represent weighted regular languages

Generative models

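An n-gram language model is generative: text can be produced by repeatedly sampling the next word from P(w_n | w_{n-1}). A minimal sketch, assuming a hand-made table of bigram probabilities (the values are illustrative, not from the lecture):

```python
import random

# Toy bigram probabilities P(w2 | w1); values are purely illustrative
probs = {
    ("my", "cat"): 1.0,
    ("cat", "eats"): 1.0,
    ("eats", "fish"): 0.5,
    ("eats", "spinach"): 0.5,
    ("fish", "and"): 1.0,
    ("and", "my"): 1.0,
}

def generate(probs, start, max_len=10):
    """Sample a word sequence from a bigram model, one word at a time."""
    sentence = [start]
    for _ in range(max_len - 1):
        # Next-word candidates and their conditional probabilities given the current word
        candidates = [(w2, p) for (w1, w2), p in probs.items() if w1 == sentence[-1]]
        if not candidates:
            break  # no observed continuation for the current word
        words, weights = zip(*candidates)
        sentence.append(random.choices(words, weights=weights)[0])
    return " ".join(sentence)

print(generate(probs, "my"))  # e.g. "my cat eats fish and my cat eats spinach"
```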

Engineering trick

  • The MLE values are often on the order of 10^-6 or less
    • multiplying 20 such values gives a number on the order of 10^-120
    • this leads to underflow
  • Use (base 10) logarithms instead
    • 10^-6 becomes -6
    • Use sums instead of products
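
A minimal numeric illustration of the underflow problem and the log-space fix (the per-word probability is a placeholder value):

```python
import math

p = 1e-6             # a typical per-word MLE value (placeholder)

print(p ** 20)       # ~1e-120: vanishingly small already
print(p ** 60)       # 0.0: the product has underflowed and the value is lost

# Log-space fix: add base-10 log probabilities instead of multiplying probabilities
log_p = math.log10(p)            # -6.0
print(20 * log_p, 60 * log_p)    # -120.0 -360.0: no underflow, comparisons still work
```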