Week 6-3,4: Language Modelling


Probabilistic language model

  • Assign a probability to a sentence
    • P(S) = P(w_1, w_2, ..., w_n)
  • Different from deterministic methods using CFG
  • The probabilities of all possible sentences must sum to 1

Predicting the next word

P(w_n | w_1, w_2, ..., w_{n-1})

Uses of LM

  • Speech recognition
    • P(recognize speech) > P(wreck a nice beach)
  • Text generation
    • P(three houses) > P(three house)
  • Spelling correction
    • P(my cat eats fish) > P(my xat eats fish)
  • Machine translation
    • P(the blue house) > P(the house blue)
  • OCR

Probability of a sentence

P(S) = P(w_1, w_2, ..., w_n) = P(w_1) P(w_2 | w_1) ... P(w_n | w_1, w_2, ..., w_{n-1})
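
For a concrete case, take the three-word phrase "the blue house" from the machine translation example above; the chain rule expands its probability as

P(the blue house) = P(the) P(blue | the) P(house | the, blue)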

N-gram model

  • Markov assumption: only look at a limited history (see the bigram case spelled out after this list)
    • Unigram
    • Bigram
    • Trigram
  • It is possible to go to higher orders (4-grams, 5-grams, ...)
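
Under the Markov assumption the history is truncated; in the bigram case, for example, each word is conditioned only on the immediately preceding word:

P(w_n | w_1, w_2, ..., w_{n-1}) ≈ P(w_n | w_{n-1})

so the sentence probability becomes

P(S) ≈ P(w_1) P(w_2 | w_1) P(w_3 | w_2) ... P(w_n | w_{n-1})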

N-grams

  • Shakespeare unigrams
    • 29524 types, approx 900k tokens
  • Bigrams
    • 346097 types
  • Sparse data!!

Estimation

Because of data sparseness we cannot estimate the full conditional probabilities P(w_n | w_1, ..., w_{n-1}) directly, so we have to use the Markov assumption.

MLE

The probabilities are estimated from counts in the training data.
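
For a bigram model, for example, the maximum likelihood estimate is just the relative frequency observed in the training data:

P_ML(w_n | w_{n-1}) = count(w_{n-1} w_n) / count(w_{n-1})

The unigram and bigram examples below are instances of this formula.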

Unigram Example

  • The word pizza appears 700 times in a corpus of 1×10^7 words
    P_ML(pizza) = 700 / (1×10^7) = 7×10^-5

Bigram Example

  • The word with appears 1000 times in the corpus
  • The phrase with spinach appears 6 times
    P_ML(spinach | with) = count(with spinach) / count(with) = 6/1000 = 0.006
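
A minimal sketch of how such relative-frequency estimates can be computed from a tokenized corpus; the toy corpus and the bigram_mle helper are illustrative, not part of the lecture:

```python
from collections import Counter

def bigram_mle(tokens):
    """MLE bigram probabilities: P(w2 | w1) = count(w1 w2) / count(w1)."""
    unigram_counts = Counter(tokens)
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    return {
        (w1, w2): c / unigram_counts[w1]
        for (w1, w2), c in bigram_counts.items()
    }

# Toy corpus, purely for illustration
tokens = "my cat eats fish and my cat eats spinach".split()
probs = bigram_mle(tokens)
print(probs[("cat", "eats")])   # 1.0  -- "cat" is always followed by "eats"
print(probs[("eats", "fish")])  # 0.5  -- "eats" is followed by "fish" half the time
```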

The estimates are domain-dependent, and they may not be good for other genres.

N-grams and regular languages

  • N-grams are just one way to represent weighted regular languages

Generative models

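An n-gram language model is generative: text can be produced by repeatedly sampling the next word from P(w_n | w_{n-1}). A minimal sketch, assuming a hand-made table of bigram probabilities (the values are illustrative, not from the lecture):

```python
import random

# Toy bigram probabilities P(w2 | w1); values are purely illustrative
probs = {
    ("my", "cat"): 1.0,
    ("cat", "eats"): 1.0,
    ("eats", "fish"): 0.5,
    ("eats", "spinach"): 0.5,
    ("fish", "and"): 1.0,
    ("and", "my"): 1.0,
}

def generate(probs, start, max_len=10):
    """Sample a word sequence from a bigram model, one word at a time."""
    sentence = [start]
    for _ in range(max_len - 1):
        # Next-word candidates and their conditional probabilities given the current word
        candidates = [(w2, p) for (w1, w2), p in probs.items() if w1 == sentence[-1]]
        if not candidates:
            break  # no observed continuation for the current word
        words, weights = zip(*candidates)
        sentence.append(random.choices(words, weights=weights)[0])
    return " ".join(sentence)

print(generate(probs, "my"))  # e.g. "my cat eats fish and my cat eats spinach"
```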

Engineering trick

  • The MLE values are often on the order of 10^-6 or less
    • multiplying 20 such values gives a number on the order of 10^-120
    • this leads to underflow
  • Use (base 10) logarithms instead
    • 10^-6 becomes -6
    • Use sums instead of products
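
A minimal numeric illustration of the underflow problem and the log-space fix (the per-word probability is a placeholder value):

```python
import math

p = 1e-6             # a typical per-word MLE value (placeholder)

print(p ** 20)       # ~1e-120: vanishingly small already
print(p ** 60)       # 0.0: the product has underflowed and the value is lost

# Log-space fix: add base-10 log probabilities instead of multiplying probabilities
log_p = math.log10(p)            # -6.0
print(20 * log_p, 60 * log_p)    # -120.0 -360.0: no underflow, comparisons still work
```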