Notes on Learning Naive Bayes
Source: Internet; editor: 程序博客网; time: 2024/05/01 01:06
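Naive Bayes
- Assumption: the attributes are mutually independent (more precisely, conditionally independent given the class).
- Principle: based on Bayes' theorem, p(c|x) = p(x|c) * p(c) / p(x). In words: posterior probability P(Y=c | X=x) = conditional probability P(X=x | Y=c) * prior probability P(Y=c) / P(X=x). The class with the largest P(X=x | Y=c) * P(Y=c) is taken as the output.
- Notation: c denotes a class (c1, c2) and x denotes the attributes (x1, x2, x3, ...).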
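- P(A|B) is the probability that event A occurs given that B has occurred. In practice we often care about P(B|A) but can only obtain P(A|B) directly, so we need a tool that converts between the two. Bayes' theorem is exactly such a formula:
P(B|A) = P(A|B) P(B) / P(A)
Consider an apple classification problem with three features F = {f1, f2, f3} and two classes C = {c1, c2}. By Bayes' theorem, the probability that the class is ci given the features is
P(ci | f1, f2, f3) = P(f1, f2, f3 | ci) P(ci) / P(f1, f2, f3)
The ci that maximizes this expression is the classification result. For a given sample, P(f1, f2, f3) does not depend on ci, so it is the same constant for every class, and the problem reduces to maximizing
P(f1, f2, f3 | ci) P(ci)
This is where the naive Bayes assumption comes in. Because the features are assumed to be independent of each other given the class, the expression above can be rewritten as
P(f1 | ci) P(f2 | ci) P(f3 | ci) P(ci)
The whole problem therefore becomes finding the ci that maximizes this product, and every factor in it can be estimated from the training set.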
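As a rough illustration of how each factor of that product can be read off a training set, here is a minimal R sketch. The `apples` data frame, its column names and its values are made up for this example and are not from the original post; the estimates are plain relative frequencies.

```r
# Hypothetical toy training set: three categorical features and a class label.
apples <- data.frame(
  color = c("red", "red", "green", "green", "red", "green"),
  size  = c("big", "small", "big", "small", "big", "big"),
  shape = c("round", "round", "round", "oval", "oval", "round"),
  class = c("good", "good", "bad", "bad", "good", "bad"),
  stringsAsFactors = TRUE
)

# P(ci): relative frequency of each class, as a plain named vector.
class.counts <- table(apples$class)
prior <- setNames(as.vector(class.counts) / sum(class.counts), names(class.counts))

# P(fj | ci): one table per feature, each row (one class) normalized to sum to 1.
cond <- lapply(apples[c("color", "size", "shape")], function(feature) {
  prop.table(table(apples$class, feature), margin = 1)
})

# P(f1 | ci) P(f2 | ci) P(f3 | ci) P(ci) for one sample x, for every class at once.
score <- function(x) {
  s <- prior
  for (name in names(x)) {
    s <- s * cond[[name]][names(prior), x[[name]]]
  }
  s
}

scores <- score(c(color = "red", size = "big", shape = "round"))
prediction <- names(which.max(scores))
print(scores)
print(prediction)
```

Each entry of `cond` is a class-by-value table, so looking up one column gives P(fj | ci) for all classes in a single step.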
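Finally, a word on Laplace correction (also called Laplace smoothing). If a feature value occurs 0 times in the training set, the formula above becomes meaningless, because the result is 0 for every class. The same problem hits a single class when a value never occurs together with that class: one zero factor makes the whole product 0, no matter what the other features say. Choosing the training set carefully can avoid this, but when it cannot be avoided, Laplace correction is needed. It is simple: add 1 to the count of every feature value. To keep the estimated probabilities summing to 1, the denominator grows by the number of distinct values of that feature.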
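A minimal sketch of that correction in R follows. The function name `laplace.estimate` and its parameter names are chosen for this illustration only. The counts come from the play-tennis data used in this post, where outlook = overcast never occurs with playtennis = no.

```r
# Smoothed estimate of P(feature = value | class) from raw counts.
# count:    how often the value occurs together with the class
# n.class:  number of training rows belonging to the class
# n.values: number of distinct values the feature can take
laplace.estimate <- function(count, n.class, n.values) {
  (count + 1) / (n.class + n.values)
}

# outlook = overcast with playtennis = no: 0 of the 5 "no" rows,
# and outlook has 3 distinct values (sunny, overcast, rain).
unsmoothed <- 0 / 5
smoothed <- laplace.estimate(count = 0, n.class = 5, n.values = 3)

# Smoothed estimates for sunny, overcast, rain given "no" (counts 3, 0, 2).
outlook.given.no <- laplace.estimate(c(3, 0, 2), n.class = 5, n.values = 3)
print(smoothed)
print(sum(outlook.given.no))
```

With this adjustment the zero factor becomes a small positive number, so one unseen combination no longer wipes out the evidence from the other features.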
# Build the training set
data <- matrix(c("sunny","hot","high","weak","no",
                 "sunny","hot","high","strong","no",
                 "overcast","hot","high","weak","yes",
                 "rain","mild","high","weak","yes",
                 "rain","cool","normal","weak","yes",
                 "rain","cool","normal","strong","no",
                 "overcast","cool","normal","strong","yes",
                 "sunny","mild","high","weak","no",
                 "sunny","cool","normal","weak","yes",
                 "rain","mild","normal","weak","yes",
                 "sunny","mild","normal","strong","yes",
                 "overcast","mild","high","strong","yes",
                 "overcast","hot","normal","weak","yes",
                 "rain","mild","high","strong","no"),
               # byrow controls whether the elements are filled in row by row;
               # dimnames gives the row and column names.
               byrow = TRUE,
               dimnames = list(day = c(),
                               condition = c("outlook","temperature",
                                             "humidity","wind","playtennis")),
               nrow = 14, ncol = 5);
show(data)

# Compute the prior probabilities
prior.yes <- sum(data[,5] == "yes") / length(data[,5]);
prior.no <- sum(data[,5] == "no") / length(data[,5]);

# Model
naive.bayes.prediction <- function(condition.vec) {
  # Calculate unnormalized posterior probability for playtennis = yes.
  playtennis.yes <-
    sum((data[,1] == condition.vec[1]) & (data[,5] == "yes")) / sum(data[,5] == "yes") * # P(outlook = f_1 | playtennis = yes)
    sum((data[,2] == condition.vec[2]) & (data[,5] == "yes")) / sum(data[,5] == "yes") * # P(temperature = f_2 | playtennis = yes)
    sum((data[,3] == condition.vec[3]) & (data[,5] == "yes")) / sum(data[,5] == "yes") * # P(humidity = f_3 | playtennis = yes)
    sum((data[,4] == condition.vec[4]) & (data[,5] == "yes")) / sum(data[,5] == "yes") * # P(wind = f_4 | playtennis = yes)
    prior.yes; # P(playtennis = yes)

  # Calculate unnormalized posterior probability for playtennis = no.
  playtennis.no <-
    sum((data[,1] == condition.vec[1]) & (data[,5] == "no")) / sum(data[,5] == "no") * # P(outlook = f_1 | playtennis = no)
    sum((data[,2] == condition.vec[2]) & (data[,5] == "no")) / sum(data[,5] == "no") * # P(temperature = f_2 | playtennis = no)
    sum((data[,3] == condition.vec[3]) & (data[,5] == "no")) / sum(data[,5] == "no") * # P(humidity = f_3 | playtennis = no)
    sum((data[,4] == condition.vec[4]) & (data[,5] == "no")) / sum(data[,5] == "no") * # P(wind = f_4 | playtennis = no)
    prior.no; # P(playtennis = no)

  return(list(post.pr.yes = playtennis.yes,
              post.pr.no = playtennis.no,
              prediction = ifelse(playtennis.yes >= playtennis.no, "yes", "no")));
}

# Predictions
naive.bayes.prediction(c("rain", "hot", "high", "strong"));
## $post.pr.yes
## [1] 0.005291005
##
## $post.pr.no
## [1] 0.02742857
##
## $prediction
## [1] "no"
naive.bayes.prediction(c("sunny", "mild", "normal", "weak"));
## $post.pr.yes
## [1] 0.02821869
##
## $post.pr.no
## [1] 0.006857143
##
## $prediction
## [1] "yes"
naive.bayes.prediction(c("overcast", "mild", "normal", "weak"));
## $post.pr.yes
## [1] 0.05643739
##
## $post.pr.no
## [1] 0
##
## $prediction
## [1] "yes"
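The values returned above are unnormalized: the common denominator P(f1, f2, f3, f4) was dropped because it does not change which class wins. If actual posterior probabilities are wanted, one option is to divide each score by their sum. The sketch below does this for the first prediction, with the two numbers copied from its printed output; the variable names are chosen for this illustration.

```r
# Unnormalized scores copied from the output of the first prediction
# (rain, hot, high, strong).
post.pr.yes <- 0.005291005
post.pr.no  <- 0.02742857

# The sum over both classes plays the role of P(f1, f2, f3, f4).
total <- post.pr.yes + post.pr.no
posterior <- c(yes = post.pr.yes / total, no = post.pr.no / total)
print(posterior)
```

The ranking of the classes is unchanged by this step, so the prediction stays the same; only the scale becomes interpretable as a probability.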