Naive Bayesian Classification
This implementation follows the naive Bayesian algorithm as described in Data Mining: Concepts and Techniques, 3rd edition (Jiawei Han). Classification proceeds in four steps (only a brief outline is given here; see the book for the detailed formulas):
(1) Build the training-tuple matrix and the corresponding class labels
(2) Compute the prior probability of each class
(3) Compute the conditional probability of each attribute value given each class
(4) Predict the class label by maximizing the posterior probability
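The four steps can be sketched in plain Python before reading the MATLAB program below. This is only an illustrative sketch: the toy data, function names, and dictionary layout are my own, not part of the original code.

```python
from collections import Counter, defaultdict

def train(tuples, labels):
    """Steps (1)-(3): class priors and per-attribute conditional tables."""
    n = len(labels)
    class_counts = Counter(labels)
    priors = {c: cnt / n for c, cnt in class_counts.items()}
    # cond[(j, value, c)] will hold P(attribute j = value | class c)
    cond = defaultdict(float)
    for x, c in zip(tuples, labels):
        for j, v in enumerate(x):
            cond[(j, v, c)] += 1
    for j, v, c in list(cond):
        cond[(j, v, c)] /= class_counts[c]
    return priors, cond

def predict(x, priors, cond):
    """Step (4): return the class maximizing P(C) * prod_j P(x_j | C)."""
    best_c, best_p = None, -1.0
    for c, p in priors.items():
        for j, v in enumerate(x):
            p *= cond.get((j, v, c), 0.0)  # unseen value -> probability 0
        if p > best_p:
            best_c, best_p = c, p
    return best_c

# Toy data in the spirit of the (age, income) columns of TrainingSet.txt
tuples = [("youth", "high"), ("youth", "high"),
          ("senior", "low"), ("senior", "low")]
labels = ["no", "no", "yes", "yes"]
priors, cond = train(tuples, labels)
```

As with the MATLAB version, an attribute value never seen with a class drives that class's product to zero; no Laplace correction is applied.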
Main program of the naive Bayesian classifier:
clc;clear;
%%%% first step: construct the probability tree. The training tuples are
%%%% loaded inside ConstructProbability.m; note that if you want to classify
%%%% a different data set, you must retrain on matching training data first.
% 'result' contains the probability tree, the attribute list, and the class attribute
result=ConstructProbability();
PT=result{1,1};
attributeList=result{1,2};
classAttr=result{1,3};
%%%% second step: load the tuples to classify
% read tuple file
fileID = fopen('D:\matlabFile\NaiveBayesian\NaiveBaysian.txt');
% read as strings
D=textscan(fileID,'%s %s %s %s');
fclose(fileID);
%%%% third step: classify each tuple
conclusion=cell(1,1);
% copy the attribute names into the header row of the output
for i=1:size(attributeList,1)
    conclusion{1,i}=attributeList{i,1};
end
% require at least one data row after the header (the original tested >2,
% which skipped files containing a single tuple)
if size(D{1,1},1)>1
    for i=2:size(D{1,1},1)
        tuple=conclusion(1,:);
        for j=1:size(D,2)
            tuple{2,j}=D{1,j}{i,1};
        end
        decision=ErgodicPT(PT,attributeList,tuple);
        tuple{2,j+1}=decision;
        conclusion(size(conclusion,1)+1,:)=tuple(2,:);
    end
end
% fourth step: write the classified tuples to conclusion.txt
FID=fopen('conclusion.txt','wt');
for i=1:size(conclusion,1)
    for j=1:size(conclusion,2)
        fprintf(FID, '%s ', conclusion{i,j});
    end
    fprintf(FID,'\n');
end
fclose(FID);
Implementation of the ConstructProbability function:
% construct the probability tree from the training set
function result=ConstructProbability()
    % read the training tuple file
    fileID = fopen('D:\matlabFile\NaiveBayesian\TrainingSet.txt');
    % read as strings
    Dataset=textscan(fileID,'%s %s %s %s %s');
    fclose(fileID);
    % appoint the class attribute
    classA='buys-computer';
    attrs={0,0};
    % remember the class attribute id
    id=0;
    % build the attribute list from Dataset
    for i=1:size(Dataset,2)
        % find the class attribute
        if strcmp(classA,Dataset{1,i}{1,1})==1
            id=i;
        end
        attrs{i,1}=Dataset{1,i}{1,1};
        % initialize the value list of this attribute
        attr=cell(1,1);
        for j=2:size(Dataset{1,i},1)
            % check whether this attribute value is already known
            flag_attr=0;
            for k=1:size(attr,1)
                if strcmp(attr{k,1},Dataset{1,i}{j,1})
                    Dataset{1,i}{j,1}=k-1;
                    flag_attr=1;
                    break;
                end
            end
            % if the value is new, append it and encode the cell with its index
            if flag_attr==0
                attr{k+1,1}=Dataset{1,i}{j,1};
                Dataset{1,i}{j,1}=k;
            end
        end
        attr(1,:)=[];
        % store the value list for this attribute
        attrs{i,2}=attr;
        Dataset{1,i}(1,:)=[];
    end
    % create a new matrix
    DS=zeros(size(Dataset{1,1},1),1);
    % convert the cell arrays to a numeric matrix
    for i=1:size(Dataset,2)
        DataTemp=cell2mat(Dataset{1,i});
        DS=cat(2,DS,DataTemp);
    end
    DS(:,1)=[];
    % shift the columns so that the last column is the class attribute
    DS=circshift(DS,[0,size(DS,2)-id]);
    % adjust the attribute list so that the class attribute is in the last
    % position as well
    p_temp=attrs(id,:);
    attrs(id,:)=[];
    attrs(size(attrs,1)+1,:)=p_temp;
    % compute the probabilities of all attributes conditioned on the class
    rows=unique(DS(:,size(DS,2)),'rows');
    % sort the class codes so they map onto the attribute list order
    rows=sortrows(rows);
    ProbabilityTree=cell(1,2);
    for i=1:size(rows,1)
        D=DS;
        r=find(DS(:,size(DS,2))~=rows(i,1));
        D(r,:)=[];
        % prior probability P(Ci)
        ProbabilityTree{i,1}=size(D,1)/size(DS,1);
        % add a node to the probability tree
        node=cell(1,1);
        % compute the probability of every value of each attribute
        % (the original reused the name 'rows' here, shadowing the class
        % codes above; renamed to 'vals' for clarity)
        for j=1:size(D,2)-1
            vals=unique(D(:,j),'rows');
            subNode=cell(1,2);
            % sort the values
            vals=sortrows(vals);
            for k=1:size(vals,1)
                subD=D;
                subNode{k,1}=vals(k,1);
                r=find(D(:,j)~=vals(k,1));
                subD(r,:)=[];
                % conditional probability P(attribute value | Ci)
                subNode{k,2}=size(subD,1)/size(D,1);
            end
            node{j,1}=subNode;
        end
        ProbabilityTree{i,2}=node;
    end
    result={ProbabilityTree,attrs,classA};
end
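As a sanity check on the conditional probabilities the function stores, one entry, P(age = youth | buys-computer = yes), can be recomputed by hand. The snippet below is a Python illustration of my own, with the age and buys-computer columns transcribed from the TrainingSet.txt data listed later in this post:

```python
# (age, buys-computer) pairs transcribed from TrainingSet.txt
rows = [
    ("youth", "no"), ("youth", "no"), ("middleaged", "yes"), ("senior", "yes"),
    ("senior", "yes"), ("senior", "no"), ("middleaged", "yes"), ("youth", "no"),
    ("youth", "yes"), ("senior", "yes"), ("youth", "yes"), ("middleaged", "yes"),
    ("middleaged", "yes"), ("senior", "no"),
]
# restrict to the buys-computer = yes tuples, then count age = youth
yes_ages = [age for age, buys in rows if buys == "yes"]
p = yes_ages.count("youth") / len(yes_ages)  # P(age=youth | yes) = 2/9
```

This matches the value 2/9 that ConstructProbability stores in the corresponding subNode entry.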
The ErgodicPT function is implemented as follows. It returns the class whose product of prior and conditional probabilities is largest; note that if an attribute value never co-occurs with a class in the training data, that class's product is set to 0 (no Laplace correction is applied).
function result=ErgodicPT(PT,attributeList,tuple)
    % translate the tuple's attribute values into integer codes
    t=zeros(1,1);
    for i=1:size(tuple,2)
        for j=1:size(attributeList{i,2},1)
            if strcmp(attributeList{i,2}{j,1},tuple{2,i})
                t(1,i)=j;
                break;
            end
        end
    end
    % compute the posterior product for every class
    r=zeros(1,2);
    for i=1:size(PT,1)
        r(i,1)=i;
        R=1;
        for j=1:size(t,2)
            flag=0;
            for k=1:size(PT{i,2}{j,1},1)
                if PT{i,2}{j,1}{k,1}==t(1,j)
                    R=R*PT{i,2}{j,1}{k,2};
                    flag=1;
                    break;
                end
            end
            % the value never occurs with this class: the product is 0
            if flag==0
                R=0;
            end
        end
        % multiply by the class prior P(Ci)
        R=R*PT{i,1};
        r(i,2)=R;
    end
    % pick the class with the largest product
    r=sortrows(r,-2);
    result=attributeList{size(attributeList,1),2}{r(1,1),1};
end
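ErgodicPT's arithmetic can be checked against the worked example in Han's book for the tuple X = (age=youth, income=medium, student=yes, creditrating=fair). The fractions below are the prior and conditional probabilities that follow from the 14 training tuples (9 yes, 5 no); the Python here is only a hand calculation, not the MATLAB code:

```python
# P(yes) * P(youth|yes) * P(medium|yes) * P(student=yes|yes) * P(fair|yes)
p_yes = (9/14) * (2/9) * (4/9) * (6/9) * (6/9)
# P(no)  * P(youth|no)  * P(medium|no)  * P(student=yes|no)  * P(fair|no)
p_no  = (5/14) * (3/5) * (2/5) * (1/5) * (2/5)
# ErgodicPT returns the class with the larger product
decision = "yes" if p_yes > p_no else "no"
```

Since p_yes ≈ 0.028 exceeds p_no ≈ 0.007, the tuple is classified as buys-computer = yes, matching the result table below.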
Format of the TrainingSet.txt training data (copy and save as a .txt file):
age income student creditrating buys-computer
youth high no fair no
youth high no excellent no
middleaged high no fair yes
senior medium no fair yes
senior low yes fair yes
senior low yes excellent no
middleaged low yes excellent yes
youth medium no fair no
youth low yes fair yes
senior medium yes fair yes
youth medium yes excellent yes
middleaged medium no excellent yes
middleaged high yes fair yes
senior medium no excellent no
NaiveBaysian.txt, the data to be classified (copy and save as a .txt file):
age income student creditrating
youth high no fair
youth high no excellent
middleaged high no fair
senior medium no fair
senior low yes fair
senior low yes excellent
middleaged low yes excellent
youth medium no fair
youth low yes fair
senior medium yes fair
youth medium yes excellent
middleaged medium no excellent
middleaged high yes fair
senior medium no excellent
Classification results, for reference:
age income student creditrating buys-computer
youth high no fair no
youth high no excellent no
middleaged high no fair yes
senior medium no fair yes
senior low yes fair yes
senior low yes excellent yes
middleaged low yes excellent yes
youth medium no fair no
youth low yes fair yes
senior medium yes fair yes
youth medium yes excellent yes
middleaged medium no excellent yes
middleaged high yes fair yes
senior medium no excellent no