Pattern Recognition (7): Implementing a Naive Bayes Classifier in MATLAB
Source: Internet | Editor: 程序博客网 | Time: 2024/06/06 01:54
This series of articles is by 云端暮雪. Please credit the source when reposting:
http://blog.csdn.net/lyunduanmuxue/article/details/20068781
Thank you!
Introduction
Today we introduce a simple and efficient classifier: the Naive Bayes Classifier.
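Anyone who has studied probability theory will know the name Bayes, because one of the key formulas of probability theory is named after him. This is Bayes' formula, where c is a class and x is the feature vector of a sample:
$$P(c \mid \mathbf{x}) = \frac{P(\mathbf{x} \mid c)\,P(c)}{P(\mathbf{x})}$$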
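The Bayes classifier is built on this formula. The word "naive" refers to an assumption the classifier makes about the data: given the class, the features are conditionally independent of one another. This is a very strong assumption, yet it rarely limits the classifier's usefulness in practice. In 1997, Domingos and Pazzani showed experimentally that the classifier still performs well even when this assumption does not hold. One explanation is that the classifier has relatively few parameters to estimate, which makes it less prone to overfitting.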
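Combining Bayes' formula with the independence assumption gives the decision rule the classifier applies. In the notation below (our own, not the original post's), a sample x = (x_1, ..., x_d) has d features. The second step replaces P(x | c) by the product of the per-feature likelihoods, and the last step drops P(x), which is the same for every class and therefore cannot change which class wins:

```latex
\hat{c} = \arg\max_{c} P(c \mid \mathbf{x})
        = \arg\max_{c} \frac{P(c)\prod_{i=1}^{d} P(x_i \mid c)}{P(\mathbf{x})}
        = \arg\max_{c} P(c)\prod_{i=1}^{d} P(x_i \mid c)
```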
Implementation
Below, we implement the naive Bayes classifier step by step.
Training the classifier takes two steps:
- compute the prior probabilities;
- compute the likelihoods.
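Written out, the two steps are the usual frequency estimates. The symbols are our own, chosen to match what train_nbc below computes: N is the number of training samples, N_c the number of samples in class c, N_{c,i,v} the number of class-c samples whose i-th feature equals v, V_i the number of possible values of feature i, and α is 0 without smoothing or 1 with add-one smoothing (the addOne argument of train_nbc):

```latex
P(c) = \frac{N_c}{N},
\qquad
P(x_i = v \mid c) = \frac{N_{c,i,v} + \alpha}{N_c + \alpha V_i},
\qquad
\alpha \in \{0, 1\}
```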
Now that we have a basic understanding of the naive Bayes classifier, let's build one in MATLAB.
First, compute the priors:
function priors = nbc_Priors(training)
%NBC_PRIORS calculates the prior of each class from the training data set.
%
% priors = nbc_Priors(training)
%
% Input:
%   training          - a struct representing the training data set
%   training.class    - the class label of each sample
%   training.features - the features of each sample
%
% Output:
%   priors       - a struct representing the priors of each class
%   priors.class - the class labels
%   priors.value - the prior of the corresponding class
%
% Run nbc_mushroom to see an example.
%
% Edited by X. Sun
% My homepage: http://pamixsun.github.io/

% Check the input arguments
narginchk(1, 1);

% Extract the class labels
priors.class = unique(training.class);

% Initialize priors.value
priors.value = zeros(1, length(priors.class));

% Prior of class i = fraction of training samples labelled with class i
% (compare against priors.class(i); the built-in function class must not be used here)
for i = 1 : length(priors.class)
    priors.value(i) = sum(training.class == priors.class(i)) / length(training.class);
end

% Sanity check: the priors must sum to 1, up to floating-point error
if abs(sum(priors.value) - 1) > 1e-10
    error('nbc_Priors:PriorError', 'The priors do not sum to 1.');
end
end
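The loop in nbc_Priors can also be written without an explicit loop. A minimal sketch, assuming the class labels form a character column like the mushroom labels (the toy labels here are hypothetical): the third output of unique maps every sample to the index of its class, and accumarray counts the samples per class.

```matlab
% Hypothetical toy labels: 'e' = edible, 'p' = poisonous
labels = ['e'; 'p'; 'e'; 'e'; 'p'; 'e'];

% classes: the sorted, unique labels; idx: the class index of every sample
[classes, ~, idx] = unique(labels);

% Count the samples per class, then divide by the total number of samples
counts = accumarray(idx(:), 1);
priorValues = counts' / numel(labels);

disp(classes');     % prints ep
disp(priorValues);  % 4/6 and 2/6
```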
Next, train the complete naive Bayes classifier:
function [likelihood, priors] = train_nbc(training, featureValues, addOne)
%TRAIN_NBC trains a naive Bayes classifier using the training data set.
%
% [likelihood, priors] = train_nbc(training, featureValues, addOne)
%
% Input:
%   training          - a struct representing the training data set
%   training.class    - the class label of each sample
%   training.features - the features of each sample
%   featureValues     - a cell array holding the possible values of each feature
%   addOne            - whether to use add-one (Laplace) smoothing:
%                       1 means yes, 0 (the default) means no
%
% Output:
%   likelihood                - a struct representing the likelihood
%   likelihood.matrixColnames - the feature values
%   likelihood.matrixRownames - the class labels
%   likelihood.matrix         - the likelihood values
%   priors                    - a struct representing the priors of each class
%   priors.class              - the class labels
%   priors.value              - the prior of the corresponding class
%
% Run nbc_mushroom to see an example.
%
% Edited by X. Sun
% My homepage: http://pamixsun.github.io/

% Check the input arguments
narginchk(2, 3);

% Disable smoothing unless addOne is given
if nargin < 3
    addOne = 0;
end

% Calculate the priors
priors = nbc_Priors(training);

% Learn each feature by estimating P(feature value | class)
for i = 1 : size(training.features, 2)
    uniqueFeatureValues = featureValues{i};
    trainingFeatureValues = training.features(:, i);
    likelihood.matrixColnames{i} = uniqueFeatureValues;
    likelihood.matrixRownames{i} = priors.class;
    likelihood.matrix{i} = zeros(length(priors.class), length(uniqueFeatureValues));
    for j = 1 : length(uniqueFeatureValues)
        item = uniqueFeatureValues(j);
        for k = 1 : length(priors.class)
            label = priors.class(k);
            % Values of feature i among the training samples of class k
            featureValuesInClass = trainingFeatureValues(training.class == label);
            % (count + addOne) / (class size + addOne * number of possible values)
            likelihood.matrix{i}(k, j) = ...
                (sum(featureValuesInClass == item) + addOne) ...
                / (length(featureValuesInClass) + addOne * length(uniqueFeatureValues));
        end
    end
end
end
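To see what the addOne switch changes, here is a hedged sketch for a single feature within a single class; the data and variable names are hypothetical. Without smoothing, a value that never occurs in a class gets likelihood 0, which zeroes that class's entire product at prediction time. Add-one smoothing keeps every estimate positive while the estimates still sum to 1.

```matlab
% Hypothetical data: the values one feature takes among the training
% samples of a single class, and every value that feature can take
featureInClass = ['x'; 'x'; 'b'; 'x'];
possibleValues = 'bcx';

% counts(j) = number of samples whose value equals possibleValues(j)
counts = arrayfun(@(v) sum(featureInClass == v), possibleValues);

nInClass = numel(featureInClass);
nValues = numel(possibleValues);

% addOne = 0: plain relative frequencies; the unseen value 'c' gets 0
noSmoothing = counts / nInClass;

% addOne = 1: the same formula as train_nbc; every value keeps some mass
addOne = 1;
withSmoothing = (counts + addOne) / (nInClass + addOne * nValues);
```

Here the smoothed estimates are 2/7, 1/7 and 4/7 instead of 1/4, 0 and 3/4.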
Finally, use the trained classifier to predict the class labels of new data:
function [predictive, posterior] = predict_nbc(test, priors, likelihood)
%PREDICT_NBC uses a naive Bayes classifier to predict the class labels of
%the test data set.
%
% [predictive, posterior] = predict_nbc(test, priors, likelihood)
%
% Input:
%   test                      - a struct representing the test data set
%   test.class                - the class label of each sample
%   test.features             - the features of each sample
%   priors                    - a struct representing the priors of each class
%   priors.class              - the class labels
%   priors.value              - the prior of the corresponding class
%   likelihood                - a struct representing the likelihood
%   likelihood.matrixColnames - the feature values
%   likelihood.matrixRownames - the class labels
%   likelihood.matrix         - the likelihood values
%
% Output:
%   predictive       - the predictions for the test data set
%   predictive.class - the predicted class of each sample
%   posterior        - a struct representing the posteriors of each class
%   posterior.class  - the class labels
%   posterior.value  - the unnormalized posteriors P(c) * prod_k P(x_k | c)
%
% Run nbc_mushroom to see an example.
%
% Edited by X. Sun
% My homepage: http://pamixsun.github.io/

% Check the input arguments
narginchk(3, 3);

nTest = size(test.features, 1);
posterior.class = priors.class;
posterior.value = zeros(nTest, length(priors.class));
% One prediction per test record, with the same type as the class labels
predictive.class = repmat(priors.class(1), nTest, 1);

% Calculate the posteriors for each test record
for i = 1 : nTest
    record = test.features(i, :);
    % Calculate the posterior of every possible class for this record
    for j = 1 : length(priors.class)
        % Start from the prior of class j ...
        posteriorValue = priors.value(j);
        % ... and multiply by the likelihood of every observed feature value
        for k = 1 : length(record)
            item = record(k);
            likelihoodValue = ...
                likelihood.matrix{k}(j, likelihood.matrixColnames{k}(:) == item);
            posteriorValue = posteriorValue * likelihoodValue;
        end
        posterior.value(i, j) = posteriorValue;
    end
    % Predict the class with the largest posterior (the first one on ties)
    [~, best] = max(posterior.value(i, :));
    predictive.class(i) = posterior.class(best);
end
end
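predict_nbc multiplies the prior by one likelihood per feature, and with many features such a product of probabilities can underflow to 0. A common variant, sketched here with hypothetical numbers for 2 classes and 3 features, adds logarithms instead; because log is monotonic it picks the same class. The last two lines show the log-sum-exp trick for turning log scores back into normalized posteriors.

```matlab
% Hypothetical model: 2 classes, 3 features. likelihoodValues(j, k) is
% P(x_k | class j) for the observed value of feature k in one test record
priorValues = [0.6 0.4];
likelihoodValues = [0.2 0.5 0.1;
                    0.7 0.3 0.4];

% Direct product, as in predict_nbc: P(c) * prod_k P(x_k | c)
scores = priorValues .* prod(likelihoodValues, 2)';

% Log space: log P(c) + sum_k log P(x_k | c); sums of logs do not underflow
logScores = log(priorValues) + sum(log(likelihoodValues), 2)';

% Both pick the same class because log is monotonic
[~, bestDirect] = max(scores);
[~, bestLog] = max(logScores);

% Log-sum-exp: recover normalized posteriors without underflow
posteriors = exp(logScores - max(logScores));
posteriors = posteriors / sum(posteriors);
```

A zero likelihood (possible when addOne is 0) becomes -Inf in log space; max handles this, but the class is still ruled out entirely.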
To check that the classifier works, we test it on the mushroom data set from the UCI Machine Learning Repository.
The test code is as follows (save it as nbc_mushroom.m):
%% Initialize the environment
close all;
clear;
clc;

%% Import data from file
originalData = importdata('agaricus-lepiota.data');
featureValues = importdata('featureValues');

%% Retrieve class and features
% Each line holds the class label ('e' = edible, 'p' = poisonous)
% followed by 22 comma-separated single-character feature codes
N = length(originalData);
predata = zeros(N, 23);
for i = 1 : N
    originalData{i} = strrep(originalData{i}, ',', '');
    predata(i, :) = originalData{i}(:)';
end
for i = 1 : length(featureValues)
    featureValues{i} = strrep(featureValues{i}, ',', '');
end
predata = char(predata);
data.class = predata(:, 1);
data.features = predata(:, 2:end);
clear originalData;
clear predata;

%% Visualize data to gain an intuitive understanding
figure('color', 'white');
visualData_mushroom(data);

%% Train and test naive Bayes
% Set the seed to make the results reproducible
seed = 1;
rng(seed);
% Random permutation of the sample indices
dataSize = length(data.class);
permIndex = randperm(dataSize);
% Construct the training data set (all samples after the first 5000 of the permutation)
training.class = data.class(permIndex(5001 : end));
training.features = data.features(permIndex(5001 : end), :);
% Construct the test data set (the first 5000 samples of the permutation)
test.class = data.class(permIndex(1 : 5000));
test.features = data.features(permIndex(1 : 5000), :);
% Train a NBC (no addOne argument, so smoothing is off)
[likelihood, priors] = train_nbc(training, featureValues);
% Apply the NBC
[predictive, posterior] = predict_nbc(test, priors, likelihood);
% Calculate the accuracy
accuracy = sum(predictive.class == test.class) / length(test.class)
Visualizing the data gives the result shown below, and the classification accuracy is 99.94%.
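Accuracy alone does not show which kind of mistake the classifier makes. A minimal sketch of a confusion matrix, using hypothetical label vectors in place of test.class and predictive.class:

```matlab
% Hypothetical true and predicted labels ('e' = edible, 'p' = poisonous)
trueLabels = ['e'; 'p'; 'p'; 'e'; 'p'; 'e'];
predLabels = ['e'; 'p'; 'e'; 'e'; 'p'; 'e'];

classes = unique([trueLabels; predLabels]);   % sorted labels: 'e', then 'p'
nClasses = numel(classes);

% confusion(r, s) = number of samples of true class r predicted as class s
confusion = zeros(nClasses, nClasses);
for r = 1 : nClasses
    for s = 1 : nClasses
        confusion(r, s) = sum(trueLabels == classes(r) & predLabels == classes(s));
    end
end

% Accuracy, computed the same way as in nbc_mushroom
accuracy = sum(predLabels == trueLabels) / numel(trueLabels);
```

With the labels sorted as 'e' then 'p', confusion(2, 1) counts poisonous mushrooms predicted as edible, which is usually the more costly mistake for this data set.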
Final remarks
All source code and data sets can be downloaded from my download page:
http://download.csdn.net/detail/longyindiyi/7994137
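Of course, the code above is not perfect and still has some shortcomings; readers are invited to find them. Attentive readers may also notice that the code only handles discrete feature values. How should continuous feature values be handled? Feel free to discuss this in the comments.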
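One common answer to the question above is Gaussian naive Bayes: for each class and each continuous feature, estimate a mean and a variance from the training data, and use the normal density in place of the discrete lookup table. This is a swapped-in technique, not code from the original post. The sketch uses hypothetical toy data, works in log space, and adds a small variance floor (an assumption) so that a constant feature does not cause division by zero.

```matlab
% Hypothetical training data: 6 samples, 2 continuous features, classes 1 and 2
X = [1.0 2.1; 1.2 1.9; 0.8 2.0; 3.0 4.2; 3.2 3.8; 2.9 4.0];
y = [1; 1; 1; 2; 2; 2];
classes = unique(y);
nClasses = numel(classes);
nFeatures = size(X, 2);

% Training: per-class prior, and per-class mean and variance of each feature
priorValues = zeros(1, nClasses);
mu = zeros(nClasses, nFeatures);
sigma2 = zeros(nClasses, nFeatures);
for c = 1 : nClasses
    Xc = X(y == classes(c), :);
    priorValues(c) = size(Xc, 1) / size(X, 1);
    mu(c, :) = mean(Xc, 1);
    % Maximum-likelihood variance (normalized by N) plus the assumed small floor
    sigma2(c, :) = var(Xc, 1, 1) + 1e-9;
end

% Log of the normal density N(x; m, s2), element-wise
logGauss = @(x, m, s2) -(x - m).^2 ./ (2 * s2) - 0.5 * log(2 * pi * s2);

% Prediction: log P(c) + sum_k log N(x_k; mu(c, k), sigma2(c, k))
xNew = [1.1 2.0];
logScores = zeros(1, nClasses);
for c = 1 : nClasses
    logScores(c) = log(priorValues(c)) + sum(logGauss(xNew, mu(c, :), sigma2(c, :)));
end
[~, best] = max(logScores);
predictedClass = classes(best);
```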
If you have any other questions, please leave them in the replies.