R: Support Vector Machines (SVM)

Source: Internet. Editor: 程序博客网. Date: 2024/05/21 11:06


Data source: this example recognizes scanned characters, using the [Letter Recognition Dataset](https://archive.ics.uci.edu/ml/datasets/Letter+Recognition).


install.packages("knitr")
install.packages("kernlab")
library(kernlab)
library(knitr)


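### Reading and preparing the data
letters <- read.table("letter.txt", sep = ",", stringsAsFactors = TRUE)   # read the letter column as a factor so ksvm performs classification
str(letters)
dim(letters)
colnames(letters) <- c("letter", "xbox", "ybox", "width", "height", "onpix", "xbar", "ybar", "x2bar", "y2bar", "xybar", "x2ybar", "xy2bar", "xege", "xedgey", "yege", "yedgex")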


### Building the training and test sets
ind <- sample(2, nrow(letters), replace = TRUE, prob = c(0.8, 0.2))   # label each row 1 (about 80%) or 2 (about 20%)
traindata <- letters[ind == 1, ]
testdata <- letters[ind == 2, ]
dim(testdata)
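
The split above assigns each row to a set at random, so the set sizes are only approximately 80/20 and change on every run. A minimal sketch of a reproducible split with fixed sizes follows. `split_data`, its arguments, and the seed value are hypothetical names introduced here for illustration, and `iris` (shipped with R) stands in for the letters data.

```r
# Hypothetical helper, not part of the original post:
# split a data frame into train/test parts with a fixed seed.
split_data <- function(df, train_frac = 0.8, seed = 123) {
  set.seed(seed)                                         # make the split repeatable
  n <- nrow(df)
  train_idx <- sample(n, size = floor(train_frac * n))   # row indices, sampled without replacement
  list(train = df[train_idx, , drop = FALSE],
       test  = df[-train_idx, , drop = FALSE])
}

# iris is only a stand-in for the letters data
parts <- split_data(iris)
dim(parts$train)
dim(parts$test)
```

Calling `split_data` twice with the same seed returns the same rows, which makes later error rates comparable across runs.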

### Building the classifier
classifier <- ksvm(letter ~ ., data = traindata, kernel = "vanilladot")   # linear kernel as a baseline; the kernel choice needs experimentation
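
Before evaluating on held-out data, it can help to look at what was fitted. The sketch below assumes the kernlab accessors `nSV()` and `error()` and uses `iris` as a stand-in dataset; the model name `model` is introduced here only for illustration.

```r
library(kernlab)

# iris stands in for traindata; the post itself fits on the letters data
model <- ksvm(Species ~ ., data = iris, kernel = "vanilladot")

print(model)    # summary: kernel, cost C, number of support vectors, training error
nSV(model)      # number of support vectors
error(model)    # training error (share of misclassified training rows)
```

The training error is usually optimistic, which is why the post goes on to measure accuracy on a separate test set.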

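### Evaluating the model
prediction <- predict(classifier, newdata = testdata)
table(prediction, testdata$letter)   # confusion matrix: predicted letters in rows, actual letters in columns
agreement <- prediction == testdata$letter   # TRUE where the prediction matches the actual letter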

> table(agreement)
agreement
FALSE  TRUE 
  584  3370 
> prop.table(table(agreement))
agreement
    FALSE      TRUE 
0.1476985 0.8523015 
The model has an error rate of about 14.8%.
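
The same error rate can be read off the confusion matrix, which also shows which classes are hardest. The sketch below uses a small made-up set of labels; `truth` and `pred` are hypothetical stand-ins for `testdata$letter` and `prediction`.

```r
# Toy labels, invented for illustration only
truth <- factor(c("A", "A", "B", "B", "B", "C"))
pred  <- factor(c("A", "B", "B", "B", "C", "C"), levels = levels(truth))

cm <- table(prediction = pred, actual = truth)   # rows: predicted, columns: actual

accuracy   <- sum(diag(cm)) / sum(cm)    # share of predictions on the diagonal
error_rate <- 1 - accuracy
per_class  <- diag(cm) / colSums(cm)     # per-class recall: correct / actual count

accuracy
error_rate
per_class
```

Both factors must share the same levels in the same order so that the diagonal of the table lines up with correct predictions. `mean(pred == truth)` gives the same accuracy without building the table.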


### Improving the model
classifier_rbf <- ksvm(letter ~ ., data = traindata, kernel = "rbfdot")   # Gaussian RBF kernel instead of the linear one
prediction <- predict(classifier_rbf, newdata = testdata)
table(prediction, testdata$letter)
agreement <- prediction == testdata$letter


> table(agreement)
agreement
FALSE  TRUE 
  273  3681 
> prop.table(table(agreement))
agreement
     FALSE       TRUE 
0.06904401 0.93095599 

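The new model has an error rate of only 6.9%, a large improvement over the previous model's 14.8%. We can keep tuning the model by trying other kernels and by changing the cost of constraint violation, which `ksvm` exposes as the `C` parameter.

The cost parameter sets the penalty for violating the margin. With a small cost the margin is wider, so more support vectors lie on the margin or violate it. With a larger cost the margin is narrower, so fewer support vectors lie on the margin or violate it.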
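
One way to explore the effect of the cost is to refit the model for several values of `C` and compare the number of support vectors and the test error. This is a rough sketch under stated assumptions: `iris` stands in for the letters data, and the seed, the split size, and the grid of costs are arbitrary choices made for illustration.

```r
library(kernlab)

set.seed(42)                            # arbitrary seed; also fixes the automatic sigma estimate
idx   <- sample(nrow(iris), 100)        # iris is a stand-in for the letters data
train <- iris[idx, ]
test  <- iris[-idx, ]

costs   <- c(0.1, 1, 10, 100)           # illustrative grid, not a recommendation
results <- data.frame(C = costs, n_sv = NA_integer_, test_error = NA_real_)

for (i in seq_along(costs)) {
  fit  <- ksvm(Species ~ ., data = train, kernel = "rbfdot", C = costs[i])
  pred <- predict(fit, newdata = test)
  results$n_sv[i]       <- nSV(fit)                    # support vectors used by this fit
  results$test_error[i] <- mean(pred != test$Species)  # share of misclassified test rows
}

results
```

The support vector count typically falls as `C` grows, though this is a tendency rather than a guarantee on every dataset. Picking `C` by test error alone risks overfitting to the test set, so cross-validation on the training data is the safer choice for a final model.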
