Machine Learning with R: KNN
k-nearest neighbors and kd-trees (largely tangential to this article)
To make k-nearest-neighbor search more efficient, the training data can be stored in a special structure that reduces the number of distance computations. There are many such methods; the kd-tree is introduced here.
Reference: http://blog.csdn.net/qll125596718/article/details/8426458
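To make the kd-tree idea above concrete, here is a minimal sketch written for this note (the function names `build_kdtree` and `nearest` are invented for illustration, not from any package): build by splitting on axes in rotation at the median, then search by descending to a leaf and backtracking, pruning any subtree whose splitting plane is farther away than the best distance found so far.

```r
# Build a kd-tree: cycle through the axes, split at the median point.
build_kdtree <- function(points, depth = 0) {
  n <- nrow(points)
  if (n == 0) return(NULL)
  axis <- depth %% ncol(points) + 1
  points <- points[order(points[, axis]), , drop = FALSE]
  mid <- ceiling(n / 2)
  list(point = points[mid, ],
       axis  = axis,
       left  = build_kdtree(points[seq_len(mid - 1), , drop = FALSE], depth + 1),
       right = build_kdtree(points[seq(mid + 1, length.out = n - mid),
                                   , drop = FALSE], depth + 1))
}

# Nearest-neighbor search: visit the nearer subtree first, then only
# visit the other subtree if the splitting plane is closer than the
# current best distance (this pruning is what saves distance computations).
nearest <- function(node, query, best = NULL) {
  if (is.null(node)) return(best)
  d <- sqrt(sum((query - node$point)^2))
  if (is.null(best) || d < best$dist) best <- list(point = node$point, dist = d)
  diff <- query[node$axis] - node$point[node$axis]
  first  <- if (diff < 0) node$left  else node$right
  second <- if (diff < 0) node$right else node$left
  best <- nearest(first, query, best)
  if (abs(diff) < best$dist) best <- nearest(second, query, best)
  best
}
```

The result matches a brute-force scan over all points; the payoff is that, on low-dimensional data, the backtracking step prunes most of the tree.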
KNN implementation in R
library(class)
knn(train, test, cl, k)
wbcd_test_pred <- knn(train = wbcd_train, test = wbcd_test, cl = wbcd_train_labels, k = 21)
Character variables must be dummy-coded, e.g. as 0/1, since knn() requires numeric features.
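A dummy-coding step like the one just mentioned might look as follows (a minimal sketch; the small data frame here is invented for illustration):

```r
# knn() computes distances, so every feature must be numeric.
# A two-level character variable can be recoded as a 0/1 dummy:
df <- data.frame(temp = c(20, 25, 30),
                 hot  = c("no", "no", "yes"),
                 stringsAsFactors = FALSE)
df$hot <- ifelse(df$hot == "yes", 1, 0)
df$hot  # now numeric: 0 0 1
```

For a variable with more than two levels, model.matrix(~ f - 1) produces one 0/1 column per level.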
Python implementation
http://blog.csdn.net/q383700092/article/details/51757762
R version (calling a library function)
http://blog.csdn.net/q383700092/article/details/51759313
Simplified MapReduce implementation
http://blog.csdn.net/q383700092/article/details/51780865
Spark version
To be added later
rm(list = ls())
# import the CSV file
wbcd <- read.csv("wisc_bc_data.csv", stringsAsFactors = FALSE)
# examine the structure of the wbcd data frame
str(wbcd)
# drop the id feature
wbcd <- wbcd[-1]
# table of diagnosis
table(wbcd$diagnosis)
# recode diagnosis as a factor
wbcd$diagnosis <- factor(wbcd$diagnosis, levels = c("B", "M"),
                         labels = c("Benign", "Malignant"))
# table of proportions with more informative labels
round(prop.table(table(wbcd$diagnosis)) * 100, digits = 1)
# summarize three numeric features
summary(wbcd[c("radius_mean", "area_mean", "smoothness_mean")])
# create min-max normalization function
normalize <- function(x) {
  return((x - min(x)) / (max(x) - min(x)))
}
# test normalization function - results should be identical
normalize(c(1, 2, 3, 4, 5))
normalize(c(10, 20, 30, 40, 50))
# normalize the wbcd data; lapply applies the function to every column
wbcd_n <- as.data.frame(lapply(wbcd[2:31], normalize))
# confirm that normalization worked
summary(wbcd_n$area_mean)
# create training and test data
wbcd_train <- wbcd_n[1:469, ]
wbcd_test <- wbcd_n[470:569, ]
# create labels for training and test data
wbcd_train_labels <- wbcd[1:469, 1]
wbcd_test_labels <- wbcd[470:569, 1]

## Step 3: Training a model on the data ----
# load the "class" library
library(class)
wbcd_test_pred <- knn(train = wbcd_train, test = wbcd_test,
                      cl = wbcd_train_labels, k = 21)

## Step 4: Evaluating model performance ----
# load the "gmodels" library
library(gmodels)
# create the cross tabulation of predicted vs. actual
CrossTable(x = wbcd_test_labels, y = wbcd_test_pred, prop.chisq = FALSE)

## Step 5: Improving model performance ----
# use the scale() function to z-score standardize the data frame
wbcd_z <- as.data.frame(scale(wbcd[-1]))
# confirm that the transformation was applied correctly
summary(wbcd_z$area_mean)
# create training and test datasets
wbcd_train <- wbcd_z[1:469, ]
wbcd_test <- wbcd_z[470:569, ]
# re-classify test cases
wbcd_test_pred <- knn(train = wbcd_train, test = wbcd_test,
                      cl = wbcd_train_labels, k = 21)
# create the cross tabulation of predicted vs. actual
CrossTable(x = wbcd_test_labels, y = wbcd_test_pred, prop.chisq = FALSE)

# try several different values of k (back on the min-max normalized data)
wbcd_train <- wbcd_n[1:469, ]
wbcd_test <- wbcd_n[470:569, ]
wbcd_test_pred <- knn(train = wbcd_train, test = wbcd_test, cl = wbcd_train_labels, k = 1)
CrossTable(x = wbcd_test_labels, y = wbcd_test_pred, prop.chisq = FALSE)
wbcd_test_pred <- knn(train = wbcd_train, test = wbcd_test, cl = wbcd_train_labels, k = 5)
CrossTable(x = wbcd_test_labels, y = wbcd_test_pred, prop.chisq = FALSE)
wbcd_test_pred <- knn(train = wbcd_train, test = wbcd_test, cl = wbcd_train_labels, k = 11)
CrossTable(x = wbcd_test_labels, y = wbcd_test_pred, prop.chisq = FALSE)
wbcd_test_pred <- knn(train = wbcd_train, test = wbcd_test, cl = wbcd_train_labels, k = 15)
CrossTable(x = wbcd_test_labels, y = wbcd_test_pred, prop.chisq = FALSE)
wbcd_test_pred <- knn(train = wbcd_train, test = wbcd_test, cl = wbcd_train_labels, k = 21)
CrossTable(x = wbcd_test_labels, y = wbcd_test_pred, prop.chisq = FALSE)
wbcd_test_pred <- knn(train = wbcd_train, test = wbcd_test, cl = wbcd_train_labels, k = 27)
CrossTable(x = wbcd_test_labels, y = wbcd_test_pred, prop.chisq = FALSE)
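The repeated CrossTable calls at the end could be collapsed into a loop that just records the error count per k. The sketch below uses R's built-in iris data so that it runs stand-alone (the wbcd objects from the script are not recreated here):

```r
library(class)  # ships with R; provides knn()

set.seed(123)
idx <- sample(nrow(iris), 100)        # 100 training rows, 50 test rows
train <- iris[idx, 1:4]
test  <- iris[-idx, 1:4]
train_lab <- iris$Species[idx]
test_lab  <- iris$Species[-idx]

# test-set error count for each candidate k
errs <- sapply(c(1, 5, 11, 21), function(k) {
  pred <- knn(train = train, test = test, cl = train_lab, k = k)
  sum(pred != test_lab)
})
names(errs) <- c(1, 5, 11, 21)
errs
```

Picking k from a single test set this way risks overfitting the choice of k to that split; cross-validation over several splits is the safer variant of the same loop.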