Deep Learning Classification with h2o in R

Source: Internet. Editor: 程序博客网. Time: 2024/04/24 12:33
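Set up the required environment:
install.packages("h2o")
library(h2o)
Sys.setenv(JAVA_HOME="E:/java/JAVA(1)")  # set the JAVA_HOME environment variable (use the Java path on your own machine)
h2o.init()  # connect to the H2O platform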
Download the data:
Training set: http://www.pjreddie.com/media/files/mnist_train.csv
Test set: http://www.pjreddie.com/media/files/mnist_test.csv
train_h2o <- h2o.importFile( path = "D:/mnist_train.csv")
test_h2o <- h2o.importFile(path = "D:/mnist_test.csv")
y_train <- as.factor(as.matrix(train_h2o[, 1]))
y_test <- as.factor(as.matrix(test_h2o[, 1]))
Train the model:
model <- h2o.deeplearning(x = 2:785,  # column numbers for predictors
                          y = 1,   # column number for label
                          training_frame = train_h2o,  # training set
                          activation = "Tanh",  # activation function
                          #balance_classes = TRUE,  # balance the classes in the training set
                          hidden = c(100, 100, 100),  # three hidden layers
                          epochs = 100)  # 100 passes over the training data
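Because the data set is fairly large (60,000 rows × 785 columns), the computer becomes very sluggish during this step and CPU usage stays above 95%. On my machine, training the model took 40 minutes.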
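If the machine needs to stay usable during training, one option is to cap the resources H2O may use when the instance is started. The following is a minimal sketch rather than a tuned setup; the values `2` and `"4G"` are assumptions chosen for illustration and should be adapted to your hardware.

```r
library(h2o)

# Assumed values for illustration: 2 worker threads and a 4 GB Java heap.
# nthreads limits how many CPU cores H2O uses; max_mem_size limits its memory.
h2o.init(nthreads = 2, max_mem_size = "4G")
```

These settings only take effect when a new H2O instance is started. If an instance is already running, `h2o.init()` simply connects to it, so it would have to be stopped first, for example with `h2o.shutdown(prompt = FALSE)`. Fewer threads will make training take longer.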
Next, print the model to see how well it fits the training set:

model


Model Details:
==============


H2ORegressionModel: deeplearning
Model ID:  DeepLearning_model_R_1500974326986_4 
Status of Neuron Layers: predicting C1, regression, gaussian distribution, Quadratic loss, 92,101 weights/biases, 1.1 MB, 862,830 training samples, mini-batch size 1
  layer units   type dropout       l1       l2 mean_rate rate_rms momentum
1     1   717  Input  0.00 %                                              
2     2   100   Tanh  0.00 % 0.000000 0.000000  0.352263 0.377816 0.000000
3     3   100   Tanh  0.00 % 0.000000 0.000000  0.050956 0.026576 0.000000
4     4   100   Tanh  0.00 % 0.000000 0.000000  0.233008 0.247813 0.000000
5     5     1 Linear         0.000000 0.000000  0.001606 0.001025 0.000000
  mean_weight weight_rms mean_bias bias_rms
1                                          
2   -0.002465   0.110346  0.016357 0.192539
3    0.001666   0.177409  0.002860 0.447464
4   -0.002143   0.154353 -0.017609 0.236047
5   -0.012989   0.069333 -0.056454 0.000000




H2ORegressionMetrics: deeplearning
** Reported on training data. **
** Metrics reported on temporary training frame with 10092 samples **


MSE:  0.1165795
RMSE:  0.3414374
MAE:  0.1600576
RMSLE:  0.09332472
Mean Residual Deviance :  0.1165795
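Note that the printout above reports an H2ORegressionModel with a single Linear output unit: because the label column C1 was imported as a numeric column, H2O fits a regression on the digit value rather than a 10-class classifier. The sketch below shows the step that would switch to classification, namely converting the label column to a factor before training. To keep it runnable without the MNIST files, it uses a tiny synthetic frame; the names `toy`, `toy_h2o` and `clf`, the layer sizes and the epoch count are all made up for illustration.

```r
library(h2o)
h2o.init()

# Tiny synthetic stand-in for MNIST (hypothetical data):
# column 1 is an integer label, columns 2:5 are numeric features.
set.seed(1)
toy <- data.frame(label = rep(0:2, each = 20),
                  matrix(runif(60 * 4), nrow = 60))
toy_h2o <- as.h2o(toy)

# Key step: convert the numeric label column to a factor,
# so that H2O trains a classifier instead of a regression model.
toy_h2o[, 1] <- as.factor(toy_h2o[, 1])

clf <- h2o.deeplearning(x = 2:5,            # predictor columns
                        y = 1,              # label column (now a factor)
                        training_frame = toy_h2o,
                        activation = "Tanh",
                        hidden = c(10, 10), # small network for the toy data
                        epochs = 5)

class(clf)                # model class as reported by the R client
h2o.confusionMatrix(clf)  # per-class errors on the training frame
```

With the frames used in this article, the corresponding step would be `train_h2o[, 1] <- as.factor(train_h2o[, 1])` (and likewise for `test_h2o`) before calling `h2o.deeplearning`. The `predict` column returned by `h2o.predict` would then contain class labels directly, so no rounding would be needed.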


Next, see how well the model performs on the test set by using the trained model to generate predictions:
yhat_train <- h2o.predict(model, train_h2o)$predict
yhat_train <- as.factor(as.matrix(yhat_train))


yhat_test <- h2o.predict(model, test_h2o)$predict

yhat_test <- as.factor(as.matrix(yhat_test))


yt<-as.numeric(as.character(y_test)) # convert the factor to character first, then to numeric
yhat<-as.numeric(as.character(yhat_test))

Run the following code to print the number of correct classifications:

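s <- 0  # counter for correct predictions
for (i in seq_along(yt)) {
  # a prediction counts as correct if it rounds to the true digit
  if (yt[i] == round(yhat[i]))
    s <- s + 1
}
s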

[1] 8964

8,964 of the 10,000 test samples are predicted correctly after rounding, an accuracy of 89.64%, which is a fairly good result.
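The same count can be computed without an explicit loop, since comparisons in R are vectorized. The sketch below uses small made-up vectors in place of `yt` and `yhat` so that it runs on its own; `pred_digit`, `n_correct` and `accuracy` are names introduced here for illustration.

```r
# Toy stand-ins for yt (true digits) and yhat (numeric model output);
# the values are invented for illustration only.
yt   <- c(7, 2, 1, 0, 4)
yhat <- c(6.8, 2.3, 1.1, 0.6, 4.2)

# Round each prediction to the nearest integer, then compare element-wise.
pred_digit <- round(yhat)
n_correct  <- sum(pred_digit == yt)   # number of correct predictions
accuracy   <- mean(pred_digit == yt)  # fraction of correct predictions

# Cross-tabulate true digits against rounded predictions to see which
# digits are confused with each other.
table(actual = yt, predicted = pred_digit)
```

Applied to the real `yt` and `yhat` from above, `sum(round(yhat) == yt)` should give the same count as the loop.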
