Deep Learning Classification with h2o in R

Source: Internet. Editor: 程序博客网. Time: 2024/04/24 12:33

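Set up the required environment:
install.packages("h2o")
library(h2o)
Sys.setenv(JAVA_HOME="E:/java/JAVA(1)") # set the JAVA_HOME environment variable (adjust to your own Java install)
h2o.init() # start or connect to the local h2o instance
Download the data:
Training set: http://www.pjreddie.com/media/files/mnist_train.csv
Test set: http://www.pjreddie.com/media/files/mnist_test.csv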
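The two files have to be on disk before they can be imported. A minimal sketch of one way to fetch them from R is shown here; `fetch_if_missing` is a hypothetical helper name, not part of h2o, and the destination paths are assumed to match the `D:/` paths used in this post.

```r
# Hypothetical helper: download `url` to `dest` only if `dest` is not already on disk.
fetch_if_missing <- function(url, dest) {
  if (!file.exists(dest)) {
    # mode = "wb" writes the file byte-for-byte (matters on Windows)
    download.file(url, destfile = dest, mode = "wb")
  }
  invisible(dest)
}

# Usage (not run here; adjust the destination paths to your machine):
# fetch_if_missing("http://www.pjreddie.com/media/files/mnist_train.csv", "D:/mnist_train.csv")
# fetch_if_missing("http://www.pjreddie.com/media/files/mnist_test.csv",  "D:/mnist_test.csv")
```

Skipping the download when the file already exists avoids re-fetching the large training file every time the script is rerun.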
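# Load both CSV files into the h2o cluster as H2OFrames
train_h2o <- h2o.importFile(path = "D:/mnist_train.csv")
test_h2o <- h2o.importFile(path = "D:/mnist_test.csv")
# Pull the label column (column 1) back into R as a factor
y_train <- as.factor(as.matrix(train_h2o[, 1]))
y_test <- as.factor(as.matrix(test_h2o[, 1]))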
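Train the model:
model <- h2o.deeplearning(x = 2:785, # column numbers for predictors
                          y = 1, # column number for label (numeric here, so h2o fits a regression)
                          training_frame = train_h2o, # training set
                          activation = "Tanh", # activation function
                          #balance_classes = TRUE, # balance the classes in the training set
                          hidden = c(100, 100, 100), # three hidden layers of 100 units each
                          epochs = 100) # up to 100 passes over the training data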
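Because the dataset is fairly large (60,000 rows × 785 columns), the computer becomes very sluggish while the model trains. CPU usage stayed above 95% throughout, and on my machine training took about 40 minutes.
Next, print the model to see how well it fits the training set: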

model

Model Details:
==============

H2ORegressionModel: deeplearning
Model ID: DeepLearning_model_R_1500974326986_4
Status of Neuron Layers: predicting C1, regression, gaussian distribution, Quadratic loss, 92,101 weights/biases, 1.1 MB, 862,830 training samples, mini-batch size 1
  layer units   type dropout       l1       l2 mean_rate rate_rms momentum
1     1   717  Input  0.00 %
2     2   100   Tanh  0.00 % 0.000000 0.000000  0.352263 0.377816 0.000000
3     3   100   Tanh  0.00 % 0.000000 0.000000  0.050956 0.026576 0.000000
4     4   100   Tanh  0.00 % 0.000000 0.000000  0.233008 0.247813 0.000000
5     5     1 Linear         0.000000 0.000000  0.001606 0.001025 0.000000
  mean_weight weight_rms mean_bias bias_rms
1
2   -0.002465   0.110346  0.016357 0.192539
3    0.001666   0.177409  0.002860 0.447464
4   -0.002143   0.154353 -0.017609 0.236047
5   -0.012989   0.069333 -0.056454 0.000000

H2ORegressionMetrics: deeplearning
** Reported on training data. **
** Metrics reported on temporary training frame with 10092 samples **

MSE: 0.1165795
RMSE: 0.3414374
MAE: 0.1600576
RMSLE: 0.09332472
Mean Residual Deviance : 0.1165795
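Note that the printout reports an `H2ORegressionModel` with a single linear output unit: the label column `C1` was imported as a number, so h2o treated the task as regression. For a true 10-class classifier, the label column would need to be a factor inside the H2OFrame itself. The following is a minimal sketch of that variant, not the run that produced the output above. The function name `train_mnist_classifier` and the default `epochs = 10` are assumptions chosen for illustration, and it needs a working h2o and Java installation when called.

```r
# Sketch: train a multinomial (classification) model instead of a regression.
# Calls use the h2o:: prefix so that defining the function does not require h2o;
# h2o is only needed when the function is actually called.
train_mnist_classifier <- function(train_path, test_path, epochs = 10) {
  h2o::h2o.init()
  train <- h2o::h2o.importFile(path = train_path)
  test  <- h2o::h2o.importFile(path = test_path)

  # Convert the label column (column 1) to a factor inside the H2OFrame,
  # which is what makes h2o fit a classifier rather than a regression.
  train[, 1] <- h2o::as.factor(train[, 1])
  test[, 1]  <- h2o::as.factor(test[, 1])

  model <- h2o::h2o.deeplearning(x = 2:785,
                                 y = 1,
                                 training_frame = train,
                                 activation = "Tanh",
                                 hidden = c(100, 100, 100),
                                 epochs = epochs)

  # Evaluate on the held-out test frame
  perf <- h2o::h2o.performance(model, newdata = test)
  list(model = model, confusion = h2o::h2o.confusionMatrix(perf))
}

# Usage (not run here):
# res <- train_mnist_classifier("D:/mnist_train.csv", "D:/mnist_test.csv")
# res$confusion
```

With a factor label, `h2o.predict` returns a class label in its `predict` column, so no rounding of the predictions is needed afterwards.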

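Next, see how the model performs on the test set by using the trained model to predict it (predictions for the training set are computed as well):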
yhat_train <- h2o.predict(model, train_h2o)$predict
yhat_train <- as.factor(as.matrix(yhat_train))

yhat_test <- h2o.predict(model, test_h2o)$predict

yhat_test <- as.factor(as.matrix(yhat_test))

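yt <- as.numeric(as.character(y_test)) # convert the factor to character first, then to numeric
yhat <- as.numeric(as.character(yhat_test))

Running the following code prints the number of correct predictions. The predicted values are continuous, so each one is rounded to the nearest integer before it is compared with the true label: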

s<-0
for(i in 1:10000)
{
if(yt[i]==round(yhat[i]))
s<-s+1
}
s

[1] 8964

8,964 of the 10,000 test images are predicted correctly, an accuracy of 89.64%, which is a reasonably good result.
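The counting loop can also be written without an explicit loop or a hard-coded length of 10000. The sketch below uses small made-up vectors (`yt_demo` and `yhat_demo` are illustrative stand-ins, not the MNIST results) to show the same rounding-and-comparing logic in vectorised form, plus a confusion table built with base R's `table`.

```r
# Toy stand-ins for the true labels and the numeric predictions
yt_demo   <- c(7, 2, 1, 0, 4)
yhat_demo <- c(6.8, 2.2, 1.4, 0.6, 4.1)

# Round each prediction to the nearest integer, as the loop does
pred_demo <- round(yhat_demo)

# Number and share of correct predictions
n_correct <- sum(pred_demo == yt_demo)
accuracy  <- mean(pred_demo == yt_demo)

# Cross-tabulate true labels against rounded predictions
confusion <- table(actual = yt_demo, predicted = pred_demo)

print(n_correct)
print(accuracy)
print(confusion)
```

Applied to the real vectors, `sum(round(yhat) == yt)` should give the same count as the loop, and the confusion table shows which digits are being mistaken for which.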
