Kaggle Bike Sharing program: R code translated from the Python code ranked 9th
Source: Internet   Posted in: Internet buzzwords   Editor: 程序博客网   Time: 2024/05/22 14:18
Use caret to do the feature selection for the random forest (rf) model.
# Read the Kaggle files (the author's local paths)
train_row <- read.csv('E:/kuaipan/Kaggle Project/Bike Sharing Demand/train.csv', stringsAsFactors = FALSE)
test_row <- read.csv('e:/kuaipan/Kaggle Project/Bike Sharing Demand/test.csv', stringsAsFactors = FALSE)
train_row$type = 'train'
test_row$type = 'test'
colnames(train_row)
# test.csv has no target columns; add them as NA so both sets can be stacked
test_row$casual = NA
test_row$registered = NA
test_row$count = NA
row_df = rbind(train_row, test_row)
row_df_backup = row_df

# log(x + 1) versions of the three targets: casual_log, registered_log, count_log
for(col in c('casual', 'registered', 'count')){
  newcol = paste(col, '_log', sep = "")
  row_df[[newcol]] = log(row_df[[col]] + 1)
}

if(!require('lubridate')){
  install.packages('lubridate')
  library(lubridate)
}
# date and time parts
row_df$dt = ymd_hms(row_df$datetime)
row_df$day = day(row_df$dt)
row_df$month = month(row_df$dt)
row_df$year = year(row_df$dt)
row_df$hour = hour(row_df$dt)
row_df$weekday = wday(row_df$dt)
row_df$week = week(row_df$dt)

# total training-set rentals per season (test rows are left NA)
for(s in c(1, 2, 3, 4)){
  rows = which(row_df$season == s & row_df$type == 'train')
  row_df[rows, 'season_count'] = sum(row_df[rows, 'count'])
}
#head(row_df)

# Override workingday/holiday for specific dates. An R function modifies a local
# copy, so the data frame is passed in, returned, and reassigned by the caller.
SetWorkingDay = function(df, year, month, day, value){
  rows = which(df$year %in% year & df$month %in% month & df$day %in% day)
  df[rows, 'workingday'] = value
  df[rows, 'holiday'] = if(value == 1) 0 else 1
  df
}
row_df = SetWorkingDay(row_df, 2011, 4, 15, 1)
row_df = SetWorkingDay(row_df, 2012, 4, 16, 1)
row_df = SetWorkingDay(row_df, 2011, 11, 25, 0)   # also sets holiday = 1
row_df = SetWorkingDay(row_df, 2012, 11, 23, 0)   # also sets holiday = 1
# further dates flagged as holidays
row_df[which(row_df$year == 2012 & row_df$month == 5 & row_df$day == 21), 'holiday'] = 1
row_df[which(row_df$year == 2012 & row_df$month == 6 & row_df$day == 1), 'holiday'] = 1
row_df[which(row_df$year == 2012 & row_df$month == 10 & row_df$day == 30), 'holiday'] = 1
row_df[which(row_df$month == 12 & row_df$day %in% c(24, 26, 31)), 'holiday'] = 1
row_df[which(row_df$month == 12 & row_df$day %in% c(24, 31)), 'workingday'] = 1

# peak: hours 8, 12, 17, 18 on working days; 10 through 19 on other days
# (R has no chained comparison, so the range needs an explicit &)
row_df$peak = 0
row_df[which(row_df$workingday == 1 & row_df$hour %in% c(8, 12, 17, 18)), 'peak'] = 1
row_df[which(row_df$workingday == 0 & row_df$hour >= 10 & row_df$hour <= 19), 'peak'] = 1
# ideal: warm and not too windy
row_df$ideal = 0
row_df[which(row_df$temp > 27 & row_df$windspeed < 30), 'ideal'] = 1
# sticky: humid working day
row_df$sticky = 0
row_df[which(row_df$humidity >= 60 & row_df$workingday == 1), 'sticky'] = 1

row_df.train = row_df[which(row_df$type == 'train'), ]
row_df.test = row_df[which(row_df$type == 'test'), ]

library(Metrics)
# RMSLE on the raw counts; Metrics::rmsle expects (actual, predicted)
get_rmsle = function(pred, actual){
  rmsle(actual, pred)
}

library(caret)
# recursive feature elimination with random-forest helpers and ten-fold cross-validation
control = rfeControl(functions = rfFuncs, method = "cv", number = 10)
# drop the timestamp columns, the raw and log targets, and the train/test flag
drop_cols = c('datetime', 'dt', 'type', 'casual', 'registered', 'count',
              'casual_log', 'registered_log', 'count_log')
row_df.features = row_df.train[, !(names(row_df.train) %in% drop_cols)]
# Features and outcome must come from the same rows; the outcome here is casual_log.
# By default rfe also evaluates the subset containing all features.
result = rfe(row_df.features, row_df.train$casual_log, sizes = c(16:17), rfeControl = control)
plot(result, type = c('p', 'l'))
# the final rf model is result$fit
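The Bike Sharing Demand competition is scored by root mean squared logarithmic error (RMSLE), which is what `Metrics::rmsle(actual, predicted)` returns. As a sketch for checking a `get_rmsle`-style helper without extra packages, the base-R version below spells out the formula sqrt(mean((log(p + 1) - log(a + 1))^2)). The name `rmsle_manual` and the sample numbers are hypothetical.

```r
# rmsle_manual is a hypothetical helper; it mirrors Metrics::rmsle(actual, predicted).
rmsle_manual <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  # log1p(x) = log(1 + x), the same transform used for the *_log columns
  sqrt(mean((log1p(predicted) - log1p(actual))^2))
}

actual <- c(10, 20, 30)
rmsle_manual(actual, actual)            # 0 for a perfect prediction

# RMSLE on raw counts equals plain RMSE on the log(count + 1) scale,
# so predictions made on that scale are back-transformed with expm1().
pred_log <- log1p(c(12, 18, 33))        # made-up model output on the log scale
rmsle_manual(actual, expm1(pred_log))
```

This equivalence is why training on `count_log` (or `casual_log`) with a squared-error model lines up with the competition metric.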
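A Python-style range test such as `10 <= hour <= 19` does not carry over to R: R's comparison operators do not chain, and the expression is a syntax error. A minimal sketch of the peak-hour rule with an explicit `&` and a vectorized `ifelse()`, using a hypothetical `make_peak` helper and made-up hours:

```r
# make_peak is a hypothetical helper reproducing the script's peak-hour rule:
# working days -> hours 8, 12, 17, 18; other days -> hours 10 through 19.
make_peak <- function(hour, workingday) {
  workday_peak <- workingday == 1 & hour %in% c(8, 12, 17, 18)
  # explicit range test instead of a chained comparison
  offday_peak  <- workingday == 0 & hour >= 10 & hour <= 19
  ifelse(workday_peak | offday_peak, 1, 0)
}

hours   <- c(7, 8, 12, 15, 10, 19, 20)   # made-up sample hours
working <- c(1, 1, 1, 1, 0, 0, 0)
make_peak(hours, working)                # 0 1 1 0 1 1 0
```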
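R data frames follow copy-on-modify semantics, so a helper like `SetWorkingDay` that assigns into a global data frame only changes a local copy; the caller sees the change only if the function returns the frame and the result is reassigned. The sketch below shows both behaviours on a made-up toy frame; `set_in_place` and `set_working_day` are hypothetical names.

```r
# Toy stand-in for row_df; all values are made up.
toy <- data.frame(year = c(2011, 2011), month = c(4, 4), day = c(15, 16),
                  workingday = c(0, 0), holiday = c(1, 1))

# Assigns into a free variable: this creates a local copy of `toy`,
# so the caller's data frame is left unchanged.
set_in_place <- function(year, month, day, value) {
  toy[toy$year %in% year & toy$month %in% month & toy$day %in% day, "workingday"] <- value
  invisible(NULL)
}

# Hypothetical fixed helper: take the frame as an argument and return it.
set_working_day <- function(d, year, month, day, value) {
  rows <- d$year %in% year & d$month %in% month & d$day %in% day
  d[rows, "workingday"] <- value
  d[rows, "holiday"] <- 1 - value   # holiday is the complement of the working-day flag
  d
}

set_in_place(2011, 4, 15, 1)
toy$workingday                       # still 0 0
toy <- set_working_day(toy, 2011, 4, 15, 1)
toy$workingday                       # 1 0
toy$holiday                          # 0 1
```

Returning the frame keeps the helper free of side effects; assigning with `<<-` would also work but hides the dependency on a global variable.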