Data Analytics for Beginners: Part 3
Source: Internet  Editor: 程序博客网  Time: 2024/06/06 00:59
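Cleaning testData with R code
Take the trainData cleaning code from Part 2 and apply it to the testData from Part 1: copy the code and replace trainData with testData.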
PassengerId = testData[1]
testData = testData[-c(1, 8:11)]

# Encode Sex: female -> 1, male -> 0 ("^male" keeps "female" from matching).
testData$Sex = gsub("female", 1, testData$Sex)
testData$Sex = gsub("^male", 0, testData$Sex)

# Find the rows for each title. fixed = TRUE matches the pattern literally;
# as a regular expression the "." matches any character, so "Mr." would also
# match "Mrs." and the Mr loop below would overwrite every Mrs row.
test_master_vector = grep("Master.", testData$Name, fixed = TRUE)
test_miss_vector = grep("Miss.", testData$Name, fixed = TRUE)
test_mrs_vector = grep("Mrs.", testData$Name, fixed = TRUE)
test_mr_vector = grep("Mr.", testData$Name, fixed = TRUE)
test_dr_vector = grep("Dr.", testData$Name, fixed = TRUE)

# Replace the full name (column 2) with the bare title.
for(i in test_master_vector) {
  testData[i, 2] = "Master"
}
for(i in test_miss_vector) {
  testData[i, 2] = "Miss"
}
for(i in test_mrs_vector) {
  testData[i, 2] = "Mrs"
}
for(i in test_mr_vector) {
  testData[i, 2] = "Mr"
}
for(i in test_dr_vector) {
  testData[i, 2] = "Dr"
}

# Mean age per title, ignoring missing ages.
test_master_age = round(mean(testData$Age[testData$Name == "Master"], na.rm = TRUE), digits = 2)
test_miss_age = round(mean(testData$Age[testData$Name == "Miss"], na.rm = TRUE), digits = 2)
test_mrs_age = round(mean(testData$Age[testData$Name == "Mrs"], na.rm = TRUE), digits = 2)
test_mr_age = round(mean(testData$Age[testData$Name == "Mr"], na.rm = TRUE), digits = 2)
test_dr_age = round(mean(testData$Age[testData$Name == "Dr"], na.rm = TRUE), digits = 2)

# Fill each missing age (column 4) with the mean age for that passenger's title.
for (i in 1:nrow(testData)) {
  if (is.na(testData[i, 4])) {
    if (testData[i, 2] == "Master") {
      testData[i, 4] = test_master_age
    } else if (testData[i, 2] == "Miss") {
      testData[i, 4] = test_miss_age
    } else if (testData[i, 2] == "Mrs") {
      testData[i, 4] = test_mrs_age
    } else if (testData[i, 2] == "Mr") {
      testData[i, 4] = test_mr_age
    } else if (testData[i, 2] == "Dr") {
      testData[i, 4] = test_dr_age
    } else {
      print(paste("Uncaught title at: ", i, sep=""))
      print(paste("The title unrecognized was: ", testData[i, 2], sep=""))
    }
  }
}

#We do a manual replacement here, because we weren't able to programmatically figure out the title.
#We figured out it was 89 because the above print statement should have warned us.
testData[89, 4] = test_miss_age

# Column 7: child flag, 1 if aged 12 or under, otherwise 2.
# (The original set 1 in both branches, which made the flag constant.)
for (i in 1:nrow(testData)) {
  if (testData[i, 4] <= 12) {
    testData[i, 7] = 1
  } else {
    testData[i, 7] = 2
  }
}

# Column 8: family size, the sum of columns 5 and 6 plus the passenger.
for(i in 1:nrow(testData)) {
  testData[i, 8] = testData[i, 5] + testData[i, 6] + 1
}

# Column 9: mother flag, 1 for a "Mrs" with column 6 greater than 0, otherwise 2.
for(i in 1:nrow(testData)) {
  if(testData[i, 2] == "Mrs" & testData[i, 6] > 0) {
    testData[i, 9] = 1
  } else {
    testData[i, 9] = 2
  }
}
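The three row-by-row loops that fill columns 7, 8 and 9 can also be written as vectorized operations. The sketch below is only an illustration on a made-up data frame named toy, not the tutorial's own code. It assumes that columns 5 and 6 are SibSp and Parch (as in the Kaggle Titanic data), that non-children are coded 2 in the same way as the mother flag, and that the new columns are called Child, Family and Mother; all of those names are hypothetical.

```r
# Toy stand-in for the cleaned testData (made-up values). Assumed column order:
# Pclass, Name (already reduced to a title), Sex, Age, SibSp, Parch.
toy <- data.frame(
  Pclass = c(3, 1, 2, 3),
  Name   = c("Master", "Mrs", "Mr", "Mrs"),
  Sex    = c("0", "1", "0", "1"),
  Age    = c(8, 35, 40, 28),
  SibSp  = c(1, 1, 0, 0),
  Parch  = c(2, 1, 0, 0),
  stringsAsFactors = FALSE
)

# Child flag: 1 if aged 12 or under, otherwise 2 (assumed coding).
toy$Child <- ifelse(toy$Age <= 12, 1, 2)

# Family size: siblings/spouses + parents/children + the passenger.
toy$Family <- toy$SibSp + toy$Parch + 1

# Mother flag: 1 for a "Mrs" travelling with at least one parent/child, otherwise 2.
toy$Mother <- ifelse(toy$Name == "Mrs" & toy$Parch > 0, 1, 2)

print(toy)
```

ifelse evaluates the condition for every row at once, so no explicit for loop or row index is needed, and the new columns get readable names instead of the default V7, V8 and V9.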
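Official descriptions of a few of the functions used:
grep, grepl, ...: search for matches to argument pattern within each element of a character vector.
gsub and sub: perform replacement of all matches and of the first match, respectively.
round: rounds a value to the specified number of decimal places (the digits argument, 0 by default).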
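A short sketch of how these functions behave may help. The inputs are made up for illustration (the passenger_names vector is hypothetical and not taken from the data set); the point is the difference between regular-expression and literal matching, and between sub and gsub.

```r
# Made-up names in the same "Surname, Title. Given" shape as the Titanic data.
passenger_names <- c("Braund, Mr. Owen", "Cumings, Mrs. John", "Drew, Master. Marshall")

# As a regular expression "." matches any character, so "Mr." also matches "Mrs".
mr_regex <- grep("Mr.", passenger_names)                 # 1 2
# fixed = TRUE treats the pattern as a literal string.
mr_fixed <- grep("Mr.", passenger_names, fixed = TRUE)   # 1

# grepl returns one logical per element instead of the matching indices.
dr_regex <- grepl("Dr.", passenger_names)                # FALSE FALSE TRUE (matches "Dre" in "Drew")
dr_fixed <- grepl("Dr.", passenger_names, fixed = TRUE)  # FALSE FALSE FALSE

# "^male" anchors the match at the start, so "female" is left untouched.
sex <- c("male", "female")
anchored   <- gsub("^male", 0, sex)   # "0" "female"
unanchored <- gsub("male", 0, sex)    # "0" "fe0"

# sub replaces only the first match, gsub replaces every match.
first_only <- sub("a", "A", "banana")   # "bAnana"
all_of_it  <- gsub("a", "A", "banana")  # "bAnAnA"

# round keeps the requested number of decimal places (0 by default).
two_places <- round(29.69912, digits = 2)  # 29.7
no_places  <- round(29.69912)              # 30
```

The replacement value passed to gsub is converted to a character string, so the Sex column holds "0" and "1" as text after the substitution, not numbers.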