Data Analytics for Beginners: Part 3

Source: Internet | Editor: 程序博客网 | Date: 2024/06/06 00:59

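R code: cleaning testData
The testData loaded in Part 1 is cleaned with the same steps that were applied to trainData in Part 2. That code is copied over, with trainData replaced by testData.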
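Copying the block for each data set works. An alternative is to write a step such as the title-based age imputation in the code that follows once, and call it on both trainData and testData. The sketch below assumes the title has already been stored in a column named `Title`, whereas the code in this section writes it back into `Name`. The function name `impute_age_by_title` and the toy data are hypothetical. It replaces the five per-title mean variables with base R's `ave()`, which computes a per-group summary and returns it aligned with the original rows.

```r
# Hypothetical helper: replace missing ages with the mean known age of
# passengers who share the same title.
# Assumes columns named `Title` (character) and `Age` (numeric).
impute_age_by_title <- function(df) {
  # For every row, the mean known age of that row's title group
  group_mean <- ave(df$Age, df$Title, FUN = function(a) mean(a, na.rm = TRUE))
  missing <- is.na(df$Age)
  df$Age[missing] <- round(group_mean[missing], digits = 2)
  df
}

# Toy data for illustration only
toy <- data.frame(
  Title = c("Mr", "Mr", "Miss", "Miss", "Miss"),
  Age   = c(30, NA, 20, 25, NA),
  stringsAsFactors = FALSE
)
filled <- impute_age_by_title(toy)
print(filled$Age)
```

In this example the missing Mr age becomes 30 and the missing Miss age becomes 22.5. If every age for a title is missing, the group mean is `NaN` and those rows stay unfilled. That plays the same role as the "Uncaught title" message in the loop below.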

```r
# Keep the passenger IDs for the submission file, then drop PassengerId (column 1)
# and Ticket, Fare, Cabin, Embarked (columns 8:11). Remaining columns:
# 1 Pclass, 2 Name, 3 Sex, 4 Age, 5 SibSp, 6 Parch
PassengerId = testData[1]
testData = testData[-c(1, 8:11)]

# Encode sex as 1 (female) / 0 (male); "^male" only matches values that start with "male"
testData$Sex = gsub("female", 1, testData$Sex)
testData$Sex = gsub("^male", 0, testData$Sex)

# Find the rows for each title. fixed = TRUE matches the text literally; as a regular
# expression "Mr." would also match "Mrs." because "." matches any character.
test_master_vector = grep("Master.", testData$Name, fixed = TRUE)
test_miss_vector = grep("Miss.", testData$Name, fixed = TRUE)
test_mrs_vector = grep("Mrs.", testData$Name, fixed = TRUE)
test_mr_vector = grep("Mr.", testData$Name, fixed = TRUE)
test_dr_vector = grep("Dr.", testData$Name, fixed = TRUE)

# Replace each full name (column 2) with its title
# (assumes Name was read as character, not as a factor)
for (i in test_master_vector) {
  testData[i, 2] = "Master"
}
for (i in test_miss_vector) {
  testData[i, 2] = "Miss"
}
for (i in test_mrs_vector) {
  testData[i, 2] = "Mrs"
}
for (i in test_mr_vector) {
  testData[i, 2] = "Mr"
}
for (i in test_dr_vector) {
  testData[i, 2] = "Dr"
}

# Mean known age per title, rounded to 2 decimal places
test_master_age = round(mean(testData$Age[testData$Name == "Master"], na.rm = TRUE), digits = 2)
test_miss_age = round(mean(testData$Age[testData$Name == "Miss"], na.rm = TRUE), digits = 2)
test_mrs_age = round(mean(testData$Age[testData$Name == "Mrs"], na.rm = TRUE), digits = 2)
test_mr_age = round(mean(testData$Age[testData$Name == "Mr"], na.rm = TRUE), digits = 2)
test_dr_age = round(mean(testData$Age[testData$Name == "Dr"], na.rm = TRUE), digits = 2)

# Fill each missing age (column 4) with the mean age for that passenger's title
for (i in 1:nrow(testData)) {
  if (is.na(testData[i, 4])) {
    if (testData[i, 2] == "Master") {
      testData[i, 4] = test_master_age
    } else if (testData[i, 2] == "Miss") {
      testData[i, 4] = test_miss_age
    } else if (testData[i, 2] == "Mrs") {
      testData[i, 4] = test_mrs_age
    } else if (testData[i, 2] == "Mr") {
      testData[i, 4] = test_mr_age
    } else if (testData[i, 2] == "Dr") {
      testData[i, 4] = test_dr_age
    } else {
      print(paste("Uncaught title at: ", i, sep = ""))
      print(paste("The title unrecognized was: ", testData[i, 2], sep = ""))
    }
  }
}

# We do a manual replacement here, because we weren't able to programmatically figure out the title.
# We figured out it was row 89 because the print statements above warned us.
testData[89, 4] = test_miss_age

# Child (column 7): 1 if aged 12 or under, otherwise 2
for (i in 1:nrow(testData)) {
  if (testData[i, 4] <= 12) {
    testData[i, 7] = 1
  } else {
    testData[i, 7] = 2
  }
}

# Family size (column 8): siblings/spouses + parents/children + the passenger
for (i in 1:nrow(testData)) {
  testData[i, 8] = testData[i, 5] + testData[i, 6] + 1
}

# Mother (column 9): 1 for a "Mrs" travelling with at least one parent/child, otherwise 2
for (i in 1:nrow(testData)) {
  if (testData[i, 2] == "Mrs" && testData[i, 6] > 0) {
    testData[i, 9] = 1
  } else {
    testData[i, 9] = 2
  }
}
```
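The three row-by-row loops that build the Child, Family size and Mother columns can also be written with vectorized operations, which is the more common style in R. The sketch below uses a small made-up data frame. It assumes named columns `Title`, `Age`, `SibSp` and `Parch` in place of the positional columns 2, 4, 5 and 6 used above. Child uses the same 1/2 coding as the Mother column (1 = yes, 2 = no).

```r
# Made-up rows with the columns the feature loops rely on
passengers <- data.frame(
  Title = c("Mr", "Mrs", "Mrs", "Master"),
  Age   = c(40, 35, 28, 5),
  SibSp = c(1, 1, 0, 3),
  Parch = c(0, 2, 0, 1),
  stringsAsFactors = FALSE
)

# Child: 1 if aged 12 or under, otherwise 2
passengers$Child <- ifelse(passengers$Age <= 12, 1, 2)
# Family size: siblings/spouses + parents/children + the passenger
passengers$Family <- passengers$SibSp + passengers$Parch + 1
# Mother: 1 for a "Mrs" travelling with at least one parent/child, otherwise 2
passengers$Mother <- ifelse(passengers$Title == "Mrs" & passengers$Parch > 0, 1, 2)

print(passengers)
```

Here the elementwise `&` is correct because it combines whole vectors. The `&&` form only fits a single `if` condition, as in the loop.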

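Official descriptions of some of the functions used:
grep, grepl, ...: search for matches to argument pattern within each element of a character vector.
gsub and sub: perform replacement of all matches and of the first match, respectively.
round: rounds the values in its first argument to the specified number of decimal places (the code uses digits = 2).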
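The example below applies these functions to a few made-up passenger names. It also shows a pitfall for the title patterns. Written as a regular expression, `"Mr."` matches `"Mrs."` too, because the dot matches any character. The `fixed = TRUE` argument of `grep`, `grepl`, `sub` and `gsub` matches the pattern as literal text instead.

```r
passenger_names <- c("Smith, Mr. John", "Smith, Mrs. Jane", "Brown, Miss. Ann")

# grep returns the indices of the matching elements
mr_regex <- grep("Mr.", passenger_names)                # "." matches any character, so "Mrs" matches too
mr_fixed <- grep("Mr.", passenger_names, fixed = TRUE)  # only the literal text "Mr."
# grepl returns a logical vector, one value per element
is_miss <- grepl("Miss.", passenger_names, fixed = TRUE)

# sub replaces the first match in each element; gsub replaces every match
first_a <- sub("a", "A", "banana")
all_a   <- gsub("a", "A", "banana")

# round rounds to the given number of decimal places
rounded <- round(22.4567, digits = 2)

print(mr_regex); print(mr_fixed); print(is_miss)
print(first_a); print(all_a); print(rounded)
```

Here `mr_regex` is `1 2` while `mr_fixed` is just `1`. `sub` gives `"bAnana"`, `gsub` gives `"bAnAnA"`, and the rounded value is `22.46`.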
