Getting and Cleaning Data - Week 1 Quiz

Source: Internet  Editor: 程序博客网  Date: 2024/06/08 17:57

Coursera open course quiz

Question 1

Problem

The American Community Survey distributes downloadable data about United States communities. Download the 2006 microdata survey about housing for the state of Idaho using download.file() from here:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv

and load the data into R. The code book, describing the variable names is here:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FPUMSDataDict06.pdf

How many properties are worth $1,000,000 or more?


solution

Download the housing data table from this URL:
https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv

fileUrl <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
# Save the CSV locally and record when it was downloaded
download.file(fileUrl, destfile = "data.csv", method = "curl")
dateDownloaded <- date()

A detailed description of this data set is in the code book PDF:
https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FPUMSDataDict06.pdf

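The question asks how many properties are worth $1,000,000 or more. According to the code book PDF, VAL = 24 is the category for property values of $1,000,000 or more, so the rows with VAL >= 24 are the ones to count.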

hd <- read.csv("data.csv")
# Count rows whose VAL code is 24 or higher, ignoring missing values
sum(hd$VAL >= 24, na.rm = TRUE)

Result

[1] 53
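
Missing values are the reason the count needs care here. The following is a minimal sketch on a made-up vector (the values are hypothetical, not taken from the survey). It shows that comparing against NA yields NA, and that two ways of ignoring those entries give the same count.

```r
# Hypothetical VAL codes; NA marks properties with no reported value
val <- c(24, 3, NA, 24, 17, NA, 24)
df <- data.frame(VAL = val)

# val >= 24 is NA wherever val is NA, so na.rm = TRUE drops those entries
n_direct <- sum(val >= 24, na.rm = TRUE)

# Indexing rows with a logical vector that contains NA returns NA rows;
# counting the non-NA results gives the same answer
n_indexed <- sum(!is.na(df[df$VAL >= 24, "VAL"]))

print(c(n_direct, n_indexed))
```

Selecting the column by name rather than by position keeps the code working if the column order of the file changes.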

Question 2

Problem

Use the data you loaded from Question 1. Consider the variable FES in the code book. Which of the “tidy data” principles does this variable violate?

solution

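Tidy data requires that each column hold exactly one variable. FES (family type and employment status) packs more than one variable into a single column, so it violates that principle.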
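
As an illustration of the principle, a combined code can be split into one column per variable with a lookup table. This is only a sketch: the lookup below is a simplified, hypothetical mapping, and the names `lookup`, `households`, `family_type` and `employment` are invented for the example. The real FES categories are listed in the code book.

```r
# Hypothetical, simplified lookup: one combined code -> two separate variables
lookup <- data.frame(
  FES = 1:4,
  family_type = c("married couple", "married couple",
                  "other family", "other family"),
  employment = c("in labor force", "not in labor force",
                 "in labor force", "not in labor force"),
  stringsAsFactors = FALSE
)

# Made-up households carrying only the combined code
households <- data.frame(id = c("a", "b", "c"), FES = c(2, 4, 1),
                         stringsAsFactors = FALSE)

# Joining on the code gives one column per variable
tidy <- merge(households, lookup, by = "FES")
print(tidy[, c("id", "family_type", "employment")])
```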

Question 3

Problem

Download the Excel spreadsheet on Natural Gas Acquisition Program here:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FDATA.gov_NGAP.xlsx

Read rows 18-23 and columns 7-15 into R and assign the result to a variable called:

dat

What is the value of:

sum(dat$Zip*dat$Ext,na.rm=T)

(original data source: http://catalog.data.gov/dataset/natural-gas-acquisition-program)

solution

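Download the spreadsheet:

fileUrl <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FDATA.gov_NGAP.xlsx"
# mode = "wb" writes the file in binary mode, which .xlsx files need on Windows
download.file(fileUrl, destfile = "gas.xlsx", mode = "wb")
dateDownloaded <- date()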

Read rows 18-23 and columns 7-15, and assign the result to dat:

library(xlsx)
rowIndex <- 18:23
colIndex <- 7:15
# Row 18 holds the column names, so header = TRUE
dat <- read.xlsx("gas.xlsx", sheetIndex = 1, rowIndex = rowIndex, colIndex = colIndex, header = TRUE)

Evaluate the expression given in the problem:

sum(dat$Zip * dat$Ext, na.rm=T)

Result

36534720

Question 4

Problem

Read the XML data on Baltimore restaurants from here:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml

How many restaurants have zipcode 21231?

solution

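Code

library(XML)
# http rather than https here, since xmlTreeParse reads the URL directly
fileUrl <- "http://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml"
doc <- xmlTreeParse(fileUrl, useInternalNodes = TRUE)
rootNode <- xmlRoot(doc)
# Extract every zipcode node's text and count the matches
sum(xpathSApply(rootNode, "//zipcode", xmlValue) == "21231")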

Result

127
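
The same XPath counting can be tried offline on a small inline document. This is a sketch only: the XML string below is invented for illustration and is not the structure of the real restaurants file. It assumes the XML package is installed.

```r
library(XML)

# Made-up XML with three restaurants, two of them in zipcode 21231
xml_text <- paste0(
  "<response>",
  "<row><name>A</name><zipcode>21231</zipcode></row>",
  "<row><name>B</name><zipcode>21201</zipcode></row>",
  "<row><name>C</name><zipcode>21231</zipcode></row>",
  "</response>"
)

# asText = TRUE parses the string itself instead of treating it as a file name
doc <- xmlTreeParse(xml_text, asText = TRUE, useInternalNodes = TRUE)

# "//zipcode" matches zipcode nodes at any depth; xmlValue returns their text
zips <- xpathSApply(xmlRoot(doc), "//zipcode", xmlValue)
n_match <- sum(zips == "21231")
print(n_match)
```

The values come back as character strings, which is why the comparison uses "21231" in quotes.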

Question 5

Problem

The American Community Survey distributes downloadable data about United States communities. Download the 2006 microdata survey about housing for the state of Idaho using download.file() from here:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06pid.csv

using the fread() command load the data into an R object
