R Quick Start, Part 1

Source: Internet. Time: 2024/06/06 09:55
Address: http://www.statmethods.net/index.html
Introduction:
For R users who already have some experience with statistical software. It provides a roadmap and the code needed to get started.


Sitemap (index)
I. Data Input
II. Basic Statistics
III. Basic Graphs
IV. R Interface
V. Data Management
VI. Advanced Statistics
VII. Advanced Graphs


I. Data Input
(Similar to PROC IML in SAS)
(1) Data types:
scalars, vectors (numerical, character, logical), matrices, arrays, data frames, and lists
1. Vectors: c(2,4)
2. Matrices: matrix(cells, nrow=2, ncol=2, byrow=TRUE, dimnames=list(rnames, cnames))
3. Arrays: similar to matrices, but can have more than two dimensions.
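The matrix() call above refers to cells, rnames and cnames without defining them. A minimal sketch, assuming small made-up values for those three objects, shows how a matrix and an array can be built and indexed:

```r
# Hypothetical values for cells, rnames and cnames (not from the original)
cells  <- c(1, 26, 24, 68)
rnames <- c("R1", "R2")
cnames <- c("C1", "C2")

# byrow = TRUE fills the matrix row by row
mymatrix <- matrix(cells, nrow = 2, ncol = 2, byrow = TRUE,
                   dimnames = list(rnames, cnames))
mymatrix["R1", "C2"]   # element in row R1, column C2
mymatrix[2, ]          # the whole second row

# An array works the same way, with a third dimension
myarray <- array(1:24, dim = c(2, 3, 4))
dim(myarray)
myarray[1, 2, 3]       # row 1, column 2, layer 3
```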
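4. Data frames: similar to matrices, but different columns can hold different types (numeric, character, logical, factor). All columns must have the same length. This is the same layout as SAS and SPSS datasets.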
d <- c(1,2,3,4)
e <- c("red", "white", "red", NA)
f <- c(TRUE,TRUE,TRUE,FALSE)
mydata <- data.frame(d,e,f)
names(mydata) <- c("ID","Color","Passed") # variable names
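Once the data frame exists, columns and rows can be selected by name or by a logical condition. The following sketch rebuilds the same mydata and shows a few common ways to access it:

```r
d <- c(1, 2, 3, 4)
e <- c("red", "white", "red", NA)
f <- c(TRUE, TRUE, TRUE, FALSE)
mydata <- data.frame(d, e, f)
names(mydata) <- c("ID", "Color", "Passed")

sapply(mydata, class)        # each column keeps its own type
mydata$Color                 # a single column, by name
mydata[c("ID", "Passed")]    # several columns, by name
mydata[mydata$Passed, ]      # only the rows where Passed is TRUE
```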
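5. Lists: an ordered collection of objects (components), which need not be related to one another.
w <- list(name="Fred", mynumbers=a, mymatrix=y, age=5.3) # a and y are existing objects (a vector and a matrix)
v <- c(list1,list2) # combine two existing lists
Access components with [[ ]], by position or by name: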
mylist[[2]] # 2nd component of the list
mylist[["mynumbers"]] # component named mynumbers in list
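The list snippets above rely on objects (a, y, mylist) that are never created. A self-contained sketch, assuming made-up values for a and y, might look like this:

```r
a <- c(1, 2, 3)               # hypothetical numeric vector
y <- matrix(1:4, nrow = 2)    # hypothetical matrix
mylist <- list(name = "Fred", mynumbers = a, mymatrix = y, age = 5.3)

mylist[[2]]             # second component, by position
mylist[["mynumbers"]]   # the same component, by name
mylist$age              # $ also works for named components
mylist[2]               # single brackets return a list of length 1

# c() on two lists gives one longer list
combined <- c(mylist, list(city = "unknown"))
length(combined)
```

Note the difference between [[ ]] and [ ]: the first returns the component itself, the second returns a sub-list.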
6. Factors: mark a variable as categorical (nominal).
gender <- factor(gender)
rating <- ordered(rating) # ordinal variable
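As a rough illustration of the difference between factor() and ordered(), the sketch below uses invented gender and rating vectors. The explicit levels argument is an addition here; without it, levels are sorted alphabetically.

```r
gender <- c("male", "female", "female", "male")    # hypothetical data
gender <- factor(gender)
levels(gender)      # levels are sorted alphabetically by default
table(gender)       # counts per level

rating <- c("large", "small", "medium", "small")   # hypothetical data
rating <- ordered(rating, levels = c("small", "medium", "large"))
rating < "large"    # ordered factors support comparisons
```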
7. Useful functions for this section:
length(object) # number of elements or components
str(object)    # structure of an object 
class(object)  # class or type of an object
names(object)  # names
c(object,object,...)       # combine objects into a vector
cbind(object, object, ...) # combine objects as columns
rbind(object, object, ...) # combine objects as rows 
object     # prints the object
ls()       # list current objects
rm(object) # delete an object
newobject <- edit(object) # edit copy and save as newobject 
fix(object)               # edit in place


(2) Importing data:
1. Text (comma-delimited):
mydata <- read.table("c:/mydata.csv", header=TRUE, 
  sep=",", row.names="id")
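To try the read.table() call without an existing file, one option is to write a small CSV to a temporary location first. The file contents and column names below are invented for illustration:

```r
# Write a small comma-delimited file to a temporary path (hypothetical data)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,score,group",
             "a1,3.5,x",
             "a2,4.0,y"), csv_path)

# header = TRUE: first line holds variable names
# row.names = "id": use the id column as row names
mydata <- read.table(csv_path, header = TRUE, sep = ",", row.names = "id")
rownames(mydata)        # values taken from the id column
names(mydata)           # id is no longer a regular column
mydata["a2", "score"]   # look up a value by row name
```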
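2. Excel:
library(xlsx)
mydata <- read.xlsx("c:/myexcel.xlsx", 1) # read the first worksheet; the first row contains variable names
mydata <- read.xlsx("c:/myexcel.xlsx", sheetName = "mysheet") # read a worksheet by name
3. SPSS:
# in SPSS: save the dataset in portable format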
get file='c:\mydata.sav'.
export outfile='c:\mydata.por'. 
# in R 
library(Hmisc)
mydata <- spss.get("c:/mydata.por", use.value.labels=TRUE)
# last option converts value labels to R factors
4. SAS:
# in SAS: save the dataset in transport format
libname out xport 'c:/mydata.xpt';
data out.mydata;
set sasuser.mydata;
run;
# in R 
library(Hmisc)
mydata <- sasxport.get("c:/mydata.xpt")
# character variables are converted to R factors
5. Stata:
# input Stata file
library(foreign)
mydata <- read.dta("c:/mydata.dta")
6. Systat:
# input Systat file
library(foreign)
mydata <- read.systat("c:/mydata.syd") # Systat data file (.sys or .syd)
7. ODBC:
# RODBC Example
# import 2 tables (Crime and Punishment) from a DBMS
# into R data frames (and call them crimedat and pundat)
library(RODBC)
myconn <- odbcConnect("mydsn", uid="Rob", pwd="aardvark")
crimedat <- sqlFetch(myconn, "Crime")
pundat <- sqlQuery(myconn, "select * from Punishment")
close(myconn)


(3) Viewing data:
ls()
names(mydata)
str(mydata) # structure
levels(mydata$v1)
dim(object)
class(object)
mydata # print mydata 
head(mydata, n=10) # first 10 rows
tail(mydata, n=5) # last 5 rows
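These functions can be tried on any data frame. The sketch below uses the built-in iris dataset as a stand-in for mydata, which is an assumption made only for illustration:

```r
mydata <- iris            # built-in dataset, used here as a stand-in
dim(mydata)               # number of rows and columns
names(mydata)             # variable names
str(mydata)               # type and first values of each column
levels(mydata$Species)    # levels of a factor column
head(mydata, n = 3)       # first 3 rows
tail(mydata, n = 2)       # last 2 rows
```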


(4) Variable labels:
library(Hmisc)
label(mydata$myvar) <- "Variable label for variable myvar" 
describe(mydata)
names(mydata)[3] <- "This is the label for variable 3"
mydata[3] # list the variable
(5) Value labels:
mydata$v1 <- factor(mydata$v1,
  levels = c(1,2,3),
  labels = c("red", "blue", "green"))
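A small worked example of value labels, assuming a made-up column v1 coded 1 to 3, shows how the numeric codes are replaced by their labels:

```r
mydata <- data.frame(v1 = c(1, 3, 2, 1))   # hypothetical coded data
mydata$v1 <- factor(mydata$v1,
                    levels = c(1, 2, 3),
                    labels = c("red", "blue", "green"))
as.character(mydata$v1)   # codes are now shown as labels
table(mydata$v1)          # counts per label
```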
(6) Missing data:
1. Testing for missing values:
is.na(x) # returns TRUE if x is missing
y <- c(1,2,3,NA)
is.na(y) # returns a vector (F F F T)
2. Recoding values to missing:
mydata$v1[mydata$v1==99] <- NA # NA must be written in uppercase
3. Excluding missing values from calculations:
x <- c(1,2,NA,3)
mean(x) # returns NA
mean(x, na.rm=TRUE) # returns 2
4. Listing rows (cases) that have missing values:
mydata[!complete.cases(mydata),]
5. Creating a dataset of complete cases (listwise deletion: any row with a missing value is dropped):
newdata <- na.omit(mydata)
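The steps above can be combined into one short run. The data frame here is invented for illustration, with 99 standing in as the missing-value code:

```r
mydata <- data.frame(v1 = c(1, 99, 3, 4),
                     v2 = c("a", "b", NA, "d"))   # hypothetical data
mydata$v1[mydata$v1 == 99] <- NA     # recode 99 as missing

colSums(is.na(mydata))               # number of missing values per column
mean(mydata$v1)                      # NA, because v1 contains a missing value
mean(mydata$v1, na.rm = TRUE)        # mean of the remaining values
mydata[!complete.cases(mydata), ]    # rows with at least one missing value
newdata <- na.omit(mydata)           # keep only the complete rows
nrow(newdata)
```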
(7) Date values:
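mydates <- as.Date(c("2007-06-22", "2004-02-13")) # as.Date() converts character strings to dates
days <- mydates[1] - mydates[2] # number of days between the two dates
today <- Sys.Date() # current system date
format(today, format="%B %d %Y") # %B = full month name (%b = abbreviated), %d = day of month, %Y = 4-digit year
# e.g. "June 20 2007"; use %m for the numeric month
strDates <- as.character(mydates) # as.character() converts dates back to character strings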
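A short sketch of the date functions with fixed inputs, so the results do not depend on the current date. The "%d/%m/%Y" input string is a made-up example of a non-default format:

```r
mydates <- as.Date(c("2007-06-22", "2004-02-13"))
days <- mydates[1] - mydates[2]
as.numeric(days)                   # interval as a plain number of days

format(mydates[1], "%Y/%m/%d")     # numeric codes do not depend on the locale
# Dates not in yyyy-mm-dd form need an explicit format when parsed
parsed <- as.Date("22/06/2007", format = "%d/%m/%Y")
parsed == mydates[1]

strDates <- as.character(mydates)  # back to character strings
```

Month names produced by %B and %b follow the system locale, so their output can differ between machines.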