tidyr: gather, separate, spread…
Source: Internet. Editor: 程序博客网. Date: 2024/06/05 03:41
When using tidyr, you may also need the dplyr package.
Usage of gather:
gather(data, key, value, ..., na.rm = FALSE, convert = FALSE,
  factor_key = FALSE)
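data is the data to process; key is the name of the new column that will hold the former column names; value is the name of the new column that will hold their values. ... selects the columns to gather. In the example below, the new variable stock is created and time must not be folded into the values, so the selection is written as X:Z or -time.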
> stocks <- data.frame(
+   time = as.Date("2009-01-01") + 0:9,
+   X = rnorm(10, 0, 1),
+   Y = rnorm(10, 0, 2),
+   Z = rnorm(10, 0, 4)
+ )
> stocks
         time           X          Y          Z
1  2009-01-01 -1.06024371 -1.3799111  1.2076349
2  2009-01-02 -0.49010731 -2.4870899  3.7888550
3  2009-01-03  1.15709245 -1.5708653  2.2164464
4  2009-01-04 -0.52049101  1.9813593  2.0094476
5  2009-01-05  0.60130291 -2.6300865 -3.9681798
6  2009-01-06 -0.09306031 -0.8420800  0.6704426
7  2009-01-07 -0.83323255 -1.4147638  2.3398504
8  2009-01-08 -0.15517432  1.8773903  1.6437630
9  2009-01-09  0.36519166  0.3696238  1.5746588
10 2009-01-10  0.09442596 -0.5840472 -1.0298875
> stocks %>% gather(stock, price, -time)
         time stock       price
1  2009-01-01     X -1.06024371
2  2009-01-02     X -0.49010731
3  2009-01-03     X  1.15709245
4  2009-01-04     X -0.52049101
5  2009-01-05     X  0.60130291
6  2009-01-06     X -0.09306031
7  2009-01-07     X -0.83323255
8  2009-01-08     X -0.15517432
9  2009-01-09     X  0.36519166
10 2009-01-10     X  0.09442596
11 2009-01-01     Y -1.37991110
12 2009-01-02     Y -2.48708993
13 2009-01-03     Y -1.57086533
14 2009-01-04     Y  1.98135935
15 2009-01-05     Y -2.63008648
16 2009-01-06     Y -0.84208003
17 2009-01-07     Y -1.41476385
18 2009-01-08     Y  1.87739029
19 2009-01-09     Y  0.36962382
20 2009-01-10     Y -0.58404720
21 2009-01-01     Z  1.20763492
22 2009-01-02     Z  3.78885500
23 2009-01-03     Z  2.21644638
24 2009-01-04     Z  2.00944760
25 2009-01-05     Z -3.96817978
26 2009-01-06     Z  0.67044262
27 2009-01-07     Z  2.33985039
28 2009-01-08     Z  1.64376300
29 2009-01-09     Z  1.57465875
30 2009-01-10     Z -1.02988745
gather_ is used the same way, but it takes column names as character strings, which makes it better suited to programming.
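The two column selections mentioned for gather, X:Z and -time, can be compared directly. The following is a minimal sketch, assuming a small hand-written data frame (prices, with made-up values) in place of the random stocks data so that the result is reproducible.

```r
library(tidyr)

# Hypothetical stand-in for the stocks data; the values are made up.
prices <- data.frame(
  time = as.Date("2009-01-01") + 0:1,
  X = c(1, 2),
  Y = c(3, 4),
  Z = c(5, 6)
)

# Select the columns to gather as a range ...
long_by_range <- gather(prices, stock, price, X:Z)

# ... or by excluding the column that should stay as it is.
long_by_exclude <- gather(prices, stock, price, -time)

print(long_by_range)
```

Both calls should give the same long table with one row per time and stock combination. Excluding time tends to be the more convenient form when the set of columns to gather may grow.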
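Usage of separate:
separate(data, col, into, sep = "[^[:alnum:]]+", remove = TRUE,
  convert = FALSE, extra = "warn", fill = "warn", ...)
data is the data to process; col is the name of the column to split; into is a character vector of names for the new columns; sep is the regular expression to split on; extra controls what happens when a string yields too many pieces; fill controls what happens when it yields too few.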
> df <- data.frame(x = c(NA, "a.b", "a.d", "b.c"))
> df %>% separate(x, c("A", "B"))
     A    B
1 <NA> <NA>
2    a    b
3    a    d
4    b    c
> df <- data.frame(x = c("a", "a b", "a b c", NA))
> df
      x
1     a
2   a b
3 a b c
4  <NA>
> df %>% separate(x, c("a", "b"), extra = "merge", fill = "left")
     a    b
1 <NA>    a
2    a    b
3    a  b c
4 <NA> <NA>
> df <- data.frame(x = c("x: 123", "y: error: 7"))
> df
            x
1      x: 123
2 y: error: 7
> df %>% separate(x, c("key", "value"), sep = ":", extra = "merge")
  key     value
1   x       123
2   y  error: 7
separate_ is used the same way, but is better suited to programming.
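The extra and fill arguments also accept values other than "merge" and "left". The following is a minimal sketch, assuming a small data frame of space-separated strings; the column names first and second are made up for illustration.

```r
library(tidyr)

df <- data.frame(x = c("a", "a b", "a b c"), stringsAsFactors = FALSE)

# extra = "drop" discards surplus pieces without a warning;
# fill = "right" pads rows that are too short with NA on the right.
res <- separate(df, x, c("first", "second"),
                sep = " ", extra = "drop", fill = "right")

print(res)
```

With these settings the third piece of "a b c" is dropped, and the row containing only "a" gets NA in the second column.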
Usage of separate_rows:
separate_rows(data, ..., sep = "[^[:alnum:].]+", convert = FALSE)
> df <- data.frame(
+   x = 1:3,
+   y = c("a", "d,e,f", "g,h"),
+   z = c("1", "2,3,4", "5,6"),
+   stringsAsFactors = FALSE
+ )
> separate_rows(df, y, z, convert = TRUE)
  x y z
1 1 a 1
2 2 d 2
3 2 e 3
4 2 f 4
5 3 g 5
6 3 h 6
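With the tidyr version used here, separate_rows(df, c("y", "z"), convert = TRUE) does not work; to pass the column names as a character vector you must use separate_rows_.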
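The sep argument of separate_rows can be set explicitly when the default pattern is not wanted. The following is a minimal sketch, assuming a made-up data frame whose tags column uses ";" as the delimiter.

```r
library(tidyr)

# Hypothetical data: one row per id, several tags packed into one string.
df <- data.frame(
  id = 1:2,
  tags = c("r;tidyr", "dplyr"),
  stringsAsFactors = FALSE
)

# Split only on ";" and give each tag its own row.
res <- separate_rows(df, tags, sep = ";")

print(res)
```

Each id is repeated once per tag, so the first row of the input becomes two rows in the result.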
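Usage of spread:
spread(data, key, value, fill = NA, convert = FALSE, drop = TRUE,
  sep = NULL)
data is the data to widen; key is the column whose values become the new column headers; value is the column whose values fill the cells; fill is the value used for combinations that are missing from the data.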
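Using the stocks example from above (the numbers differ from the earlier printout, presumably because stocks was regenerated with fresh random draws):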
> stocksm <- stocks %>% gather(stock, price, -time)
> stocksm %>% spread(stock, price)
         time           X           Y          Z
1  2009-01-01 -1.04456577 -0.97179381  2.4330105
2  2009-01-02 -0.09717072 -0.77313257 -2.3443736
3  2009-01-03  1.36933739  1.54137383  0.6551325
4  2009-01-04  2.13655070 -0.05889974  1.9988306
5  2009-01-05  0.39546822  3.80644394  3.7166546
6  2009-01-06  0.09720381 -0.44658971  0.5136471
7  2009-01-07 -0.50775134 -2.53712365  0.2004835
8  2009-01-08 -1.65134456  1.01235639 10.4377141
9  2009-01-09 -1.62622446 -0.80074087  3.0444515
10 2009-01-10 -0.37272122  3.86510320  4.7424977
> stocksm %>% spread(time, price)
  stock 2009-01-01  2009-01-02 2009-01-03  2009-01-04 2009-01-05
1     X -1.0445658 -0.09717072  1.3693374  2.13655070  0.3954682
2     Y -0.9717938 -0.77313257  1.5413738 -0.05889974  3.8064439
3     Z  2.4330105 -2.34437356  0.6551325  1.99883061  3.7166546
   2009-01-06 2009-01-07 2009-01-08 2009-01-09 2009-01-10
1  0.09720381 -0.5077513  -1.651345 -1.6262245 -0.3727212
2 -0.44658971 -2.5371237   1.012356 -0.8007409  3.8651032
3  0.51364707  0.2004835  10.437714  3.0444515  4.7424977
spread_ is the variant intended for programming.
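The fill argument of spread only matters when some key and value combination is absent from the long data. The following is a minimal sketch, assuming a made-up long table in which stock Y has no price at time 2.

```r
library(tidyr)

# Hypothetical long data: there is no row for stock Y at time 2.
long <- data.frame(
  time = c(1, 1, 2),
  stock = c("X", "Y", "X"),
  price = c(10, 20, 11),
  stringsAsFactors = FALSE
)

# Default: the missing combination becomes NA.
wide_na <- spread(long, stock, price)

# fill = 0: the missing combination becomes 0 instead.
wide_zero <- spread(long, stock, price, fill = 0)

print(wide_na)
print(wide_zero)
```

Whether 0 is a sensible replacement depends on the data; for prices, keeping NA is usually the safer choice.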
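Usage of unite:
unite(data, col, ..., sep = "_", remove = TRUE)
data is the data; col is the name of the new column; ... are the columns to combine; sep is the separator placed between the combined values.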
> mtcars %>% unite(vs_am, vs, am)
                   mpg cyl  disp  hp drat    wt  qsec vs_am gear
Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46   0_1    4
Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02   0_1    4
Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61   1_1    4
Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44   1_0    3
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02   0_0    3
Valiant           18.1   6 225.0 105 2.76 3.460 20.22   1_0    3
unite_ can also be used:
> mtcars %>% unite_("vs_am", c("vs", "am"))
                   mpg cyl  disp  hp drat    wt  qsec vs_am gear
Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46   0_1    4
Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02   0_1    4
Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61   1_1    4
Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44   1_0    3
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02   0_0    3
Valiant           18.1   6 225.0 105 2.76 3.460 20.22   1_0    3
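The sep and remove arguments of unite can be changed from their defaults. The following is a minimal sketch, assuming a three-row, three-column slice of mtcars (stored under the made-up name cars) to keep the output small.

```r
library(tidyr)

# First three rows of mtcars, restricted to the columns needed here.
cars <- head(mtcars[, c("mpg", "vs", "am")], 3)

# Join vs and am with "-" and keep the original columns alongside the new one.
joined <- unite(cars, vs_am, vs, am, sep = "-", remove = FALSE)

print(joined)
```

With remove = FALSE the source columns vs and am stay in the result, which helps when the combined label is only needed for grouping or plotting.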