R Programming -- Factors
Source: Internet | Editor: 程序博客网 | Date: 2024/05/16 11:24
Factors
Often your data needs to be grouped by category: blood pressure by age range, accidents by auto manufacturer, and so forth. R has a special collection type called a factor to track these categorized values.
Creating Factors 5.1
It's time to take inventory of the ship's hold. We'll make a vector for you with the type of booty in each chest.
To categorize the values, simply pass the vector to the factor function:
> chests <- c('gold', 'silver', 'gems', 'gold', 'gems')
> types <- factor(chests)
There are a couple of differences between the original vector and the new factor that are worth noting. Print the chests vector:
> print(chests)
[1] "gold"   "silver" "gems"   "gold"   "gems"
You see the raw list of strings, repeated values and all. Now print the types factor:
> print(types)
[1] gold   silver gems   gold   gems
Levels: gems gold silver
Printed at the bottom, you'll see the factor's "levels" - groups of unique values. Notice also that there are no quotes around the values. That's because they're not strings; they're actually integer references to one of the factor's levels.
Let's take a look at the underlying integers. Pass the factor to the as.integer function:
> as.integer(types)
[1] 2 3 1 2 1
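The idea of "integer references" can be checked directly. The sketch below is not part of the original lesson, and the names codes, lookup, and rebuilt are made up for illustration. It uses each integer as an index into the factor's levels to rebuild the original strings.

```r
# Same data as the chapter's example
chests <- c('gold', 'silver', 'gems', 'gold', 'gems')
types <- factor(chests)

codes <- as.integer(types)   # position of each value within levels(types)
lookup <- levels(types)      # the unique values: "gems" "gold" "silver"

# Indexing the levels by the integer codes rebuilds the original strings
rebuilt <- lookup[codes]
print(rebuilt)
print(identical(rebuilt, chests))
```

Here the code 2 points at the second level ("gold"), 3 at the third ("silver"), and 1 at the first ("gems").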
You can get only the factor levels with the levels function:
> levels(types)
[1] "gems"   "gold"   "silver"
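Because each value belongs to exactly one level, a factor also makes it easy to count how many values fall into each group. The lesson itself does not cover this; the sketch below assumes base R's table and nlevels functions, and the variable name counts is made up for illustration.

```r
chests <- c('gold', 'silver', 'gems', 'gold', 'gems')
types <- factor(chests)

# One count per level, reported in the same order as levels(types)
counts <- table(types)
print(counts)

# nlevels() gives the number of levels, the same as length(levels(types))
print(nlevels(types))
```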
Plots With Factors 5.2
You can use a factor to separate plots into categories. Let's graph our five chests by weight and value, and show their type as well. We'll create two vectors for you; weights will contain the weight of each chest, and prices will track how much the chests are worth.
Now, try calling plot to graph the chests by weight and value.
> weights <- c(300, 200, 100, 250, 150)
> prices <- c(9000, 5000, 12000, 7500, 18000)
> plot(weights, prices)
[Plot: prices (y-axis) against weights (x-axis)]
We can't tell which chest is which, though. Fortunately, we can use different plot characters for each type by converting the factor to integers, and passing it to the pch argument of plot.
> plot(weights, prices, pch=as.integer(types))
"Circle", "Triangle", and "Plus Sign" still aren't great descriptions for treasure, though. Let's add a legend to show what the symbols mean.
[Plot: prices against weights, with a different plot character for each chest type]
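To see which symbol each chest gets, it can help to tabulate the integer codes next to the chest types. This is only a sketch; the data frames mapping and key are illustrative names, not part of the lesson. In base R graphics, pch values 1, 2, and 3 draw a circle, a triangle, and a plus sign.

```r
chests <- c('gold', 'silver', 'gems', 'gold', 'gems')
types <- factor(chests)

# One row per chest: its type and the pch code that plot() receives
mapping <- data.frame(chest = chests, pch = as.integer(types))
print(mapping)

# One row per level: the pch code shared by every chest of that type
key <- data.frame(level = levels(types), pch = seq_along(levels(types)))
print(key)
```

So gems are drawn as circles, gold as triangles, and silver as plus signs.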
The legend function takes a location to draw in, a vector with label names, and a vector with numeric plot character IDs.
> legend("topright", c("gems","gold","silver"), pch=1:3)
Next time the boat's taking on water, it would be wise to dump the silver and keep the gems!
[Plot: prices against weights, with a legend listing gems, gold, and silver]
If you hard-code the labels and plot characters, you'll have to update them every time you change the plot factor. Instead, it's better to derive them by using the levels function on your factor:
> legend("topright", levels(types), pch=1:length(levels(types)))
[Plot: prices against weights, with a legend listing gems, gold, and silver]
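For reference, the steps in this section can be gathered into one script. This is a sketch rather than the lesson's own listing. It uses seq_along(levels(types)) where the lesson uses 1:length(levels(types)); the two give the same result here. When run non-interactively, R typically writes the plot to a file instead of opening a window.

```r
# Data from the lesson
chests  <- c('gold', 'silver', 'gems', 'gold', 'gems')
weights <- c(300, 200, 100, 250, 150)
prices  <- c(9000, 5000, 12000, 7500, 18000)
types   <- factor(chests)

# Plot each chest, choosing the plot character from its factor code
plot(weights, prices, pch = as.integer(types))

# Derive both the labels and the plot characters from the factor's levels,
# so the legend stays in sync if the chest types change
legend("topright", levels(types), pch = seq_along(levels(types)))
```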
Chapter 5 Completed
A long inland march has brought us to the end of Chapter 5. We've stumbled across another badge!
Factors help you divide your data into groups. In this chapter, we've shown you how to create them, and how to use them to make plots more readable.