ggplot2 Layers: geoms (Part 2)

Source: Internet. Published by: 交易训练软件. Editor: 程序博客网. Time: 2024/05/17 12:48

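The previous post covered the layer geoms for reference lines, bar charts, heatmaps of 2d bin counts, blank plots, and box plots. This post covers several more geoms and their parameters.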

1. geom_contour, stat_contour: contour lines

2. geom_count, stat_sum: counting overlapping points

3. geom_density_2d, stat_density_2d: contours of a 2d density estimate

4. geom_density, stat_density: smoothed density estimates

5. geom_dotplot: dot plots

6. geom_errorbarh: horizontal error bars

1. geom_contour, stat_contour: contour lines

Usage:

geom_contour(mapping = NULL, data = NULL, stat = "contour",
  position = "identity", ..., lineend = "butt", linejoin = "round",
  linemitre = 1, na.rm = FALSE, show.legend = NA, inherit.aes = TRUE)

stat_contour(mapping = NULL, data = NULL, geom = "contour",
  position = "identity", ..., na.rm = FALSE, show.legend = NA,
  inherit.aes = TRUE)

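Data format: faithfuld is a data frame with three columns (eruptions, waiting, density), holding a 2d density estimate of the Old Faithful eruption data evaluated on a grid.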

Basic plot:

library(ggplot2)
library(easyGgplot2)  # provides ggplot2.multiplot(), used below to arrange plots

v <- ggplot(faithfuld, aes(waiting, eruptions, z = density))
v + geom_contour()

# Or estimate the density from the raw data
ggplot(faithful, aes(waiting, eruptions)) +
  geom_density_2d()

Use bins to set the number of contour lines.

plot1 <- v + geom_contour(bins = 2)
plot2 <- v + geom_contour(bins = 5)
plot3 <- v + geom_contour(bins = 10)
plot4 <- v + geom_contour(bins = 15)
ggplot2.multiplot(plot1, plot2, plot3, plot4, cols = 2)

Use binwidth to set the spacing between contour levels.

plot1 <- v + geom_contour(binwidth = 0.01)
plot2 <- v + geom_contour(binwidth = 0.005)
plot3 <- v + geom_contour(binwidth = 0.0025)
plot4 <- v + geom_contour(binwidth = 0.00125)
ggplot2.multiplot(plot1, plot2, plot3, plot4, cols = 2)
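To make the effect of binwidth concrete, the following is a minimal sketch in base R. It assumes the built-in volcano matrix instead of faithfuld, and it uses grDevices::contourLines() purely as an illustration; it is not necessarily how ggplot2 traces contours internally. contour_levels() is a hypothetical helper introduced here.

```r
# Sketch: derive contour levels from a binwidth, then trace them with base R.
# contour_levels() is a hypothetical helper, not a ggplot2 function.
contour_levels <- function(z, binwidth) {
  z_range <- range(z, na.rm = TRUE)
  # Start at a multiple of binwidth at or below the minimum of z.
  seq(floor(z_range[1] / binwidth) * binwidth, z_range[2], by = binwidth)
}

z <- volcano                      # built-in 87 x 61 matrix of heights
coarse <- contour_levels(z, 20)   # few, widely spaced levels
fine <- contour_levels(z, 10)     # halving binwidth adds levels

# contourLines() returns one list(level, x, y) per traced line.
iso <- contourLines(seq_len(nrow(z)), seq_len(ncol(z)), z, levels = coarse)
drawn <- sort(unique(vapply(iso, function(line) line$level, numeric(1))))
print(drawn)
```

A smaller binwidth gives more levels and therefore more lines, which matches the progression across the four panels above.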

Some other parameters.

# ..level.. is the contour level computed by the stat
# (written after_stat(level) in recent ggplot2 versions)
plot1 <- v + geom_contour(aes(colour = ..level..))
plot2 <- v + geom_contour(colour = "red")
plot3 <- v + geom_raster(aes(fill = density)) +
  geom_contour(colour = "green")

v1 <- ggplot(faithfuld, aes(waiting, eruptions, z = density, alpha = 0.5))
plot4 <- v1 + geom_raster(aes(fill = density)) +
  geom_contour(colour = "white")
ggplot2.multiplot(plot1, plot2, plot3, plot4, cols = 2)

2. geom_count, stat_sum: counting overlapping points

geom_count is a variant of geom_point that counts the number of observations at each location and then maps the count to point area. It is useful when the data are discrete and many observations overlap.

Usage:

geom_count(mapping = NULL, data = NULL, stat = "sum",
  position = "identity", ..., na.rm = FALSE, show.legend = NA,
  inherit.aes = TRUE)

stat_sum(mapping = NULL, data = NULL, geom = "point",
  position = "identity", ..., na.rm = FALSE, show.legend = NA,
  inherit.aes = TRUE)

A plain scatter plot hides the overlapping points.

ggplot(mpg, aes(cty, hwy)) +
  geom_point()

geom_count reveals them: the size of each point shows how many observations share that location.

ggplot(mpg, aes(cty, hwy)) +
  geom_count()

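It is best used together with scale_size_area(), which ensures that a count of 0 is mapped to a size of 0. Here the smallest count is already close to 0, so it makes little difference.

ggplot(mpg, aes(cty, hwy)) +
  geom_count() +
  scale_size_area()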
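The counting step and the area scaling can be sketched in base R. This is an illustration under stated assumptions, not ggplot2's own code: it uses the built-in mtcars data (cyl against gear) rather than mpg, and the size formula assumes that an area scale makes the point radius grow with the square root of the count, with max_size = 6 as an assumed upper bound.

```r
# Sketch: count observations per (x, y) location, as a "sum"-style stat would.
pts <- transform(mtcars[, c("cyl", "gear")], n = 1)
counts <- aggregate(n ~ cyl + gear, data = pts, FUN = sum)

# Area-proportional sizing (assumption): the radius scales with sqrt(count),
# so a count of 0 would get size 0 and the largest count gets max_size.
max_size <- 6
counts$size <- max_size * sqrt(counts$n / max(counts$n))

print(counts[order(-counts$n), ])
```

Because the mapping passes through the origin, doubling a count doubles the drawn area rather than the radius, which is what keeps the visual comparison honest.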

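To display proportions instead of counts: by default, all categorical variables in the plot form the groups. Specifying geom_count without a group identifier therefore produces a plot that is not useful:

d <- ggplot(diamonds, aes(x = cut, y = clarity))
d + geom_count(aes(size = ..prop..))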

To fix this and get a more useful plot, we need to specify the group over which the proportions are calculated.

d + geom_count(aes(size = ..prop.., group = 1)) +
  scale_size_area(max_size = 10)

Alternatively, group by the x or y variable so that the proportions within each column or row sum to 1.

d + geom_count(aes(size = ..prop.., group = cut)) +
  scale_size_area(max_size = 10)

d + geom_count(aes(size = ..prop.., group = clarity)) +
  scale_size_area(max_size = 10)
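The difference between the grouping choices comes down to the denominator of the proportion. The sketch below reproduces that arithmetic in base R on the built-in mtcars data (an assumption made to keep the example self-contained; the plots above use diamonds). The column names prop_all and prop_by_cyl are hypothetical.

```r
# Sketch: proportions of counts under two different groupings.
counts <- as.data.frame(table(cyl = mtcars$cyl, gear = mtcars$gear),
                        responseName = "n")
counts$n <- as.numeric(counts$n)
counts <- counts[counts$n > 0, ]   # keep only locations that actually occur

# Like group = 1: every location is a share of the whole data set.
counts$prop_all <- counts$n / sum(counts$n)

# Like group = cyl: shares are computed within each cyl value,
# so the proportions for each cyl sum to 1.
counts$prop_by_cyl <- ave(counts$n, counts$cyl,
                          FUN = function(n) n / sum(n))

print(counts)
```

With no group specified, each location ends up as its own group, so every proportion is 1 and all points are drawn at the same size.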

3. geom_density_2d, stat_density_2d: contours of a 2d density estimate

Performs a 2d kernel density estimate using MASS::kde2d() and displays the result with contours. This can be useful for dealing with overplotting. It is the 2d version of geom_density.

Usage:

geom_density_2d(mapping = NULL, data = NULL, stat = "density2d",
  position = "identity", ..., lineend = "butt", linejoin = "round",
  linemitre = 1, na.rm = FALSE, show.legend = NA, inherit.aes = TRUE)

stat_density_2d(mapping = NULL, data = NULL, geom = "density_2d",
  position = "identity", ..., contour = TRUE, n = 100, h = NULL,
  na.rm = FALSE, show.legend = NA, inherit.aes = TRUE)

Basic contours of a 2d density estimate.

m <- ggplot(faithful, aes(x = eruptions, y = waiting)) +
  geom_point() +
  xlim(0.5, 6) +
  ylim(40, 110)
m + geom_density_2d()

Set the geom argument to draw filled polygons instead of lines.

m + stat_density_2d(aes(fill = ..level..), geom = "polygon")

Mapping colour to a variable draws a separate set of contours, in a different colour, for each level of that variable.

set.seed(4393)
dsmall <- diamonds[sample(nrow(diamonds), 1000), ]
d <- ggplot(dsmall, aes(x, y))
d + geom_density_2d(aes(colour = cut))

Set contour = FALSE to work with the density grid itself, and draw it with a different geom.

d + stat_density_2d(geom = "raster", aes(fill = ..density..), contour = FALSE)
d + stat_density_2d(geom = "point", aes(size = ..density..), n = 20, contour = FALSE)
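Since the text above names kde2d as the estimator, it may help to see what that function returns on its own. The sketch below calls MASS::kde2d() directly on the faithful data with the same limits as the xlim/ylim used earlier. The long-format grid data frame is an assumption about what a contour = FALSE layer conceptually receives, not a copy of ggplot2's internals.

```r
library(MASS)  # kde2d()

# Sketch: 2d kernel density estimate on a 100 x 100 grid.
dens <- kde2d(faithful$eruptions, faithful$waiting,
              n = 100, lims = c(0.5, 6, 40, 110))

# dens$z[i, j] is the estimated density at (dens$x[i], dens$y[j]).
# Flatten to long format; expand.grid varies x fastest, matching as.vector().
grid <- expand.grid(x = dens$x, y = dens$y)
grid$density <- as.vector(dens$z)

# Sanity check: the density should integrate to roughly 1 over the grid.
cell_area <- diff(dens$x[1:2]) * diff(dens$y[1:2])
total <- sum(dens$z) * cell_area
print(total)
```

The n argument controls the grid resolution, which is why the point version above uses n = 20: a 100 x 100 grid would draw ten thousand points.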

4. geom_density, stat_density: smoothed density estimates

Computes and draws a kernel density estimate, which is a smoothed version of the histogram. It is a useful alternative to the histogram for continuous data that come from an underlying smooth distribution.

Usage:

geom_density(mapping = NULL, data = NULL, stat = "density",
  position = "identity", ..., na.rm = FALSE, show.legend = NA,
  inherit.aes = TRUE)

stat_density(mapping = NULL, data = NULL, geom = "area",
  position = "stack", ..., bw = "nrd0", adjust = 1, kernel = "gaussian",
  n = 512, trim = FALSE, na.rm = FALSE, show.legend = NA,
  inherit.aes = TRUE)

plot1 <- ggplot(diamonds, aes(carat)) +
  geom_density()
plot2 <- ggplot(diamonds, aes(carat)) +
  geom_density(adjust = 1/5)
plot3 <- ggplot(diamonds, aes(carat)) +
  geom_density(adjust = 5)

plot4 <- ggplot(diamonds, aes(depth, colour = cut)) +
  geom_density() +
  xlim(55, 70)
#> Warning: Removed 45 rows containing non-finite values (stat_density).
plot5 <- ggplot(diamonds, aes(depth, fill = cut, colour = cut)) +
  geom_density(alpha = 0.1) +
  xlim(55, 70)
#> Warning: Removed 45 rows containing non-finite values (stat_density).

plot6 <- ggplot(diamonds, aes(carat, fill = cut)) +
  geom_density(position = "stack")
plot7 <- ggplot(diamonds, aes(carat, ..count.., fill = cut)) +
  geom_density(position = "stack")
plot8 <- ggplot(diamonds, aes(carat, ..count.., fill = cut)) +
  geom_density(position = "fill")
ggplot2.multiplot(plot1, plot2, plot3, plot4, plot5, plot6, plot7, plot8, cols = 4)
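The bw, adjust, kernel, and n arguments in the signature above have the same names as arguments of base R's stats::density(), so their effect can be explored without plotting. The sketch below uses faithful$eruptions as an assumed stand-in for the diamonds data, and the count line assumes that a count-scaled density is the density multiplied by the number of observations.

```r
# Sketch: how adjust scales the bandwidth of a kernel density estimate.
x <- faithful$eruptions

d_default <- density(x, bw = "nrd0", adjust = 1, kernel = "gaussian", n = 512)
d_wiggly  <- density(x, adjust = 1/5)   # bandwidth divided by 5
d_smooth  <- density(x, adjust = 5)     # bandwidth multiplied by 5

print(c(wiggly = d_wiggly$bw, default = d_default$bw, smooth = d_smooth$bw))

# The estimate is evaluated at n = 512 equally spaced points
# and integrates to roughly 1.
step <- diff(d_default$x[1:2])
area <- sum(d_default$y) * step

# Assumption: a count-scaled curve is density * number of observations.
count <- d_default$y * length(x)
```

Scaling by the number of observations is what makes stacked curves comparable across groups of different sizes, as in plot7 above.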

5. geom_dotplot: dot plots

In a dot plot, the width of a dot corresponds to the bin width (or the maximum width, depending on the binning algorithm), and the dots are stacked, with each dot representing one observation.

Usage:

geom_dotplot(mapping = NULL, data = NULL, position = "identity", ...,
  binwidth = NULL, binaxis = "x", method = "dotdensity",
  binpositions = "bygroup", stackdir = "up", stackratio = 1,
  dotsize = 1, stackgroups = FALSE, origin = NULL, right = TRUE,
  width = 0.9, drop = FALSE, na.rm = FALSE, show.legend = NA,
  inherit.aes = TRUE)

plot1 <- ggplot(mtcars, aes(x = mpg)) + geom_dotplot()
#> `stat_bindot()` using `bins = 30`. Pick better value with `binwidth`.
plot2 <- ggplot(mtcars, aes(x = mpg)) + geom_dotplot(binwidth = 1.5)

# Use fixed-width bins
plot3 <- ggplot(mtcars, aes(x = mpg)) +
  geom_dotplot(method = "histodot", binwidth = 1.5)

# Some other stacking methods
plot4 <- ggplot(mtcars, aes(x = mpg)) +
  geom_dotplot(binwidth = 1.5, stackdir = "center")
plot5 <- ggplot(mtcars, aes(x = mpg)) +
  geom_dotplot(binwidth = 1.5, stackdir = "centerwhole")

# y axis isn't really meaningful, so hide it
plot6 <- ggplot(mtcars, aes(x = mpg)) + geom_dotplot(binwidth = 1.5) +
  scale_y_continuous(NULL, breaks = NULL)

# Overlap dots vertically
plot7 <- ggplot(mtcars, aes(x = mpg)) + geom_dotplot(binwidth = 1.5, stackratio = .7)

# Expand dot diameter
plot8 <- ggplot(mtcars, aes(x = mpg)) + geom_dotplot(binwidth = 1.5, dotsize = 1.25)

# Examples with stacking along y axis instead of x
plot9 <- ggplot(mtcars, aes(x = 1, y = mpg)) +
  geom_dotplot(binaxis = "y", stackdir = "center")
#> `stat_bindot()` using `bins = 30`. Pick better value with `binwidth`.
ggplot2.multiplot(plot1, plot2, plot3, plot4, plot5, plot6, plot7, plot8, plot9, cols = 3)

plot1 <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_dotplot(binaxis = "y", stackdir = "center")
#> `stat_bindot()` using `bins = 30`. Pick better value with `binwidth`.
plot2 <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_dotplot(binaxis = "y", stackdir = "centerwhole")
#> `stat_bindot()` using `bins = 30`. Pick better value with `binwidth`.
plot3 <- ggplot(mtcars, aes(x = factor(vs), fill = factor(cyl), y = mpg)) +
  geom_dotplot(binaxis = "y", stackdir = "center", position = "dodge")
#> `stat_bindot()` using `bins = 30`. Pick better value with `binwidth`.

# binpositions="all" ensures that the bins are aligned between groups
plot4 <- ggplot(mtcars, aes(x = factor(am), y = mpg)) +
  geom_dotplot(binaxis = "y", stackdir = "center", binpositions = "all")
#> `stat_bindot()` using `bins = 30`. Pick better value with `binwidth`.

# Stacking multiple groups, with different fill
plot5 <- ggplot(mtcars, aes(x = mpg, fill = factor(cyl))) +
  geom_dotplot(stackgroups = TRUE, binwidth = 1, binpositions = "all")
plot6 <- ggplot(mtcars, aes(x = mpg, fill = factor(cyl))) +
  geom_dotplot(stackgroups = TRUE, binwidth = 1, method = "histodot")
plot7 <- ggplot(mtcars, aes(x = 1, y = mpg, fill = factor(cyl))) +
  geom_dotplot(binaxis = "y", stackgroups = TRUE, binwidth = 1, method = "histodot")
ggplot2.multiplot(plot1, plot2, plot3, plot4, plot5, plot6, plot7, cols = 4)
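The phrase "maximum width, depending on the binning algorithm" refers to the difference between fixed-width bins ("histodot") and data-driven bins ("dotdensity"). The following is a rough sketch of the data-driven idea in base R: each bin starts at the first point not yet covered and extends at most binwidth from there. dotdensity_bins() is a hypothetical helper written for illustration, and it should not be read as ggplot2's exact implementation.

```r
# Sketch: data-driven binning in the spirit of method = "dotdensity".
# dotdensity_bins() is a hypothetical helper, not part of ggplot2.
dotdensity_bins <- function(x, binwidth) {
  x <- sort(x)
  bin <- integer(length(x))
  start <- -Inf      # left edge of the current bin
  current <- 0L      # index of the current bin
  for (i in seq_along(x)) {
    if (x[i] >= start + binwidth) {
      # This point is not covered by the current bin: open a new one here.
      start <- x[i]
      current <- current + 1L
    }
    bin[i] <- current
  }
  data.frame(x = x, bin = bin)
}

b <- dotdensity_bins(mtcars$mpg, binwidth = 1.5)
stack_height <- table(b$bin)                                  # dots per stack
spread <- tapply(b$x, b$bin, function(v) diff(range(v)))      # width used per bin
print(stack_height)
```

Because bins are anchored at data points, no stack spans more than binwidth, whereas fixed-width bins sit on a regular grid regardless of where the data fall.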

6. geom_errorbarh: horizontal error bars

Usage:

geom_errorbarh(mapping = NULL, data = NULL, stat = "identity",
  position = "identity", ..., na.rm = FALSE, show.legend = NA,
  inherit.aes = TRUE)

df <- data.frame(
  trt = factor(c(1, 1, 2, 2)),
  resp = c(1, 5, 3, 4),
  group = factor(c(1, 2, 1, 2)),
  se = c(0.1, 0.3, 0.3, 0.2)
)

# Define the top and bottom of the errorbars
p <- ggplot(df, aes(resp, trt, colour = group))

p + geom_point() +
  geom_errorbarh(aes(xmax = resp + se, xmin = resp - se))

p + geom_point() +
  geom_errorbarh(aes(xmax = resp + se, xmin = resp - se, height = .2))

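This post covered contour lines, counting overlapping points, contours of a 2d density estimate, smoothed density estimates, dot plots, and horizontal error bars. The next post continues with more of the layer geoms.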
