Drawing a corrplot in R

Source: Internet  Editor: 程序博客网  Date: 2024/06/01 08:11

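The correlation coefficient is one way to measure how similar two samples are. When you have many samples and want to see how similar they are to one another, a good approach is to draw a correlation plot (corrplot).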
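To show what the coefficient actually measures, here is a minimal R sketch on made-up numbers. It computes Pearson's r by hand (covariance divided by the product of the standard deviations) and checks it against base R's cor(). The vectors x and y and the matrix samples are hypothetical toy data, not taken from this post.

```r
# Two made-up expression profiles measured on the same five genes (toy data)
x <- c(2, 4, 6, 8, 10)
y <- c(1, 3, 7, 9, 12)

# Pearson r written out: sum of co-deviations over the product of root sums of squares
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# Base R computes the same thing; method = "pearson" is the default
r_builtin <- cor(x, y)

# With several samples as columns, cor() returns the full sample-by-sample matrix
samples <- cbind(S1 = x, S2 = y, S3 = rev(y))
sample_cor <- cor(samples)
print(round(sample_cor, 3))
```

That sample-by-sample matrix is exactly what corrplot draws.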

Here is one way to do it in R, using the corrplot package:

install.packages("corrplot")

library(corrplot)

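# read.table() parses plain text, so merge.xls must be a whitespace/tab-delimited text file despite its .xls extension
rna <- read.table("merge.xls", header = TRUE)

# keep rows whose sum1 column (presumably the row total) is positive, i.e. drop all-zero rows
data <- subset(rna, sum1 > 0)

# columns 2 to 7 hold the samples; cor() returns their pairwise Pearson correlation matrix
rnacor <- cor(data[2:7])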
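To try the loading, filtering and correlation steps without the original merge.xls, the sketch below writes a small stand-in file first. The layout (an id column, six sample columns S1 to S6, then a row-total column sum1) is an assumption chosen to fit data[2:7] and sum1 > 0; the real file may be organised differently, and the counts are random toy values.

```r
set.seed(42)

# Hypothetical layout: id column, six sample columns, then a row-total column sum1
toy <- data.frame(id = paste0("g", 1:8))
for (s in paste0("S", 1:6)) toy[[s]] <- rpois(8, lambda = 20)
toy[3, 2:7] <- 0                       # make one gene zero in every sample
toy$sum1 <- rowSums(toy[2:7])

# Write it as tab-delimited text with an .xls name, like the post's input
path <- tempfile(fileext = ".xls")
write.table(toy, path, sep = "\t", quote = FALSE, row.names = FALSE)

# Same steps as above
rna <- read.table(path, header = TRUE, sep = "\t")
data <- subset(rna, sum1 > 0)          # the all-zero row is dropped here
rnacor <- cor(data[2:7])               # 6 x 6 sample-by-sample matrix
print(round(rnacor, 2))
```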

# The col1-col4 and wb palettes below come straight from the corrplot documentation; they work well, so feel free to copy them.
col1 <- colorRampPalette(c("#7F0000", "red", "#FF7F00", "yellow", "white", "cyan", "#007FFF", "blue", "#00007F"))
col2 <- colorRampPalette(c("#67001F", "#B2182B", "#D6604D", "#F4A582", "#FDDBC7", "#FFFFFF", "#D1E5F0", "#92C5DE", "#4393C3", "#2166AC", "#053061"))
col3 <- colorRampPalette(c("red", "white", "blue"))
col4 <- colorRampPalette(c("#7F0000", "red", "#FF7F00", "yellow", "#7FFF7F", "cyan", "#007FFF", "blue", "#00007F"))
wb <- c("white", "black")
# ask = TRUE makes R wait for confirmation before drawing each new plot
par(ask = TRUE)
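colorRampPalette() does not return colours directly; it returns a function that interpolates the anchor colours into as many shades as you request. A quick check of col2 (redefined here so the snippet runs on its own):

```r
# col2 as in the corrplot documentation: a diverging dark red / white / dark blue palette
col2 <- colorRampPalette(c("#67001F", "#B2182B", "#D6604D", "#F4A582", "#FDDBC7",
                           "#FFFFFF", "#D1E5F0", "#92C5DE", "#4393C3", "#2166AC", "#053061"))

is.function(col2)   # TRUE: col2 is a colour generator, not a colour vector

pal <- col2(200)    # 200 interpolated "#RRGGBB" strings
head(pal, 3)
```

Passing col = col2(200) therefore hands corrplot a smooth ramp of 200 shades to spread over the coefficient range; the two ends of the ramp are exactly the first and last anchor colours.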


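# pie glyphs, red labels rotated 45 degrees, samples reordered by hierarchical clustering, coefficients printed in red
corrplot(rnacor, method = "pie", shade.col = "red", tl.col = "red", tl.offset = 0.8, tl.srt = 45, order = "hclust", addCoef.col = "red")


# the same matrix drawn as coloured squares using the 200-shade col2 palette
corrplot(rnacor, method = "square", col = col2(200), order = "hclust")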
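order = "hclust" reorders rows and columns so that samples with similar correlation profiles sit next to each other. As I understand corrplot, it clusters on the dissimilarity 1 - r (complete linkage by default, adjustable with hclust.method). The base-R sketch below reproduces that idea on a hypothetical 4 x 4 matrix; treat it as an approximation of corrplot's internals, not its exact code.

```r
# Hypothetical correlation matrix: S1/S2 are similar, S3/S4 are similar
m <- matrix(c(1.0, 0.9, 0.2, 0.1,
              0.9, 1.0, 0.3, 0.2,
              0.2, 0.3, 1.0, 0.8,
              0.1, 0.2, 0.8, 1.0),
            nrow = 4,
            dimnames = list(paste0("S", 1:4), paste0("S", 1:4)))

# Assumption: distance = 1 - r, complete-linkage hierarchical clustering
hc <- hclust(as.dist(1 - m), method = "complete")
ord <- order.dendrogram(as.dendrogram(hc))

# Reordered matrix: strongly correlated samples end up adjacent
print(m[ord, ord])
```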

 


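One interesting observation: when computing Pearson correlation coefficients, the filtering step matters a lot. If you remove only the rows that are zero in every sample, the coefficients come out low; if you remove every row whose minimum is 0 (that is, any row containing a zero), the coefficients come out high. Because the Pearson coefficient is sensitive to zeros, choose this filtering rule deliberately in your own projects. I also tested replacing the zeros with a small value instead, and the coefficients stayed fairly low. So it appears to be the small values themselves, not just exact zeros, that pull the Pearson coefficient down; filtering out the rows that contain zeros is what makes the correlation high.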
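The effect described above is easy to reproduce on a toy example. In the made-up table below, two rows are zero in only one of the two samples. The sketch compares Pearson's r after (1) dropping only all-zero rows, (2) dropping any row containing a zero, and (3) applying filter 1 but replacing zeros with a small value (0.01, an arbitrary choice).

```r
# Made-up counts for two samples (illustration only)
counts <- data.frame(
  A = c(10, 20, 30,  0, 28, 0),
  B = c(12, 19, 33, 25,  0, 0)
)

# Filter 1: drop only rows that are zero in every sample
keep_nonzero_row <- rowSums(counts) > 0
r_drop_all_zero <- cor(counts$A[keep_nonzero_row], counts$B[keep_nonzero_row])

# Filter 2: drop every row whose minimum is 0 (i.e. any zero)
keep_no_zero <- apply(counts, 1, min) > 0
r_drop_any_zero <- cor(counts$A[keep_no_zero], counts$B[keep_no_zero])

# Variant of filter 1: replace zeros with a small value instead
replaced <- counts
replaced[replaced == 0] <- 0.01
r_small_value <- cor(replaced$A[keep_nonzero_row], replaced$B[keep_nonzero_row])

print(c(drop_all_zero = r_drop_all_zero,
        drop_any_zero = r_drop_any_zero,
        small_value   = r_small_value))
```

On this toy data the one-sided zeros push r below zero under filters 1 and 3, while filter 2 leaves r at about 0.98. Real data will be less extreme, but the direction matches what the post describes.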


Data:

10S1S3S6S8S9S1.729532.499952.89543.976463.040711.72953243.827270.098301.783343.476279.752243.82742.752539.231146.432.969944.584742.752500000000000.1490800.1304140.0671660.0975970.0339220.2607820.13041447.593258.189862.704886.868471.157847.593248.034559.279563.463788.671871.070148.034547.328358.371462.938387.021270.456147.3283363.076417.626444.945629.738542.223363.07616.713918.989318.021819.55919.835716.713900000018.954122.662623.726123.828623.494418.9541171.233150.81164.34174.828199.335171.2332.724817.130375.634693.856795.398622.724810000000000000000000000000000001.287841.219150.6429221.177150.6442341.28784













