Downloading TCGA data with TCGA-Assembler
Source: Internet. Editor: 程序博客网. Posted: 2024/05/23 14:41
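TCGA's mission: to improve scientific understanding of the molecular basis of cancer and to improve our ability to diagnose, treat, and prevent cancer.
TCGA's goal: to produce a complete "atlas" of the genomic alterations associated with all cancers.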
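Most TCGA data sources are public, but collecting and preprocessing them efficiently is a headache. The key task is turning TCGA data into a two-dimensional data matrix for each cancer type (for example, genes as rows and samples as columns). Once you have this matrix, the rest is straightforward: differential expression, co-expression networks, survival analysis, and so on.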
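To make the target structure concrete, here is a minimal base-R sketch. It assumes the per-sample values have already been gathered into a long table with one row per (gene, sample) measurement. The column names `gene`, `sample`, `value` and all the numbers are hypothetical toy data, not TCGA output. The code pivots that table into a genes-by-samples matrix.

```r
# Toy long-format table: one row per (gene, sample) measurement.
# Column names and values are hypothetical, for illustration only.
long <- data.frame(
  gene   = c("TP53", "TP53", "KRAS", "KRAS", "APC"),
  sample = c("S1",   "S2",   "S1",   "S2",   "S1"),
  value  = c(10.5,   8.2,    3.1,    4.7,    6.0),
  stringsAsFactors = FALSE
)

# Empty genes x samples matrix; cells with no measurement stay NA.
genes   <- sort(unique(long$gene))
samples <- sort(unique(long$sample))
expr <- matrix(NA_real_, nrow = length(genes), ncol = length(samples),
               dimnames = list(genes, samples))

# Fill it by indexing with a two-column (row name, column name) character matrix.
expr[cbind(long$gene, long$sample)] <- long$value
print(expr)
```

Absent measurements are left as `NA` rather than 0, so "not measured" stays distinct from "zero expression" in downstream analyses.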
We use TCGA-Assembler to download TCGA data (http://www.compgenome.org/TCGA-Assembler/). Besides making downloads convenient, TCGA-Assembler can also perform initial processing of the data.
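# After downloading TCGA-Assembler, first install the packages it depends on with the following command:
install.packages(c("HGNChelper", "RCurl", "httr", "stringr", "digest", "bitops"), dependencies = TRUE)  # choose a CRAN mirror when prompted; all packages are installed in one call
# Once the dependencies are installed, switch to the TCGA-Assembler directory downloaded earlier:
setwd("C:/Users/cathy/Desktop/TCGA-Assembler")  # make the TCGA-Assembler directory the working directory
# TCGA-Assembler is not an installed package, so no library() call is needed; its functions are loaded with source(), after which data can be downloaded.
# Load all the functions in Modules A and B into the workspace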
source("Module_A.r");
source("Module_B.r");
# Retrieve RNA-seq gene expression data and DNA copy number data for several READ samples (with specified TCGA barcodes) from the TCGA DCC website.
# Use the following command to download RNA-seq gene expression data for four patient samples, then look at the top 5 rows.
RNASeqRawData = DownloadRNASeqData(traverseResultFile = "./DirectoryTraverseResult_Jul-08-2014.rda", saveFolderName = "./QuickStartGuide_Results/RawData/", cancerType = "READ",assayPlatform = "RNASeqV1", dataType = "gene.quantification", inputPatientIDs = c("TCGA-AG-A036-01", "TCGA-AG-3605-01", "TCGA-AG-A032-01", "TCGA-AG-A00Y-01"), outputFileName ="2minuteExample");
print(RNASeqRawData[[1]][1:5, ]);
# Use the following command to download copy number data for four patient samples, then look at the top 15 rows of the data.
CNARawData = DownloadCNAData(traverseResultFile = "./DirectoryTraverseResult_Jul-08-2014.rda", saveFolderName = "./QuickStartGuide_Results/RawData/", cancerType = "READ", assayPlatform = "genome_wide_snp_6", inputPatientIDs = c("TCGA-DC-6156-10", "TCGA-AG-3605-01", "TCGA-AG-A032-01", "TCGA-AG-A00Y-01"), outputFileName = "2minuteExample");
print(CNARawData[[1]][1:15, ]);
# Process the downloaded data: perform basic quality control and output clean data matrix files in which each row is a genomic feature and each column corresponds to a sample.
# Use the following command to process the RNA-seq gene expression data, then check a few rows of the processed data.
# RPKM (reads per kilobase per million mapped reads) values of gene expression are extracted for subsequent analysis.
GeneExpData = ProcessRNASeqData(inputFilePath ="./QuickStartGuide_Results/RawData/2minuteExample__READ__unc.edu__illuminaga_rnaseq__gene.quantification__Jul-08-2014.txt", outputFileName = "READ__illuminahiseq_rnaseqv2__GeneExp",outputFileFolder = "./QuickStartGuide_Results/BasicProcessingResult", dataType = "GeneExp", verType= "RNASeqV1");
print(cbind(GeneExpData$Des, GeneExpData$Data)[31:35, ]);
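RPKM divides a gene's read count by the gene's length in kilobases and by the sample's total mapped reads in millions: RPKM = count × 10^9 / (length_bp × total_mapped_reads). The processing step above *extracts* RPKM values that are already present in the downloaded files, so the sketch below only illustrates the formula. It is not what `ProcessRNASeqData` does internally. The function name `rpkm` and the toy counts and lengths are hypothetical.

```r
# RPKM = reads per kilobase of gene length per million mapped reads.
# counts:      reads mapped to each gene (one sample)
# lengths_bp:  gene length in base pairs
# total_reads: total mapped reads in the sample; defaulting to sum(counts)
#              is an assumption (pipelines often use all mapped reads)
rpkm <- function(counts, lengths_bp, total_reads = sum(counts)) {
  stopifnot(length(counts) == length(lengths_bp), all(lengths_bp > 0))
  counts / (lengths_bp / 1e3) / (total_reads / 1e6)
}

# Toy example with made-up numbers
counts   <- c(geneA = 500, geneB = 1500, geneC = 0)
gene_len <- c(geneA = 1000, geneB = 3000, geneC = 2000)
print(rpkm(counts, gene_len, total_reads = 1e6))
```

In this toy example geneA and geneB get the same RPKM even though geneB has three times as many reads, because geneB is also three times longer.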
# Use the following command to process the downloaded copy number data and calculate gene-level copy numbers, then check a few rows of the data.
GeneLevel.CNA = ProcessCNAData(inputFilePath="./QuickStartGuide_Results/RawData/2minuteExample__READ__broad.mit.edu__genome_wide_snp_6__hg18__Jul-08-2014.txt", outputFileName = "READ__genome_wide_snp_6__GeneLevelCNA",outputFileFolder = "./QuickStartGuide_Results/BasicProcessingResult", refGenomeFile ="./SupportingFiles/Hg18GenePosition.txt");
print(cbind(GeneLevel.CNA$Des, GeneLevel.CNA$Data)[5:20, ]);
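`ProcessCNAData` uses the gene positions in `refGenomeFile` to convert segment-level copy number into gene-level values. The post does not say which rule TCGA-Assembler applies. The base-R sketch below assumes a simple one: a gene's value is the overlap-length-weighted mean of the segment means of all segments overlapping it, or `NA` if none do. Coordinates are treated as 1-based and inclusive. The function `gene_level_cn`, the column names `chrom`, `start`, `end`, `seg_mean`, and the toy data are all hypothetical.

```r
# Gene-level copy number for one sample, ASSUMING a gene's value is the
# overlap-length-weighted mean of the overlapping segments' means.
gene_level_cn <- function(genes, segs) {
  vapply(seq_len(nrow(genes)), function(i) {
    g <- genes[i, ]
    # overlap length (bp, inclusive coordinates) between gene and each segment
    ov <- pmin(segs$end, g$end) - pmax(segs$start, g$start) + 1
    ov[segs$chrom != g$chrom] <- 0   # segments on other chromosomes don't count
    ov <- pmax(ov, 0)                # negative values mean no overlap
    if (sum(ov) == 0) return(NA_real_)
    sum(ov * segs$seg_mean) / sum(ov)
  }, numeric(1))
}

segs <- data.frame(chrom = c("1", "1", "2"),
                   start = c(1, 5001, 1),
                   end   = c(5000, 9000, 8000),
                   seg_mean = c(0.5, -0.3, 0.1),
                   stringsAsFactors = FALSE)
genes <- data.frame(gene  = c("G1", "G2", "G3", "G4"),
                    chrom = c("1", "1", "2", "3"),
                    start = c(1000, 4001, 100, 1),
                    end   = c(2000, 6000, 200, 100),
                    stringsAsFactors = FALSE)
cn <- gene_level_cn(genes, segs)
names(cn) <- genes$gene
print(cn)
```

G2 spans a segment boundary, so it gets the average of the two segments (1000 bp of each). G4 lies on a chromosome with no segments and stays `NA`.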
# Integrate the RNA-seq and copy number data into one mega data matrix. First, form a list object containing the two processed datasets, which will be passed to the data integration function.
dataList = vector("list", 2);
dataList[[1]] = list(Data = GeneExpData$Data, Des = GeneExpData$Des, dataType = "GeneExp");
dataList[[2]] = list(Data = GeneLevel.CNA$Data, Des = GeneLevel.CNA$Des, dataType = "CNA");
#Second, use the following commands to integrate the data and then check the top 15 rows of data.
MergedData = CombineMultiPlatformData(inputDataList = dataList);
print(cbind(MergedData$Des, MergedData$Data)[1:15, ]);
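`CombineMultiPlatformData` stacks the processed datasets into one matrix. Its internals are not shown here, so the base-R sketch below only illustrates the general idea under stated assumptions. Each platform contributes feature rows, each row is tagged with its data type, and only samples present in both datasets are kept. The real function may handle unmatched samples differently. The helper `tag` and the toy matrices are hypothetical.

```r
# Toy processed datasets: features x samples matrices (values are made up).
gene_exp <- matrix(c(5.1, 2.3, 7.8, 0.4), nrow = 2,
                   dimnames = list(c("TP53", "KRAS"), c("S1", "S2")))
cna <- matrix(c(0.2, -0.5, 0.1, 0.3, 0.0, -0.1), nrow = 2,
              dimnames = list(c("TP53", "KRAS"), c("S1", "S2", "S3")))

# ASSUMPTION: keep only samples measured on both platforms, in a shared order.
common <- intersect(colnames(gene_exp), colnames(cna))

# Prefix each row name with its data type so that a gene's expression row
# and its copy-number row stay distinguishable after stacking.
tag <- function(m, type) {
  rownames(m) <- paste(type, rownames(m), sep = "|")
  m
}
merged <- rbind(tag(gene_exp[, common, drop = FALSE], "GeneExp"),
                tag(cna[, common, drop = FALSE], "CNA"))
print(merged)
```

Sample S3 has copy number data but no expression data, so this sketch drops it from the merged matrix.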
Reference: TCGA-Assembler Quick Start Guide (PDF), http://www.compgenome.org/TCGA-Assembler/documents/TCGA-Assembler%20Quick%20Start%20Guide.pdf