Downloading TCGA data with the TCGA-Assembler tool

Source: Internet | Category: E-commerce data analysis (Taobao in practice) | Editor: 程序博客网 | Time: 2024/05/23 14:41
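The Cancer Genome Atlas (TCGA) program, launched by the US government, applies genome analysis technologies, above all large-scale genome sequencing, to map the genomic alterations of all human cancers (the near-term goal is 50 tumor types, including subtypes) and to analyze them systematically. It aims to find all the small variants in oncogenes and tumor suppressor genes and to understand how cancer cells arise and progress. On that basis it seeks new diagnostic and therapeutic methods, and ultimately a new "cancer prevention strategy".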
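TCGA mission: to improve the scientific understanding of the molecular basis of cancer and to improve our ability to diagnose, treat, and prevent cancer.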

TCGA goal: to produce a complete "atlas" of the genomic alterations associated with all cancers.




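Most TCGA data are publicly available, but collecting and preprocessing them efficiently is a headache. The key step is turning TCGA data into a two-dimensional data matrix for each cancer type (for example, genes as rows and samples as columns). Once you have this matrix, the rest is straightforward: differential expression, co-expression networks, survival analysis, and so on.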
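To make the target layout concrete, here is a minimal base-R sketch. It pivots long-format records (one row per gene and sample) into a gene-by-sample matrix. It then computes gene-gene Pearson correlations, a typical first step toward a co-expression network. The gene names, sample IDs, and values are made up. The helper `to_gene_by_sample()` is hypothetical and not part of TCGA-Assembler.

```r
# Made-up long-format expression records: one row per (gene, sample) pair.
long <- data.frame(
  gene   = rep(c("TP53", "KRAS", "APC"), times = 3),
  sample = rep(c("S1", "S2", "S3"), each = 3),
  value  = c(5.1, 2.0, 7.3,
             4.8, 2.5, 6.9,
             6.0, 1.7, 8.1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: pivot long records into a genes x samples matrix.
to_gene_by_sample <- function(df) {
  genes   <- unique(df$gene)
  samples <- unique(df$sample)
  m <- matrix(NA_real_, nrow = length(genes), ncol = length(samples),
              dimnames = list(genes, samples))
  # Row/column position of each record; a two-column index matrix
  # addresses one cell per row.
  idx <- cbind(match(df$gene, genes), match(df$sample, samples))
  m[idx] <- df$value
  m
}

expr <- to_gene_by_sample(long)
print(expr)

# Gene-gene co-expression: cor() correlates columns, so transpose first.
coexp <- cor(t(expr))
print(round(coexp, 2))
```

With real data, the same matrix feeds differential expression or survival models. Only the correlation step is specific to co-expression.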

We use TCGA-Assembler (http://www.compgenome.org/TCGA-Assembler/) to download TCGA data. TCGA-Assembler makes downloading convenient and can also perform initial processing of the data.


#After downloading it, we first need to install a few dependency packages with the following command:


install.packages(c("HGNChelper", "RCurl", "httr", "stringr", "digest", "bitops"), dependencies = TRUE)  # choose a CRAN mirror when prompted; installing in one batch is convenient
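If you rerun the setup on a machine that already has some of these packages, a small check avoids reinstalling them. This is a sketch. `missing_packages()` is a hypothetical helper that relies on `system.file()`, which returns an empty string when a package is not installed.

```r
# Hypothetical helper: which of `pkgs` are not installed?
# system.file(package = p) returns "" when package p is not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                      logical(1))
  pkgs[!installed]
}

deps <- c("HGNChelper", "RCurl", "httr", "stringr", "digest", "bitops")
to_install <- missing_packages(deps)
print(to_install)
# To install only what is missing:
# if (length(to_install) > 0) install.packages(to_install, dependencies = TRUE)
```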

#Once the dependencies are installed, make the TCGA-Assembler folder you just downloaded the working directory:

setwd("C:/Users/cathy/Desktop/TCGA-Assembler")  # make the TCGA-Assembler folder the working directory

#Now we can download data. TCGA-Assembler does not need to be loaded as a package.




#Load all the functions in Modules A and B into the working space

source("Module_A.r");

source("Module_B.r");
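If the working directory is wrong, `source()` stops at the first file it cannot open. A guard like the following checks both module files up front and names any that are missing. `source_modules()` is a hypothetical helper. It assumes the two module files sit in the folder you pass in, which defaults to the working directory.

```r
# Hypothetical helper: check that the module files exist in `dir`, then source them.
source_modules <- function(dir = ".", files = c("Module_A.r", "Module_B.r")) {
  paths <- file.path(dir, files)
  absent <- paths[!file.exists(paths)]
  if (length(absent) > 0) {
    stop("Module file(s) not found: ", paste(absent, collapse = ", "),
         ". Is the working directory the TCGA-Assembler folder?", call. = FALSE)
  }
  # source() evaluates into the global environment by default,
  # the same as calling source("Module_A.r") at the top level.
  for (p in paths) source(p)
  invisible(paths)
}

# Usage, after setwd() to the TCGA-Assembler folder:
# source_modules()
```

Keep the `setwd()` step even with this helper, because the example commands that follow use relative paths such as `./SupportingFiles/`.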


#Retrieve RNA-seq gene expression data and DNA copy number data of several READ (rectum adenocarcinoma) samples with specified TCGA barcodes from the TCGA DCC website. Use the following command to download RNA-seq gene expression data of four patient samples and then look at the top 5 rows.

RNASeqRawData = DownloadRNASeqData(traverseResultFile = "./DirectoryTraverseResult_Jul-08-2014.rda",
                                   saveFolderName = "./QuickStartGuide_Results/RawData/",
                                   cancerType = "READ", assayPlatform = "RNASeqV1",
                                   dataType = "gene.quantification",
                                   inputPatientIDs = c("TCGA-AG-A036-01", "TCGA-AG-3605-01",
                                                       "TCGA-AG-A032-01", "TCGA-AG-A00Y-01"),
                                   outputFileName = "2minuteExample");

print(RNASeqRawData[[1]][1:5, ]);


#Use the following command to download copy number data of four patient samples and then look at the top 15 rows of the data.

CNARawData = DownloadCNAData(traverseResultFile = "./DirectoryTraverseResult_Jul-08-2014.rda",
                             saveFolderName = "./QuickStartGuide_Results/RawData/",
                             cancerType = "READ", assayPlatform = "genome_wide_snp_6",
                             inputPatientIDs = c("TCGA-DC-6156-10", "TCGA-AG-3605-01",
                                                 "TCGA-AG-A032-01", "TCGA-AG-A00Y-01"),
                             outputFileName = "2minuteExample");

print(CNARawData[[1]][1:15, ]);
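The patient IDs above are sample-level TCGA barcodes. The two digits after the participant ID are the sample type code: 01-09 denote tumor samples, 10-19 normal samples, and 20-29 controls. So `TCGA-DC-6156-10` in the copy number call is a normal sample, while the other IDs are tumors. The sketch below validates the barcode layout and classifies samples. The regular expression and the helpers `is_valid_barcode()` and `sample_class()` are assumptions written for IDs of this shape; they are not part of TCGA-Assembler.

```r
# Assumed sample-level barcode layout: TCGA-<TSS>-<participant>-<sample type>,
# e.g. "TCGA-AG-A036-01" -> tissue source site AG, participant A036, type 01.
barcode_pattern <- "^TCGA-[A-Z0-9]{2}-[A-Z0-9]{4}-[0-9]{2}"

is_valid_barcode <- function(ids) grepl(barcode_pattern, ids)

# Map the two-digit sample type code (characters 14-15) to a broad class:
# 01-09 tumor, 10-19 normal, 20-29 control; anything else is "other".
sample_class <- function(ids) {
  bad <- ids[!is_valid_barcode(ids)]
  if (length(bad) > 0) stop("Malformed barcode(s): ", paste(bad, collapse = ", "))
  code <- as.integer(substr(ids, 14, 15))
  as.character(cut(code, breaks = c(0, 9, 19, 29, 99),
                   labels = c("tumor", "normal", "control", "other")))
}

ids <- c("TCGA-DC-6156-10", "TCGA-AG-3605-01", "TCGA-AG-A032-01", "TCGA-AG-A00Y-01")
print(sample_class(ids))
# -> normal, tumor, tumor, tumor
```

Running `is_valid_barcode()` on `inputPatientIDs` before a download catches malformed IDs, such as one with a missing hyphen, before any request is made.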


#Process the downloaded data to perform basic quality control and output clean data matrix files, where each row is a genomic feature and each column corresponds to a sample. Use the following command to process RNA-seq gene expression data and then check a few rows of the processed data. RPKM (reads per kilobase of transcript per million mapped reads) values of gene expression are extracted for subsequent analysis.

GeneExpData = ProcessRNASeqData(inputFilePath = "./QuickStartGuide_Results/RawData/2minuteExample__READ__unc.edu__illuminaga_rnaseq__gene.quantification__Jul-08-2014.txt",
                                outputFileName = "READ__illuminahiseq_rnaseqv2__GeneExp",
                                outputFileFolder = "./QuickStartGuide_Results/BasicProcessingResult",
                                dataType = "GeneExp", verType = "RNASeqV1");

print(cbind(GeneExpData$Des, GeneExpData$Data)[31:35, ]);
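For reference, RPKM divides the reads on a gene by the gene length in kilobases and by the library size in millions of mapped reads:

RPKM = reads × 10^9 / (gene length in bp × total mapped reads)

According to the guide, `ProcessRNASeqData` extracts RPKM values from the downloaded files. The hypothetical `rpkm()` below only illustrates the formula on made-up counts.

```r
# RPKM = reads on gene * 1e9 / (gene length in bp * total mapped reads).
# counts: genes x samples matrix of raw read counts (made-up data below).
# total_mapped: mapped reads per sample; if NULL, column sums are used,
# which leaves out reads that fall outside the listed genes.
rpkm <- function(counts, gene_length_bp, total_mapped = NULL) {
  counts <- as.matrix(counts)
  if (is.null(total_mapped)) total_mapped <- colSums(counts)
  per_million <- sweep(counts, 2, total_mapped / 1e6, "/")  # reads per million mapped
  per_million / (gene_length_bp / 1e3)                      # ... per kilobase of gene
}

counts <- matrix(c(100, 50, 200, 100), nrow = 2,
                 dimnames = list(c("g1", "g2"), c("s1", "s2")))
print(rpkm(counts, gene_length_bp = c(1000, 2000), total_mapped = c(1e6, 2e6)))
# g1 -> 100 in both samples, g2 -> 25 in both samples
```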


#Use the following command to process the downloaded copy number data and calculate gene-level copy numbers. Then check a few rows of the data.

GeneLevel.CNA = ProcessCNAData(inputFilePath = "./QuickStartGuide_Results/RawData/2minuteExample__READ__broad.mit.edu__genome_wide_snp_6__hg18__Jul-08-2014.txt",
                               outputFileName = "READ__genome_wide_snp_6__GeneLevelCNA",
                               outputFileFolder = "./QuickStartGuide_Results/BasicProcessingResult",
                               refGenomeFile = "./SupportingFiles/Hg18GenePosition.txt");

print(cbind(GeneLevel.CNA$Des, GeneLevel.CNA$Data)[5:20, ]);
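`ProcessCNAData` turns segment-level copy number into gene-level values using the gene positions in `refGenomeFile`. Because the raw file is hg18, the reference must be the hg18 one. The exact rule TCGA-Assembler applies is not described here. The sketch below shows one common approach: a length-weighted average of the segment means that overlap each gene. The helper `gene_level_cn()` and all data are hypothetical.

```r
# Made-up segment table (segment mean, e.g. a log2 copy-number ratio) and gene
# positions; both must use the same genome build.
segments <- data.frame(chrom = c("1", "1", "2"),
                       start = c(1, 50001, 1),
                       end   = c(50000, 120000, 80000),
                       seg_mean = c(0.2, -0.6, 0.0),
                       stringsAsFactors = FALSE)
genes <- data.frame(gene  = c("GENE_A", "GENE_B", "GENE_C"),
                    chrom = c("1", "1", "2"),
                    start = c(10000, 40001, 5000),
                    end   = c(20000, 60000, 6000),
                    stringsAsFactors = FALSE)

# Hypothetical helper: length-weighted mean of the segment values overlapping
# each gene (inclusive coordinates); NA when no segment overlaps the gene.
gene_level_cn <- function(segments, genes) {
  values <- vapply(seq_len(nrow(genes)), function(i) {
    hit <- segments$chrom == genes$chrom[i] &
      segments$start <= genes$end[i] & segments$end >= genes$start[i]
    if (!any(hit)) return(NA_real_)
    s <- segments[hit, ]
    overlap <- pmin(s$end, genes$end[i]) - pmax(s$start, genes$start[i]) + 1
    sum(s$seg_mean * overlap) / sum(overlap)
  }, numeric(1))
  setNames(values, genes$gene)
}

print(gene_level_cn(segments, genes))
# GENE_A -> 0.2, GENE_B -> -0.2 (half on each chr1 segment), GENE_C -> 0
```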


#Integrate the RNA-seq and copy number data into a mega data matrix. First, form a list object containing the two processed datasets, which will be the input to the data integration function.

dataList = vector("list", 2);

dataList[[1]] = list(Data = GeneExpData$Data, Des = GeneExpData$Des, dataType = "GeneExp");

dataList[[2]] = list(Data = GeneLevel.CNA$Data, Des = GeneLevel.CNA$Des, dataType = "CNA");

#Second, use the following commands to integrate the data and then check the top 15 rows of data.

MergedData = CombineMultiPlatformData(inputDataList = dataList);

print(cbind(MergedData$Des, MergedData$Data)[1:15, ]);
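Conceptually, building one mega matrix from two platforms means matching samples across them and stacking their rows. The guide does not spell out exactly how `CombineMultiPlatformData` matches samples or labels rows, so the following is only a sketch of the general idea. The hypothetical `combine_platforms()` keeps the samples present in every platform and prefixes each row name with its data type, mirroring the `dataType` field in `dataList`. All data are made up.

```r
# Hypothetical helper: stack gene x sample matrices from several platforms,
# keeping only samples shared by all of them and tagging rows by data type.
combine_platforms <- function(mats) {
  common <- Reduce(intersect, lapply(mats, colnames))
  if (length(common) == 0) stop("No samples are shared by all platforms")
  blocks <- lapply(names(mats), function(type) {
    m <- mats[[type]][, common, drop = FALSE]  # same sample order everywhere
    rownames(m) <- paste(type, rownames(m), sep = "|")
    m
  })
  do.call(rbind, blocks)
}

gene_exp <- matrix(c(10, 20, 30, 40), nrow = 2,
                   dimnames = list(c("TP53", "KRAS"), c("S1", "S2")))
cna <- matrix(c(0.1, -0.3, 0.5), nrow = 1,
              dimnames = list("TP53", c("S2", "S3", "S1")))
merged <- combine_platforms(list(GeneExp = gene_exp, CNA = cna))
print(merged)
```

Sample S3 appears only in the copy number matrix, so it is dropped. Keeping the data type in the row name lets the same gene appear once per platform without a clash.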


TCGA-Assembler Quick Start Guide: http://www.compgenome.org/TCGA-Assembler/documents/TCGA-Assembler%20Quick%20Start%20Guide.pdf




