ClustEval clustering evaluation framework

Data Set 'tcga' - General

tcga

The Cancer Genome Atlas (TCGA) is a project that maintains a database of molecular data on cancer cells, including gene expression, DNA methylation and copy number aberration. Because it covers many different cancer types, it allows them to be compared at the molecular level. This data set integrates gene expression levels, DNA methylation and copy number aberration for three cancer types: Breast Invasive Carcinoma (BRCA, 207 samples), Glioblastoma Multiforme (GBM, 67 samples) and Lung Squamous Cell Carcinoma (LUSC, 19 samples), 293 samples in total. For each type of molecular data, pairwise similarities between the samples were calculated using Spearman correlation. This yields three similarities for every pair of samples, which were then combined into a single value by taking their arithmetic mean.
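The combination step described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the pipeline used to build this data set: it assumes each data type is available as a samples x features matrix whose rows list the same samples in the same order, computes Spearman correlation as the Pearson correlation of tie-averaged ranks, and averages the three resulting sample-by-sample matrices. The function names and the random input data are hypothetical.

```python
import numpy as np


def rank_rows(x):
    """Rank the values in each row (1-based), averaging ranks over ties."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty_like(x)
    for i, row in enumerate(x):
        # Provisional ranks 1..n following the sorted order of the row.
        order = np.argsort(row, kind="mergesort")
        provisional = np.empty(len(row))
        provisional[order] = np.arange(1, len(row) + 1)
        # Replace the ranks of tied values by their mean rank.
        _, inverse = np.unique(row, return_inverse=True)
        inverse = inverse.ravel()
        mean_rank = np.bincount(inverse, weights=provisional) / np.bincount(inverse)
        ranks[i] = mean_rank[inverse]
    return ranks


def spearman_similarity(x):
    """Sample-by-sample Spearman correlation for a (samples x features) matrix.

    Spearman correlation equals the Pearson correlation of the ranks;
    np.corrcoef treats each row (one sample) as one variable.
    A sample whose values are all identical yields NaN entries.
    """
    return np.corrcoef(rank_rows(x))


def combined_similarity(*data_types):
    """Hypothetical helper: arithmetic mean of the per-data-type Spearman matrices.

    Assumes every matrix lists the same samples in the same row order.
    """
    return np.mean([spearman_similarity(m) for m in data_types], axis=0)


# Usage with random placeholder data (5 samples; feature counts are arbitrary).
rng = np.random.default_rng(0)
expression = rng.normal(size=(5, 100))
methylation = rng.uniform(size=(5, 80))
copy_number = rng.integers(-2, 3, size=(5, 50))  # integer levels, so many ties
similarity = combined_similarity(expression, methylation, copy_number)
print(similarity.shape)  # (5, 5)
```

Taking the plain mean weights the three data types equally, which matches the arithmetic mean described above; the result is a symmetric matrix with ones on the diagonal.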

Publication: Nora Speicher. Towards the identification of cancer subtypes by integrative clustering of molecular data. Master's thesis, Saarland University, December 2012.


Alias: tcga
Name: tcga/all_emc_spearman.txt
Dataset format: Similarity Matrix