Identification of Cancer Related Genes Using a Comprehensive Map of Human Gene Expression
Rapid accumulation and availability of gene expression datasets in public repositories have enabled large-scale meta-analyses of combined data. The richness of cross-experiment data has provided new biological insights, including identification of new cancer genes. In this study, we compiled a human...
Main authors: | Torrente, Aurora; Lukk, Margus; Xue, Vincent; Parkinson, Helen; Rung, Johan; Brazma, Alvis |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4913919/ https://www.ncbi.nlm.nih.gov/pubmed/27322383 http://dx.doi.org/10.1371/journal.pone.0157484 |
_version_ | 1782438476515377152 |
---|---|
author | Torrente, Aurora Lukk, Margus Xue, Vincent Parkinson, Helen Rung, Johan Brazma, Alvis |
author_facet | Torrente, Aurora Lukk, Margus Xue, Vincent Parkinson, Helen Rung, Johan Brazma, Alvis |
author_sort | Torrente, Aurora |
collection | PubMed |
description | Rapid accumulation and availability of gene expression datasets in public repositories have enabled large-scale meta-analyses of combined data. The richness of cross-experiment data has provided new biological insights, including identification of new cancer genes. In this study, we compiled a human gene expression dataset from ∼40,000 publicly available Affymetrix HG-U133Plus2 arrays. After strict quality control and data normalisation the data was quantified in an expression matrix of ∼20,000 genes and ∼28,000 samples. To enable different ways of sample grouping, existing annotations were subjected to systematic ontology-assisted categorisation and manual curation. Groups like normal tissues, neoplasmic tissues, cell lines, homoeotic cells and incompletely differentiated cells were created. Unsupervised analysis of the data confirmed global structure of expression consistent with earlier analysis but with more details revealed due to increased resolution. A suitable mixed-effects linear model was used to further investigate gene expression in solid tissue tumours, and to compare these with the respective healthy solid tissues. The analysis identified 1,285 genes with systematic expression change in cancer. The list is significantly enriched with known cancer genes from large, public, peer-reviewed databases, whereas the remaining ones are proposed as new cancer gene candidates. The compiled dataset is publicly available in the ArrayExpress Archive. It contains the most diverse collection of biological samples, making it the largest systematically annotated gene expression dataset of its kind in the public domain. |
format | Online Article Text |
id | pubmed-4913919 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-4913919 2016-07-06 Identification of Cancer Related Genes Using a Comprehensive Map of Human Gene Expression Torrente, Aurora Lukk, Margus Xue, Vincent Parkinson, Helen Rung, Johan Brazma, Alvis PLoS One Research Article Rapid accumulation and availability of gene expression datasets in public repositories have enabled large-scale meta-analyses of combined data. The richness of cross-experiment data has provided new biological insights, including identification of new cancer genes. In this study, we compiled a human gene expression dataset from ∼40,000 publicly available Affymetrix HG-U133Plus2 arrays. After strict quality control and data normalisation the data was quantified in an expression matrix of ∼20,000 genes and ∼28,000 samples. To enable different ways of sample grouping, existing annotations were subjected to systematic ontology-assisted categorisation and manual curation. Groups like normal tissues, neoplasmic tissues, cell lines, homoeotic cells and incompletely differentiated cells were created. Unsupervised analysis of the data confirmed global structure of expression consistent with earlier analysis but with more details revealed due to increased resolution. A suitable mixed-effects linear model was used to further investigate gene expression in solid tissue tumours, and to compare these with the respective healthy solid tissues. The analysis identified 1,285 genes with systematic expression change in cancer. The list is significantly enriched with known cancer genes from large, public, peer-reviewed databases, whereas the remaining ones are proposed as new cancer gene candidates. The compiled dataset is publicly available in the ArrayExpress Archive. It contains the most diverse collection of biological samples, making it the largest systematically annotated gene expression dataset of its kind in the public domain.
Public Library of Science 2016-06-20 /pmc/articles/PMC4913919/ /pubmed/27322383 http://dx.doi.org/10.1371/journal.pone.0157484 Text en © 2016 Torrente et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Torrente, Aurora Lukk, Margus Xue, Vincent Parkinson, Helen Rung, Johan Brazma, Alvis Identification of Cancer Related Genes Using a Comprehensive Map of Human Gene Expression |
title | Identification of Cancer Related Genes Using a Comprehensive Map of Human Gene Expression |
title_full | Identification of Cancer Related Genes Using a Comprehensive Map of Human Gene Expression |
title_fullStr | Identification of Cancer Related Genes Using a Comprehensive Map of Human Gene Expression |
title_full_unstemmed | Identification of Cancer Related Genes Using a Comprehensive Map of Human Gene Expression |
title_short | Identification of Cancer Related Genes Using a Comprehensive Map of Human Gene Expression |
title_sort | identification of cancer related genes using a comprehensive map of human gene expression |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4913919/ https://www.ncbi.nlm.nih.gov/pubmed/27322383 http://dx.doi.org/10.1371/journal.pone.0157484 |
work_keys_str_mv | AT torrenteaurora identificationofcancerrelatedgenesusingacomprehensivemapofhumangeneexpression AT lukkmargus identificationofcancerrelatedgenesusingacomprehensivemapofhumangeneexpression AT xuevincent identificationofcancerrelatedgenesusingacomprehensivemapofhumangeneexpression AT parkinsonhelen identificationofcancerrelatedgenesusingacomprehensivemapofhumangeneexpression AT rungjohan identificationofcancerrelatedgenesusingacomprehensivemapofhumangeneexpression AT brazmaalvis identificationofcancerrelatedgenesusingacomprehensivemapofhumangeneexpression |