The shape of gene expression distributions matter: how incorporating distribution shape improves the interpretation of cancer transcriptomic data
BACKGROUND: In genomics, we often assume that continuous data, such as gene expression, follow a specific kind of distribution. However we rarely stop to question the validity of this assumption, or consider how broadly applicable it may be to all genes that are in the transcriptome. Our study inves...
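Main Authors: | de Torrenté, Laurence, Zimmerman, Samuel, Suzuki, Masako, Christopeit, Maximilian, Greally, John M., Mar, Jessica C. |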
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2020 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7768656/ https://www.ncbi.nlm.nih.gov/pubmed/33371881 http://dx.doi.org/10.1186/s12859-020-03892-w |
_version_ | 1783629202213306368 |
---|---|
author | de Torrenté, Laurence Zimmerman, Samuel Suzuki, Masako Christopeit, Maximilian Greally, John M. Mar, Jessica C. |
author_facet | de Torrenté, Laurence Zimmerman, Samuel Suzuki, Masako Christopeit, Maximilian Greally, John M. Mar, Jessica C. |
author_sort | de Torrenté, Laurence |
collection | PubMed |
description | BACKGROUND: In genomics, we often assume that continuous data, such as gene expression, follow a specific kind of distribution. However we rarely stop to question the validity of this assumption, or consider how broadly applicable it may be to all genes that are in the transcriptome. Our study investigated the prevalence of a range of gene expression distributions in three different tumor types from the Cancer Genome Atlas (TCGA). RESULTS: Surprisingly, the expression of less than 50% of all genes was Normally-distributed, with other distributions including Gamma, Bimodal, Cauchy, and Lognormal also represented. Most of the distribution categories contained genes that were significantly enriched for unique biological processes. Different assumptions based on the shape of the expression profile were used to identify genes that could discriminate between patients with good versus poor survival. The prognostic marker genes that were identified when the shape of the distribution was accounted for reflected functional insights into cancer biology that were not observed when standard assumptions were applied. We showed that when multiple types of distributions were permitted, i.e. the shape of the expression profile was used, the statistical classifiers had greater predictive accuracy for determining the prognosis of a patient versus those that assumed only one type of gene expression distribution. CONCLUSIONS: Our results highlight the value of studying a gene’s distribution shape to model heterogeneity of transcriptomic data and the impact on using analyses that permit more than one type of gene expression distribution. These insights would have been overlooked when using standard approaches that assume all genes follow the same type of distribution in a patient cohort. |
format | Online Article Text |
id | pubmed-7768656 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-7768656 2020-12-29 The shape of gene expression distributions matter: how incorporating distribution shape improves the interpretation of cancer transcriptomic data de Torrenté, Laurence Zimmerman, Samuel Suzuki, Masako Christopeit, Maximilian Greally, John M. Mar, Jessica C. BMC Bioinformatics Research BACKGROUND: In genomics, we often assume that continuous data, such as gene expression, follow a specific kind of distribution. However we rarely stop to question the validity of this assumption, or consider how broadly applicable it may be to all genes that are in the transcriptome. Our study investigated the prevalence of a range of gene expression distributions in three different tumor types from the Cancer Genome Atlas (TCGA). RESULTS: Surprisingly, the expression of less than 50% of all genes was Normally-distributed, with other distributions including Gamma, Bimodal, Cauchy, and Lognormal also represented. Most of the distribution categories contained genes that were significantly enriched for unique biological processes. Different assumptions based on the shape of the expression profile were used to identify genes that could discriminate between patients with good versus poor survival. The prognostic marker genes that were identified when the shape of the distribution was accounted for reflected functional insights into cancer biology that were not observed when standard assumptions were applied. We showed that when multiple types of distributions were permitted, i.e. the shape of the expression profile was used, the statistical classifiers had greater predictive accuracy for determining the prognosis of a patient versus those that assumed only one type of gene expression distribution. CONCLUSIONS: Our results highlight the value of studying a gene’s distribution shape to model heterogeneity of transcriptomic data and the impact on using analyses that permit more than one type of gene expression distribution. These insights would have been overlooked when using standard approaches that assume all genes follow the same type of distribution in a patient cohort. BioMed Central 2020-12-28 /pmc/articles/PMC7768656/ /pubmed/33371881 http://dx.doi.org/10.1186/s12859-020-03892-w Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research de Torrenté, Laurence Zimmerman, Samuel Suzuki, Masako Christopeit, Maximilian Greally, John M. Mar, Jessica C. The shape of gene expression distributions matter: how incorporating distribution shape improves the interpretation of cancer transcriptomic data |
title | The shape of gene expression distributions matter: how incorporating distribution shape improves the interpretation of cancer transcriptomic data |
title_full | The shape of gene expression distributions matter: how incorporating distribution shape improves the interpretation of cancer transcriptomic data |
title_fullStr | The shape of gene expression distributions matter: how incorporating distribution shape improves the interpretation of cancer transcriptomic data |
title_full_unstemmed | The shape of gene expression distributions matter: how incorporating distribution shape improves the interpretation of cancer transcriptomic data |
title_short | The shape of gene expression distributions matter: how incorporating distribution shape improves the interpretation of cancer transcriptomic data |
title_sort | shape of gene expression distributions matter: how incorporating distribution shape improves the interpretation of cancer transcriptomic data |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7768656/ https://www.ncbi.nlm.nih.gov/pubmed/33371881 http://dx.doi.org/10.1186/s12859-020-03892-w |
work_keys_str_mv | AT detorrentelaurence theshapeofgeneexpressiondistributionsmatterhowincorporatingdistributionshapeimprovestheinterpretationofcancertranscriptomicdata AT zimmermansamuel theshapeofgeneexpressiondistributionsmatterhowincorporatingdistributionshapeimprovestheinterpretationofcancertranscriptomicdata AT suzukimasako theshapeofgeneexpressiondistributionsmatterhowincorporatingdistributionshapeimprovestheinterpretationofcancertranscriptomicdata AT christopeitmaximilian theshapeofgeneexpressiondistributionsmatterhowincorporatingdistributionshapeimprovestheinterpretationofcancertranscriptomicdata AT greallyjohnm theshapeofgeneexpressiondistributionsmatterhowincorporatingdistributionshapeimprovestheinterpretationofcancertranscriptomicdata AT marjessicac theshapeofgeneexpressiondistributionsmatterhowincorporatingdistributionshapeimprovestheinterpretationofcancertranscriptomicdata AT detorrentelaurence shapeofgeneexpressiondistributionsmatterhowincorporatingdistributionshapeimprovestheinterpretationofcancertranscriptomicdata AT zimmermansamuel shapeofgeneexpressiondistributionsmatterhowincorporatingdistributionshapeimprovestheinterpretationofcancertranscriptomicdata AT suzukimasako shapeofgeneexpressiondistributionsmatterhowincorporatingdistributionshapeimprovestheinterpretationofcancertranscriptomicdata AT christopeitmaximilian shapeofgeneexpressiondistributionsmatterhowincorporatingdistributionshapeimprovestheinterpretationofcancertranscriptomicdata AT greallyjohnm shapeofgeneexpressiondistributionsmatterhowincorporatingdistributionshapeimprovestheinterpretationofcancertranscriptomicdata AT marjessicac shapeofgeneexpressiondistributionsmatterhowincorporatingdistributionshapeimprovestheinterpretationofcancertranscriptomicdata |