The shape of gene expression distributions matter: how incorporating distribution shape improves the interpretation of cancer transcriptomic data

BACKGROUND: In genomics, we often assume that continuous data, such as gene expression, follow a specific kind of distribution. However, we rarely stop to question the validity of this assumption or consider how broadly applicable it may be to all genes in the transcriptome. Our study inves...

Bibliographic Details
Main Authors: de Torrenté, Laurence; Zimmerman, Samuel; Suzuki, Masako; Christopeit, Maximilian; Greally, John M.; Mar, Jessica C.
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7768656/
https://www.ncbi.nlm.nih.gov/pubmed/33371881
http://dx.doi.org/10.1186/s12859-020-03892-w
_version_ 1783629202213306368
author de Torrenté, Laurence
Zimmerman, Samuel
Suzuki, Masako
Christopeit, Maximilian
Greally, John M.
Mar, Jessica C.
collection PubMed
description BACKGROUND: In genomics, we often assume that continuous data, such as gene expression, follow a specific kind of distribution. However, we rarely stop to question the validity of this assumption or consider how broadly applicable it may be to all genes in the transcriptome. Our study investigated the prevalence of a range of gene expression distributions in three different tumor types from The Cancer Genome Atlas (TCGA). RESULTS: Surprisingly, the expression of fewer than 50% of all genes was normally distributed, with other distributions, including Gamma, Bimodal, Cauchy, and Lognormal, also represented. Most of the distribution categories contained genes that were significantly enriched for unique biological processes. Different assumptions based on the shape of the expression profile were used to identify genes that could discriminate between patients with good versus poor survival. The prognostic marker genes identified when the shape of the distribution was accounted for reflected functional insights into cancer biology that were not observed when standard assumptions were applied. We showed that when multiple types of distributions were permitted, i.e., when the shape of the expression profile was used, the statistical classifiers had greater predictive accuracy for determining the prognosis of a patient than those that assumed only one type of gene expression distribution. CONCLUSIONS: Our results highlight the value of studying a gene's distribution shape to model heterogeneity of transcriptomic data and the impact of using analyses that permit more than one type of gene expression distribution. These insights would have been overlooked when using standard approaches that assume all genes follow the same type of distribution in a patient cohort.
format Online
Article
Text
id pubmed-7768656
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7768656 2020-12-29 The shape of gene expression distributions matter: how incorporating distribution shape improves the interpretation of cancer transcriptomic data de Torrenté, Laurence Zimmerman, Samuel Suzuki, Masako Christopeit, Maximilian Greally, John M. Mar, Jessica C. BMC Bioinformatics Research BioMed Central 2020-12-28 /pmc/articles/PMC7768656/ /pubmed/33371881 http://dx.doi.org/10.1186/s12859-020-03892-w Text en © The Author(s) 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title The shape of gene expression distributions matter: how incorporating distribution shape improves the interpretation of cancer transcriptomic data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7768656/
https://www.ncbi.nlm.nih.gov/pubmed/33371881
http://dx.doi.org/10.1186/s12859-020-03892-w