GMMchi: gene expression clustering using Gaussian mixture modeling

Bibliographic Details
Main Authors: Liu, Ta-Chun; Kalugin, Peter N.; Wilding, Jennifer L.; Bodmer, Walter F.
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9632092/
https://www.ncbi.nlm.nih.gov/pubmed/36324085
http://dx.doi.org/10.1186/s12859-022-05006-0
author Liu, Ta-Chun
Kalugin, Peter N.
Wilding, Jennifer L.
Bodmer, Walter F.
collection PubMed
description BACKGROUND: Cancer evolution consists of a stepwise acquisition of genetic and epigenetic changes, which alter the gene expression profiles of cells in a particular tissue and result in phenotypic alterations acted upon by natural selection. The recurrent appearance of specific genetic lesions across individual cancers and cancer types suggests the existence of certain “driver mutations,” which likely make up the major contribution to tumors’ selective advantages over surrounding normal tissue and as such are responsible for the most consequential aspects of the cancer cells’ gene expression patterns and phenotypes. We hypothesize that such mutations are likely to cluster with specific dichotomous shifts in the expression of the genes they most closely control, and propose GMMchi, a Python package that leverages Gaussian Mixture Modeling to detect and characterize bimodal gene expression patterns across cancer samples, as a tool to analyze such correlations using 2 × 2 contingency table statistics. RESULTS: Using well-defined simulated data, we were able to confirm the robust performance of GMMchi, reaching 85% accuracy with a sample size of n = 90. We were also able to demonstrate a few examples of the application of GMMchi with respect to its capacity to characterize background fluorescent signals in microarray data, filter out uninformative background probe sets, and uncover novel genetic interrelationships and tumor characteristics. Our approach to analysing gene expression in cancers provides an additional lens to supplement traditional continuous-valued statistical analysis by maximizing the information that can be gathered from bulk gene expression data. CONCLUSIONS: We confirm that GMMchi robustly and reliably extracts bimodal patterns from both colorectal cancer (CRC) cell line-derived microarray and tumor-derived RNA-Seq data and verify previously reported gene expression correlates of some well-characterized CRC phenotypes.
AVAILABILITY: The Python package GMMchi and the cell line microarray data used in this paper are available for download on GitHub at https://github.com/jeffliu6068/GMMchi.
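The idea summarized in the description (split each gene's expression values into "low" and "high" groups with a Gaussian mixture, then test two genes for association with a 2 × 2 table) can be illustrated with a minimal sketch. This is not the GMMchi API: every function name below is hypothetical, the mixture is fitted with a plain two-component EM loop, the statistic is an uncorrected Pearson chi-square chosen as one common 2 × 2 test, and the step of deciding whether a gene is bimodal at all is skipped. The expression values are simulated for illustration only.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values, iters=200):
    """Plain EM for a two-component 1-D Gaussian mixture (illustrative only)."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    # Initialise the two means from the lower and upper halves of the data.
    means = [sum(xs[:half]) / half, sum(xs[half:]) / (n - half)]
    centre = sum(xs) / n
    spread = math.sqrt(sum((x - centre) ** 2 for x in xs) / n) or 1.0
    sds = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior probability of each component for each sample.
        resp = []
        for x in values:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean and standard deviation.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-9:
                continue
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)
    return weights, means, sds


def dichotomize(values):
    """Call each sample 'high' (True) or 'low' (False) by the likelier component."""
    weights, means, sds = fit_two_gaussians(values)
    hi = 0 if means[0] > means[1] else 1
    lo = 1 - hi
    return [
        weights[hi] * normal_pdf(x, means[hi], sds[hi])
        > weights[lo] * normal_pdf(x, means[lo], sds[lo])
        for x in values
    ]


def chi_square_2x2(a_calls, b_calls):
    """Pearson chi-square (1 degree of freedom, no continuity correction)."""
    table = [[0, 0], [0, 0]]
    for a, b in zip(a_calls, b_calls):
        table[int(a)][int(b)] += 1
    n = len(a_calls)
    rows = [sum(table[0]), sum(table[1])]
    cols = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    # Survival function of chi-square with 1 df: erfc(sqrt(stat / 2)).
    return table, stat, math.erfc(math.sqrt(stat / 2.0))


# Simulated example: 90 samples, two genes driven by the same hidden on/off state.
random.seed(0)
group = [i % 2 == 0 for i in range(90)]
gene_a = [random.gauss(8.0 if g else 2.0, 0.5) for g in group]
gene_b = [random.gauss(9.0 if g else 3.0, 0.5) for g in group]

calls_a = dichotomize(gene_a)
calls_b = dichotomize(gene_b)
table, stat, p_value = chi_square_2x2(calls_a, calls_b)
print(table, stat, p_value)
```

In this toy setup the two modes are far apart, so the dichotomized calls recover the hidden state and the 2 × 2 table is concentrated on its diagonal. Real expression data would need the extra checks the sketch omits, such as testing whether a one-component fit explains a gene just as well.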
format Online
Article
Text
id pubmed-9632092
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9632092 2022-11-04 GMMchi: gene expression clustering using Gaussian mixture modeling Liu, Ta-Chun Kalugin, Peter N. Wilding, Jennifer L. Bodmer, Walter F. BMC Bioinformatics Research BACKGROUND: Cancer evolution consists of a stepwise acquisition of genetic and epigenetic changes, which alter the gene expression profiles of cells in a particular tissue and result in phenotypic alterations acted upon by natural selection. The recurrent appearance of specific genetic lesions across individual cancers and cancer types suggests the existence of certain “driver mutations,” which likely make up the major contribution to tumors’ selective advantages over surrounding normal tissue and as such are responsible for the most consequential aspects of the cancer cells’ gene expression patterns and phenotypes. We hypothesize that such mutations are likely to cluster with specific dichotomous shifts in the expression of the genes they most closely control, and propose GMMchi, a Python package that leverages Gaussian Mixture Modeling to detect and characterize bimodal gene expression patterns across cancer samples, as a tool to analyze such correlations using 2 × 2 contingency table statistics. RESULTS: Using well-defined simulated data, we were able to confirm the robust performance of GMMchi, reaching 85% accuracy with a sample size of n = 90. We were also able to demonstrate a few examples of the application of GMMchi with respect to its capacity to characterize background fluorescent signals in microarray data, filter out uninformative background probe sets, and uncover novel genetic interrelationships and tumor characteristics. Our approach to analysing gene expression in cancers provides an additional lens to supplement traditional continuous-valued statistical analysis by maximizing the information that can be gathered from bulk gene expression data.
CONCLUSIONS: We confirm that GMMchi robustly and reliably extracts bimodal patterns from both colorectal cancer (CRC) cell line-derived microarray and tumor-derived RNA-Seq data and verify previously reported gene expression correlates of some well-characterized CRC phenotypes. AVAILABILITY: The Python package GMMchi and the cell line microarray data used in this paper are available for download on GitHub at https://github.com/jeffliu6068/GMMchi. BioMed Central 2022-11-02 /pmc/articles/PMC9632092/ /pubmed/36324085 http://dx.doi.org/10.1186/s12859-022-05006-0 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title GMMchi: gene expression clustering using Gaussian mixture modeling
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9632092/
https://www.ncbi.nlm.nih.gov/pubmed/36324085
http://dx.doi.org/10.1186/s12859-022-05006-0