Latent-space embedding of expression data identifies gene signatures from sputum samples of asthmatic patients

Bibliographic Details
Main Authors: Lou, Shaoke; Li, Tianxiao; Spakowicz, Daniel; Yan, Xiting; Chupp, Geoffrey Lowell; Gerstein, Mark
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7560063/
https://www.ncbi.nlm.nih.gov/pubmed/33059594
http://dx.doi.org/10.1186/s12859-020-03785-y
Collection: PubMed
Description:
BACKGROUND: The pathogenesis of asthma is a complex process involving multiple genes and pathways. Identifying biomarkers from asthma datasets, especially those that include heterogeneous subpopulations, is challenging. Potentially, autoencoders provide ideal frameworks for such tasks as they can embed complex, noisy high-dimensional gene expression data into a low-dimensional latent space in an unsupervised fashion, enabling us to extract distinguishing features from expression data.
RESULTS: Here, we developed a framework combining a denoising autoencoder and a supervised learning classifier to identify gene signatures related to asthma severity. Using the trained autoencoder with 50 hidden units, we found that hierarchical clustering on the low-dimensional embedding corresponds well with previously defined and clinically relevant clusters of patients. Moreover, each hidden unit has contributions from each of the genes, and pathway analysis of these contributions shows that the hidden units are significantly enriched in known asthma-related pathways. We then used genes that contribute most to the hidden units to develop a secondary random-forest classifier for directly predicting asthma severity. The feature importance metric from this classifier identified a signature based on 50 key genes, which are associated with severity. Furthermore, we can use these key genes to successfully estimate FEV1/FVC ratios across patients, via support-vector-machine regression.
CONCLUSION: We found that the denoising autoencoder framework can extract meaningful patterns corresponding to functional gene groups and patient clusters from the gene expression of asthma patients.
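The abstract describes a denoising autoencoder whose hidden units are then inspected for the genes that contribute most to them. A minimal NumPy sketch of that idea follows, under stated assumptions: a single sigmoid hidden layer, a linear decoder, masking noise, a mean-squared-error loss, full-batch gradient descent, and "contribution" taken as the absolute encoder weight. None of these choices, nor the helper names (`train_denoising_autoencoder`, `encode`, `reconstruction_error`, `top_genes_per_unit`), come from the record; they are hypothetical and illustrative, and the published framework may differ.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_denoising_autoencoder(X, n_hidden=50, corruption=0.2,
                                lr=0.01, epochs=500, seed=0):
    """Train a one-hidden-layer denoising autoencoder (hypothetical helper).

    X is a (samples x genes) array, assumed pre-scaled to roughly [0, 1].
    Returns (W_enc, b_enc, W_dec, b_dec).
    """
    rng = np.random.default_rng(seed)
    n_samples, n_genes = X.shape
    W_enc = rng.normal(0.0, 0.1, size=(n_genes, n_hidden))
    b_enc = np.zeros(n_hidden)
    W_dec = rng.normal(0.0, 0.1, size=(n_hidden, n_genes))
    b_dec = np.zeros(n_genes)

    for _ in range(epochs):
        # Masking noise (an assumption): zero out a random fraction of entries.
        X_noisy = X * (rng.random(X.shape) >= corruption)
        H = sigmoid(X_noisy @ W_enc + b_enc)   # latent embedding
        X_hat = H @ W_dec + b_dec              # linear reconstruction
        # Loss = mean over samples of squared error against the CLEAN input.
        g_out = 2.0 * (X_hat - X) / n_samples
        g_hidden = (g_out @ W_dec.T) * H * (1.0 - H)  # backprop through sigmoid
        W_dec -= lr * (H.T @ g_out)
        b_dec -= lr * g_out.sum(axis=0)
        W_enc -= lr * (X_noisy.T @ g_hidden)
        b_enc -= lr * g_hidden.sum(axis=0)
    return W_enc, b_enc, W_dec, b_dec


def encode(X, W_enc, b_enc):
    """Map samples into the low-dimensional latent space (no noise at inference)."""
    return sigmoid(X @ W_enc + b_enc)


def reconstruction_error(X, W_enc, b_enc, W_dec, b_dec):
    """Mean over samples of the summed squared reconstruction error."""
    X_hat = encode(X, W_enc, b_enc) @ W_dec + b_dec
    return float(np.mean(np.sum((X_hat - X) ** 2, axis=1)))


def top_genes_per_unit(W_enc, k=10):
    """Indices of the k genes with the largest |encoder weight| into each hidden unit."""
    order = np.argsort(-np.abs(W_enc), axis=0)  # rank genes within each column
    return [order[:k, j].tolist() for j in range(W_enc.shape[1])]


if __name__ == "__main__":
    # Synthetic stand-in data: 40 samples x 200 genes (NOT real sputum expression).
    rng = np.random.default_rng(42)
    X = rng.random((40, 200))
    params = train_denoising_autoencoder(X, n_hidden=50, epochs=300)
    Z = encode(X, params[0], params[1])
    print("embedding shape:", Z.shape)
    print("reconstruction error:", reconstruction_error(X, *params))
    candidates = sorted({g for genes in top_genes_per_unit(params[0], k=5) for g in genes})
    print("candidate genes:", len(candidates))
```

In the pipeline the abstract outlines, the embedding returned by `encode` is what hierarchical clustering would operate on, and the union of top-contributing genes across hidden units would serve as candidate features for the secondary random-forest severity classifier; both downstream steps are omitted from this sketch.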
ID: pubmed-7560063
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
Journal: BMC Bioinformatics, Research Article (BioMed Central, 2020-10-15)
License: © The Author(s) 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.