
ECMarker: interpretable machine learning model identifies gene expression biomarkers predicting clinical outcomes and reveals molecular mechanisms of human disease in early stages

MOTIVATION: Gene expression and regulation, a key molecular mechanism driving human disease development, remains elusive, especially at early stages. Integrating the increasing amount of population-level genomic data and understanding gene regulatory mechanisms in disease development are still chall...

Bibliographic Details
Main authors: Jin, Ting; Nguyen, Nam D; Talos, Flaminia; Wang, Daifeng
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8150141/
https://www.ncbi.nlm.nih.gov/pubmed/33305308
http://dx.doi.org/10.1093/bioinformatics/btaa935
_version_ 1783698097799430144
author Jin, Ting
Nguyen, Nam D
Talos, Flaminia
Wang, Daifeng
collection PubMed
description MOTIVATION: Gene expression and regulation, a key molecular mechanism driving human disease development, remains elusive, especially at early stages. Integrating the increasing amount of population-level genomic data and understanding gene regulatory mechanisms in disease development are still challenging. Machine learning has emerged to solve this, but many machine learning methods were typically limited to building an accurate prediction model as a ‘black box’, barely providing biological and clinical interpretability from the box. RESULTS: To address these challenges, we developed an interpretable and scalable machine learning model, ECMarker, to predict gene expression biomarkers for disease phenotypes and simultaneously reveal underlying regulatory mechanisms. Particularly, ECMarker is built on the integration of semi- and discriminative-restricted Boltzmann machines, a neural network model for classification allowing lateral connections at the input gene layer. This interpretable model is scalable without needing any prior feature selection and enables directly modeling and prioritizing genes and revealing potential gene networks (from lateral connections) for the phenotypes. With application to the gene expression data of non-small-cell lung cancer patients, we found that ECMarker not only achieved a relatively high accuracy for predicting cancer stages but also identified the biomarker genes and gene networks implying the regulatory mechanisms in the lung cancer development. In addition, ECMarker demonstrates clinical interpretability as its prioritized biomarker genes can predict survival rates of early lung cancer patients (P-value < 0.005). Finally, we identified a number of drugs currently in clinical use for late stages or other cancers with effects on these early lung cancer biomarkers, suggesting potential novel candidates on early cancer medicine. 
AVAILABILITY AND IMPLEMENTATION: ECMarker is open source as a general-purpose tool at https://github.com/daifengwanglab/ECMarker. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-8150141
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8150141 2021-05-28 Bioinformatics Original Papers Oxford University Press 2020-12-10 /pmc/articles/PMC8150141/ /pubmed/33305308 http://dx.doi.org/10.1093/bioinformatics/btaa935 Text en © The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title ECMarker: interpretable machine learning model identifies gene expression biomarkers predicting clinical outcomes and reveals molecular mechanisms of human disease in early stages
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8150141/
https://www.ncbi.nlm.nih.gov/pubmed/33305308
http://dx.doi.org/10.1093/bioinformatics/btaa935