Permutation-based identification of important biomarkers for complex diseases via machine learning models

Bibliographic Details
Main authors: Mi, Xinlei; Zou, Baiming; Zou, Fei; Hu, Jianhua
Format: Online, Article, Text
Language: English
Published: Nature Publishing Group UK 2021
Subjects: Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8140109/
https://www.ncbi.nlm.nih.gov/pubmed/34021151
http://dx.doi.org/10.1038/s41467-021-22756-2
Collection: PubMed
Description: Study of human disease remains challenging due to convoluted disease etiologies and complex molecular mechanisms at genetic, genomic, and proteomic levels. Many machine learning-based methods have been developed and widely used to alleviate some analytic challenges in complex human disease studies. While enjoying the modeling flexibility and robustness, these model frameworks suffer from non-transparency and difficulty in interpreting each individual feature due to their sophisticated algorithms. However, identifying important biomarkers is a critical pursuit towards assisting researchers to establish novel hypotheses regarding prevention, diagnosis and treatment of complex human diseases. Herein, we propose a Permutation-based Feature Importance Test (PermFIT) for estimating and testing the feature importance, and for assisting interpretation of individual feature in complex frameworks, including deep neural networks, random forests, and support vector machines. PermFIT (available at https://github.com/SkadiEye/deepTL) is implemented in a computationally efficient manner, without model refitting. We conduct extensive numerical studies under various scenarios, and show that PermFIT not only yields valid statistical inference, but also improves the prediction accuracy of machine learning models. With the application to the Cancer Genome Atlas kidney tumor data and the HITChip atlas data, PermFIT demonstrates its practical usage in identifying important biomarkers and boosting model prediction performance.
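The abstract states that importance is estimated and tested without refitting the model. The general permutation idea behind that claim can be sketched as follows, assuming a squared-error loss, a model already fitted on separate training data, and a normal approximation for the p-value: one feature is shuffled in held-out data, the fitted model is re-evaluated (not refitted), and the per-sample loss increases are tested for a positive mean. The helper `permutation_importance_test`, its parameters, and the toy data are hypothetical illustrations; this is not the PermFIT implementation in deepTL, whose exact statistic, sample-splitting scheme, and test may differ.

```python
import math

import numpy as np


def permutation_importance_test(predict, X_val, y_val, feature, n_perm=100, seed=0):
    """Permutation importance of one feature, tested without refitting.

    Hypothetical helper (not PermFIT's code). Assumes `predict` belongs to a model
    already fitted on separate training data and uses squared-error loss.
    Returns (importance, t statistic, one-sided p-value via normal approximation).
    """
    rng = np.random.default_rng(seed)
    base_loss = (y_val - predict(X_val)) ** 2  # per-sample loss on intact data
    X_perm = X_val.copy()
    diffs = np.zeros(len(y_val))
    for _ in range(n_perm):
        # Shuffle one column to break its link with the outcome; the model is reused.
        X_perm[:, feature] = rng.permutation(X_val[:, feature])
        diffs += (y_val - predict(X_perm)) ** 2 - base_loss
    diffs /= n_perm  # per-sample loss increase, averaged over permutations
    importance = diffs.mean()
    se = diffs.std(ddof=1) / math.sqrt(len(diffs))
    t_stat = importance / se if se > 0 else 0.0
    p_value = 0.5 * math.erfc(t_stat / math.sqrt(2))  # upper tail of N(0, 1)
    return importance, t_stat, p_value


# Usage: x0 drives the outcome, x1 is pure noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 2))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=400)
X_train, y_train, X_val, y_val = X[:200], y[:200], X[200:], y[200:]
coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)


def predict(A):
    return A @ coef  # linear model fitted once on the training half


imp0, t0, p0 = permutation_importance_test(predict, X_val, y_val, feature=0)
imp1, t1, p1 = permutation_importance_test(predict, X_val, y_val, feature=1)
print(f"x0: importance={imp0:.3f}, p={p0:.2g}")
print(f"x1: importance={imp1:.3f}, p={p1:.2g}")
```

Because the permuted data only pass through the fitted model's prediction function, each feature costs `n_perm` extra prediction passes rather than a new model fit, which is what makes a permutation test practical for expensive learners such as deep neural networks.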
Record ID: pubmed-8140109
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Nat Commun (Nature Communications)
Publication date: 2021-05-21
Rights: © The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.