
Disease classification for whole-blood DNA methylation: Meta-analysis, missing values imputation, and XAI

BACKGROUND: DNA methylation has a significant effect on gene expression and can be associated with various diseases. Meta-analysis of available DNA methylation datasets requires development of a specific workflow for joint data processing. RESULTS: We propose a comprehensive approach of combined DNA...


Bibliographic Details
Main authors: Kalyakulina, Alena; Yusipov, Igor; Bacalini, Maria Giulia; Franceschi, Claudio; Vedunova, Maria; Ivanchenko, Mikhail
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9718659/
https://www.ncbi.nlm.nih.gov/pubmed/36259657
http://dx.doi.org/10.1093/gigascience/giac097
author Kalyakulina, Alena
Yusipov, Igor
Bacalini, Maria Giulia
Franceschi, Claudio
Vedunova, Maria
Ivanchenko, Mikhail
collection PubMed
description BACKGROUND: DNA methylation has a significant effect on gene expression and can be associated with various diseases. Meta-analysis of available DNA methylation datasets requires the development of a specific workflow for joint data processing. RESULTS: We propose a comprehensive approach to combining DNA methylation datasets in order to classify controls and patients. The solution includes data harmonization, construction of machine learning classification models, dimensionality reduction of models, imputation of missing values, and explanation of model predictions by explainable artificial intelligence (XAI) algorithms. We show that harmonization can improve classification accuracy by up to 20% when the preprocessing methods of the training and test datasets differ. The best accuracy was obtained with tree ensembles, exceeding 95% for Parkinson’s disease. Dimensionality reduction can substantially decrease the number of features without detriment to classification accuracy. The best imputation methods achieve almost the same classification accuracy for data with missing values as for the original data. XAI approaches allowed us to explain model predictions from both population-level and individual perspectives. CONCLUSIONS: We propose a methodologically valid and comprehensive approach to the classification of healthy individuals and patients with various diseases based on whole-blood DNA methylation data, using Parkinson’s disease and schizophrenia as examples. The proposed algorithm works better for the former pathology, characterized by a complex set of symptoms. It makes it possible to solve data harmonization problems for meta-analysis of many different datasets, impute missing values, and build classification models of small dimensionality.
format Online
Article
Text
id pubmed-9718659
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9718659 2022-12-06 Gigascience Research Oxford University Press 2022-10-19 /pmc/articles/PMC9718659/ /pubmed/36259657 http://dx.doi.org/10.1093/gigascience/giac097 Text en © The Author(s) 2022. Published by Oxford University Press GigaScience. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Disease classification for whole-blood DNA methylation: Meta-analysis, missing values imputation, and XAI
topic Research