Comprehensive evaluation of the implementation of episignatures for diagnosis of neurodevelopmental disorders (NDDs)

Episignatures are popular tools for the diagnosis of rare neurodevelopmental disorders. They are commonly based on a set of differentially methylated CpGs used in combination with a support vector machine model. DNA methylation (DNAm) data often include missing values due to changes in data generati...

Bibliographic Details
Main authors: Giuili, Edoardo; Grolaux, Robin; Macedo, Catarina Z. N. M.; Desmyter, Laurence; Pichon, Bruno; Neuens, Sebastian; Vilain, Catheline; Olsen, Catharina; Van Dooren, Sonia; Smits, Guillaume; Defrance, Matthieu
Format: Online Article Text
Language: English
Published: Springer Berlin Heidelberg 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10676303/
https://www.ncbi.nlm.nih.gov/pubmed/37889307
http://dx.doi.org/10.1007/s00439-023-02609-2
author Giuili, Edoardo
Grolaux, Robin
Macedo, Catarina Z. N. M.
Desmyter, Laurence
Pichon, Bruno
Neuens, Sebastian
Vilain, Catheline
Olsen, Catharina
Van Dooren, Sonia
Smits, Guillaume
Defrance, Matthieu
collection PubMed
description Episignatures are popular tools for the diagnosis of rare neurodevelopmental disorders. They are commonly based on a set of differentially methylated CpGs used in combination with a support vector machine model. DNA methylation (DNAm) data often include missing values due to changes in data generation technology and batch effects. While many normalization methods exist for DNAm data, their impact on episignature performance has never been assessed. In addition, technologies to quantify DNAm evolve quickly, which may lead to poor transposition of existing episignatures generated on deprecated array versions to new ones. Indeed, probe removal between array versions, between technologies, or during preprocessing leads to missing values. Thus, the effect of missing data on episignature performance must also be carefully evaluated and addressed through imputation or an innovative approach to episignature design. In this paper, we use data from patients with Kabuki and Sotos syndromes to evaluate the influence of normalization methods, classification models and missing data on the prediction performance of two existing episignatures. We compare how six popular normalization methods for methylation array data affect episignature classification performance in Kabuki and Sotos syndromes and provide best-practice suggestions for building new episignatures. In this setting, we show that the Illumina, Noob and Funnorm normalization methods achieve higher classification performance on the test sets than the Quantile, Raw and Swan normalization methods. We further show that penalized logistic regression and support vector machines perform best in the classification of Kabuki and Sotos syndrome patients. Then, we describe a new paradigm for building episignatures based on the detection of differentially methylated regions (DMRs) and evaluate their performance against classical episignatures based on differentially methylated cytosines (DMCs) in the presence of missing data. We show that the performance of classical DMC-based episignatures suffers more from missing data than that of the DMR-based approach. We present a comprehensive evaluation of how the normalization of DNA methylation data affects episignature performance, using three popular classification models. We further evaluate how missing data affect those models' predictions. Finally, we propose a novel methodology to develop episignatures based on the identification of differentially methylated regions and show that this method slightly outperforms classical episignatures in the presence of missing data.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s00439-023-02609-2.
format Online
Article
Text
id pubmed-10676303
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Springer Berlin Heidelberg
record_format MEDLINE/PubMed
spelling pubmed-10676303
2023-10-27
Hum Genet
Original Investigation
Springer Berlin Heidelberg
/pmc/articles/PMC10676303/
/pubmed/37889307
http://dx.doi.org/10.1007/s00439-023-02609-2
Text en
© The Author(s) 2023
https://creativecommons.org/licenses/by/4.0/
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Comprehensive evaluation of the implementation of episignatures for diagnosis of neurodevelopmental disorders (NDDs)
topic Original Investigation
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10676303/
https://www.ncbi.nlm.nih.gov/pubmed/37889307
http://dx.doi.org/10.1007/s00439-023-02609-2