A data-fusion approach to identifying developmental dyslexia from multi-omics datasets

This exploratory study tested and validated the use of data fusion and machine learning techniques to probe high-throughput omics and clinical data with the goal of exploring the etiology of developmental dyslexia. Developmental dyslexia is the leading learning disability in school-aged children affec...

Bibliographic Details
Main authors: Carrion, Jackson; Nandakumar, Rohit; Shi, Xiaojian; Gu, Haiwei; Kim, Yookyung; Raskind, Wendy H.; Peter, Beate; Dinu, Valentin
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10002702/
https://www.ncbi.nlm.nih.gov/pubmed/36909570
http://dx.doi.org/10.1101/2023.02.27.530280
collection PubMed
description This exploratory study tested and validated the use of data fusion and machine learning techniques to probe high-throughput omics and clinical data with the goal of exploring the etiology of developmental dyslexia (DD). Developmental dyslexia is the leading learning disability in school-aged children, affecting roughly 5–10% of the US population. The complex biological and neurological phenotype of this life-altering disability complicates its diagnosis. Phenome, exome, and metabolome data were collected, allowing us to explore this system from behavioral, cellular, and molecular points of view. This study provides a proof of concept showing that data fusion and ensemble learning techniques can outperform traditional machine learning techniques when given small and complex multi-omics and clinical datasets. Heterogeneous stacking classifiers consisting of single-omic experts/models achieved an accuracy of 86%, an F1 score of 0.89, and an AUC of 0.83. Ensemble methods also provided a ranked list of important features, which suggests that exome single nucleotide polymorphisms found in the thalamus and cerebellum could be potential biomarkers for developmental dyslexia. These features heavily influenced the classification of DD within our machine learning models.
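The stacking idea named in the description (one expert per omics view, with a meta-learner combining their outputs) can be illustrated with a minimal sketch. This is not the authors' pipeline: the toy data, the view names, the nearest-centroid experts, and the logistic meta-learner are all assumptions made for illustration. The study describes heterogeneous experts, whereas this sketch uses the same simple expert for every view to stay short.

```python
import math

def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def fit_expert(X, y):
    """Hypothetical single-view expert: one centroid per class (0 and 1)."""
    return {c: centroid([x for x, label in zip(X, y) if label == c]) for c in (0, 1)}

def expert_predict(model, x):
    """Predict the class whose centroid is closer (squared Euclidean distance)."""
    d = {c: sum((a - b) ** 2 for a, b in zip(x, m)) for c, m in model.items()}
    return 1 if d[1] < d[0] else 0

def out_of_fold(X, y):
    """Leave-one-out predictions, so the meta-learner trains on unseen-sample outputs."""
    preds = []
    for i in range(len(X)):
        model = fit_expert(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(expert_predict(model, X[i]))
    return preds

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def fit_meta(Z, y, lr=0.5, epochs=500):
    """Logistic-regression meta-learner trained by batch gradient descent."""
    w, b, n = [0.0] * len(Z[0]), 0.0, len(Z)
    for _ in range(epochs):
        errs = [sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) - t
                for z, t in zip(Z, y)]
        w = [wi - lr * sum(e * z[j] for e, z in zip(errs, Z)) / n
             for j, wi in enumerate(w)]
        b -= lr * sum(errs) / n
    return w, b

def fit_stack(views, y):
    """Train one expert per view, then a meta-learner on their out-of-fold outputs."""
    oof = [out_of_fold(X, y) for X in views]
    Z = [list(row) for row in zip(*oof)]
    experts = [fit_expert(X, y) for X in views]  # refit each expert on all samples
    return experts, fit_meta(Z, y)

def predict_stack(stack, sample_views):
    """Classify one sample given its feature vector in each view."""
    experts, (w, b) = stack
    z = [expert_predict(m, x) for m, x in zip(experts, sample_views)]
    return 1 if sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) >= 0.5 else 0

# Toy data (invented): two feature views for the same eight samples.
view_a = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2],
          [5.0, 5.1], [5.2, 4.9], [4.9, 5.3], [5.1, 5.0]]
view_b = [[1.0], [1.2], [0.9], [1.1], [3.0], [3.2], [2.9], [3.1]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

stack = fit_stack([view_a, view_b], labels)
print([predict_stack(stack, [a, b]) for a, b in zip(view_a, view_b)])
```

Training the meta-learner on out-of-fold predictions rather than in-sample ones is the design choice that distinguishes stacking from simply averaging experts, and it matters most on small datasets such as the one the description mentions.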
id pubmed-10002702
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Preprint posted on bioRxiv (Cold Spring Harbor Laboratory), 2023-02-27; record pubmed-10002702 dated 2023-03-11.
This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
topic Article