A data-fusion approach to identifying developmental dyslexia from multi-omics datasets
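Main authors: | Carrion, Jackson; Nandakumar, Rohit; Shi, Xiaojian; Gu, Haiwei; Kim, Yookyung; Raskind, Wendy H.; Peter, Beate; Dinu, Valentin |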
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10002702/ https://www.ncbi.nlm.nih.gov/pubmed/36909570 http://dx.doi.org/10.1101/2023.02.27.530280 |
author | Carrion, Jackson; Nandakumar, Rohit; Shi, Xiaojian; Gu, Haiwei; Kim, Yookyung; Raskind, Wendy H.; Peter, Beate; Dinu, Valentin |
collection | PubMed |
description | This exploratory study tested and validated the use of data fusion and machine learning techniques to probe high-throughput omics and clinical data, with the goal of exploring the etiology of developmental dyslexia (DD). Developmental dyslexia is the leading learning disability in school-aged children, affecting roughly 5–10% of the US population. The complex biological and neurological phenotype of this life-altering disability complicates its diagnosis. Phenome, exome, and metabolome data were collected, allowing us to fully explore this system from behavioral, cellular, and molecular points of view. This study provides a proof of concept showing that data fusion and ensemble learning techniques can outperform traditional machine learning techniques when applied to small, complex multi-omics and clinical datasets. Heterogeneous stacking classifiers consisting of single-omic experts (models) achieved an accuracy of 86%, an F1 score of 0.89, and an AUC of 0.83. The ensemble methods also provided a ranked list of important features, which suggests that exome single nucleotide polymorphisms (SNPs) found in the thalamus and cerebellum heavily influenced the classification of DD within our machine learning models and could be potential biomarkers for developmental dyslexia. |
id | pubmed-10002702 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
spelling | pubmed-10002702, 2023-03-11. bioRxiv Article. Cold Spring Harbor Laboratory, 2023-02-27. /pmc/articles/PMC10002702/ /pubmed/36909570 http://dx.doi.org/10.1101/2023.02.27.530280. Text, en. https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use. |
topic | Article |
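The abstract describes heterogeneous stacking classifiers built from single-omic experts and evaluated by accuracy, F1 score, and AUC. The sketch below illustrates that general technique with scikit-learn; it is a minimal example under stated assumptions, not the authors' pipeline. The column layout (`OMIC_BLOCKS`), the choice of a different learner per omic, the logistic-regression meta-learner, the helper names, and the synthetic data are all hypothetical.

```python
"""Hypothetical sketch of a heterogeneous stacking classifier built from single-omic experts.

Assumptions (not from the study): column layout, base learners, meta-learner, synthetic data.
"""
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# HYPOTHETICAL layout: which columns of the fused matrix hold each data type.
OMIC_BLOCKS = {
    "phenome": list(range(0, 10)),
    "exome": list(range(10, 60)),
    "metabolome": list(range(60, 90)),
}


def single_omic_expert(columns, classifier):
    """Pipeline that keeps only one omic block's columns, scales them, then classifies."""
    keep_block = ColumnTransformer([("block", "passthrough", columns)], remainder="drop")
    return make_pipeline(keep_block, StandardScaler(), classifier)


def build_stacking_model(seed=0):
    """Heterogeneous stack: a different learner type per omic, combined by a meta-learner."""
    experts = [
        ("phenome", single_omic_expert(OMIC_BLOCKS["phenome"], LogisticRegression(max_iter=1000))),
        ("exome", single_omic_expert(OMIC_BLOCKS["exome"],
                                     RandomForestClassifier(n_estimators=200, random_state=seed))),
        ("metabolome", single_omic_expert(OMIC_BLOCKS["metabolome"],
                                          SVC(probability=True, random_state=seed))),
    ]
    # The meta-learner is trained on out-of-fold class probabilities (cv=5), so each
    # meta-feature comes from an expert that was not fitted on that training row.
    return StackingClassifier(
        estimators=experts,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
        stack_method="predict_proba",
    )


def evaluate(X, y, seed=0):
    """Hold out a stratified test split and report accuracy, F1, and ROC AUC."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    model = build_stacking_model(seed).fit(X_tr, y_tr)
    pred = model.predict(X_te)
    proba = model.predict_proba(X_te)[:, 1]  # probability of the positive (DD) class
    return {
        "accuracy": accuracy_score(y_te, pred),
        "f1": f1_score(y_te, pred),
        "auc": roc_auc_score(y_te, proba),
    }


if __name__ == "__main__":
    # Synthetic stand-in data (NOT study data): 60 samples, 90 fused features.
    rng = np.random.default_rng(0)
    y = np.array([0, 1] * 30)
    X = rng.normal(size=(60, 90))
    X[y == 1, 10:15] += 1.0  # inject a weak signal into a few "exome" columns
    print(evaluate(X, y))
```

Giving each expert only its own omic block lets a learner suited to that data type (for example, a tree ensemble for many sparse variant features) specialize, while the meta-learner decides how much to trust each block. With very small cohorts, the stratified split and inner cross-validation shown here are the minimum needed to avoid reporting optimistic, leaked scores.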