ADHD diagnosis from multiple data sources with batch effects

Attention Deficit Hyperactivity Disorder (ADHD) affects the school-age population and has large social costs. The scientific community still lacks a pathophysiological model of the disorder, and there are no objective biomarkers to support the diagnosis. In 2011 the ADHD-200 Consortium provi...

Bibliographic Details
Main authors: Olivetti, Emanuele; Greiner, Susanne; Avesani, Paolo
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2012
Subjects: Neuroscience
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3465911/
https://www.ncbi.nlm.nih.gov/pubmed/23060755
http://dx.doi.org/10.3389/fnsys.2012.00070
author Olivetti, Emanuele
Greiner, Susanne
Avesani, Paolo
collection PubMed
description Attention Deficit Hyperactivity Disorder (ADHD) affects the school-age population and has large social costs. The scientific community still lacks a pathophysiological model of the disorder, and there are no objective biomarkers to support the diagnosis. In 2011 the ADHD-200 Consortium provided a rich, heterogeneous neuroimaging dataset aimed at studying the neural correlates of ADHD and promoting the development of systems for automated diagnosis. Concurrently, a competition was set up with the goal of addressing the wide range of data types for the accurate prediction of the presence of ADHD. Phenotypic information, structural magnetic resonance imaging (MRI) scans, and resting-state fMRI recordings were provided for nearly 1000 typical and non-typical young individuals. Data were collected by eight different research centers in the consortium. This work is not concerned with the main task of the contest, i.e., achieving high prediction accuracy on the competition dataset; rather, we address the proper handling of such a heterogeneous dataset when performing classification-based analysis. Our interest lies in the clustered structure of the data, which causes the so-called batch effects that have a strong impact when assessing the performance of classifiers built on the ADHD-200 dataset. We propose a method to eliminate the biases introduced by such batch effects. Its application to the ADHD-200 dataset produces such a significant drop in prediction accuracy that most of the conclusions from a standard analysis had to be revised. In addition, we propose adopting the dissimilarity representation to set up effective representation spaces for the heterogeneous ADHD-200 dataset. Moreover, we propose evaluating the quality of predictions through a recently proposed test of independence in order to cope with the imbalance of the dataset.
format Online
Article
Text
id pubmed-3465911
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-3465911 2012-10-11 ADHD diagnosis from multiple data sources with batch effects Olivetti, Emanuele Greiner, Susanne Avesani, Paolo Front Syst Neurosci Neuroscience Frontiers Media S.A. 2012-10-08 /pmc/articles/PMC3465911/ /pubmed/23060755 http://dx.doi.org/10.3389/fnsys.2012.00070 Text en Copyright © 2012 Olivetti, Greiner and Avesani. http://www.frontiersin.org/licenseagreement This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.
title ADHD diagnosis from multiple data sources with batch effects
topic Neuroscience