
Assessing machine learning for fair prediction of ADHD in school pupils using a retrospective cohort study of linked education and healthcare data

OBJECTIVES: Attention deficit hyperactivity disorder (ADHD) is a prevalent childhood disorder, but often goes unrecognised and untreated. To improve access to services, accurate predictions of populations at high risk of ADHD are needed for effective resource allocation. Using a unique linked health...


Bibliographic Details
Main Authors: Ter-Minassian, Lucile; Viani, Natalia; Wickersham, Alice; Cross, Lauren; Stewart, Robert; Velupillai, Sumithra; Downs, Johnny
Format: Online Article Text
Language: English
Published: BMJ Publishing Group 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9723859/
https://www.ncbi.nlm.nih.gov/pubmed/36576182
http://dx.doi.org/10.1136/bmjopen-2021-058058
author Ter-Minassian, Lucile
Viani, Natalia
Wickersham, Alice
Cross, Lauren
Stewart, Robert
Velupillai, Sumithra
Downs, Johnny
collection PubMed
description OBJECTIVES: Attention deficit hyperactivity disorder (ADHD) is a prevalent childhood disorder, but often goes unrecognised and untreated. To improve access to services, accurate predictions of populations at high risk of ADHD are needed for effective resource allocation. Using a unique linked health and education data resource, we examined how machine learning (ML) approaches can predict risk of ADHD. DESIGN: Retrospective population cohort study. SETTING: South London (2007–2013). PARTICIPANTS: n=56 258 pupils with linked education and health data. PRIMARY OUTCOME MEASURES: Using area under the curve (AUC), we compared the predictive accuracy of four ML models and one neural network for ADHD diagnosis. Ethnic group and language biases were weighted using a fair pre-processing algorithm. RESULTS: Random forest and logistic regression prediction models provided the highest predictive accuracy for ADHD in population samples (AUC 0.86 and 0.86, respectively) and clinical samples (AUC 0.72 and 0.70). Precision-recall curve analyses were less favourable. Sociodemographic biases were effectively reduced by a fair pre-processing algorithm without loss of accuracy. CONCLUSIONS: ML approaches using linked routinely collected education and health data offer accurate, low-cost and scalable prediction models of ADHD. These approaches could help identify areas of need and inform resource allocation. Introducing ‘fairness weighting’ attenuates some sociodemographic biases which would otherwise underestimate ADHD risk within minority groups.
format Online
Article
Text
id pubmed-9723859
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BMJ Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-9723859 2022-12-07 BMJ Open Mental Health
BMJ Publishing Group 2022-12-05 /pmc/articles/PMC9723859/ /pubmed/36576182 http://dx.doi.org/10.1136/bmjopen-2021-058058 Text en © Author(s) (or their employer(s)) 2022. Re-use permitted under CC BY. Published by BMJ. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See: https://creativecommons.org/licenses/by/4.0/.
title Assessing machine learning for fair prediction of ADHD in school pupils using a retrospective cohort study of linked education and healthcare data
topic Mental Health
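
The abstract above says that ethnic group and language biases were handled with a fair pre-processing algorithm, but this record does not name it. As an illustration only, the sketch below uses reweighing, one well-known pre-processing technique; that choice is an assumption and not necessarily the authors' method. Each sample gets the weight P(group) × P(label) / P(group, label), so that group membership and label are statistically independent in the weighted data. The function names and the toy data are hypothetical.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Return one weight per sample: P(group) * P(label) / P(group, label)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def weighted_positive_rate(groups, labels, weights, group):
    """Weighted share of positive labels (label == 1) within one group."""
    total = sum(w for g, w in zip(groups, weights) if g == group)
    positive = sum(
        w for g, y, w in zip(groups, labels, weights) if g == group and y == 1
    )
    return positive / total


# Toy, made-up data (not from the study): group "a" has a positive label
# more often than group "b" before weighting (3 of 6 versus 1 of 4).
groups = ["a"] * 6 + ["b"] * 4
labels = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]

weights = reweighing_weights(groups, labels)
rate_a = weighted_positive_rate(groups, labels, weights, "a")
rate_b = weighted_positive_rate(groups, labels, weights, "b")
print(rate_a, rate_b)
```

On this toy data both weighted positive rates equal the overall positive rate of 0.4 (up to floating-point rounding). In practice such weights would be passed to a classifier that accepts per-sample weights during training.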