Reducing False-Positive Results in Newborn Screening Using Machine Learning

Newborn screening (NBS) for inborn metabolic disorders is a highly successful public health program that by design is accompanied by false-positive results. Here we trained a Random Forest machine learning classifier on screening data to improve prediction of true and false positives. Data included...

Bibliographic Details
Main authors: Peng, Gang; Tang, Yishuo; Cowan, Tina M.; Enns, Gregory M.; Zhao, Hongyu; Scharfe, Curt
Format: Online Article Text
Language: English
Published: MDPI 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7080200/
https://www.ncbi.nlm.nih.gov/pubmed/32190768
http://dx.doi.org/10.3390/ijns6010016
author Peng, Gang
Tang, Yishuo
Cowan, Tina M.
Enns, Gregory M.
Zhao, Hongyu
Scharfe, Curt
collection PubMed
description Newborn screening (NBS) for inborn metabolic disorders is a highly successful public health program that by design is accompanied by false-positive results. Here we trained a Random Forest machine learning classifier on screening data to improve prediction of true and false positives. Data included 39 metabolic analytes detected by tandem mass spectrometry and clinical variables such as gestational age and birth weight. Analytical performance was evaluated for a cohort of 2777 screen positives reported by the California NBS program, which consisted of 235 confirmed cases and 2542 false positives for one of four disorders: glutaric acidemia type 1 (GA-1), methylmalonic acidemia (MMA), ornithine transcarbamylase deficiency (OTCD), and very long-chain acyl-CoA dehydrogenase deficiency (VLCADD). Without changing the sensitivity to detect these disorders in screening, Random Forest-based analysis of all metabolites reduced the number of false positives for GA-1 by 89%, for MMA by 45%, for OTCD by 98%, and for VLCADD by 2%. All primary disease markers and previously reported analytes such as methionine for MMA and OTCD were among the top-ranked analytes. Random Forest’s ability to classify GA-1 false positives was found similar to results obtained using Clinical Laboratory Integrated Reports (CLIR). We developed an online Random Forest tool for interpretive analysis of increasingly complex data from newborn screening.
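The description reports false-positive reductions achieved "without changing the sensitivity". One way to read that constraint is sketched below in Python: given a per-sample score (for example, a classifier's predicted probability of disease), place the cutoff at the lowest score observed among confirmed cases, then count how many false positives fall below it. This is an illustrative sketch, not the authors' implementation; the function names and the ten scores are hypothetical and invented for the example.

```python
from typing import Sequence


def sensitivity_preserving_cutoff(scores: Sequence[float],
                                  is_case: Sequence[bool]) -> float:
    """Return the lowest score among confirmed cases.

    Keeping every screen positive with score >= this cutoff leaves all
    confirmed cases flagged, so sensitivity within the cohort is unchanged.
    """
    case_scores = [s for s, c in zip(scores, is_case) if c]
    if not case_scores:
        raise ValueError("need at least one confirmed case")
    return min(case_scores)


def false_positive_reduction(scores: Sequence[float],
                             is_case: Sequence[bool],
                             cutoff: float) -> float:
    """Fraction of the original false positives scoring below the cutoff."""
    fp_scores = [s for s, c in zip(scores, is_case) if not c]
    if not fp_scores:
        raise ValueError("need at least one false positive")
    removed = sum(1 for s in fp_scores if s < cutoff)
    return removed / len(fp_scores)


# Hypothetical scores for ten screen positives (invented values):
# three confirmed cases and seven false positives.
scores = [0.92, 0.81, 0.60, 0.35, 0.30, 0.22, 0.15, 0.10, 0.55, 0.05]
is_case = [True, True, False, False, False, False, False, False, True, False]

cutoff = sensitivity_preserving_cutoff(scores, is_case)
reduction = false_positive_reduction(scores, is_case, cutoff)
print(f"cutoff={cutoff:.2f}, false positives removed={reduction:.0%}")
```

A cutoff chosen and evaluated on the same samples gives an optimistic estimate, so in practice it would need to be set on data held out from the evaluation.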
format Online
Article
Text
id pubmed-7080200
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7080200 2020-03-18 Reducing False-Positive Results in Newborn Screening Using Machine Learning Peng, Gang Tang, Yishuo Cowan, Tina M. Enns, Gregory M. Zhao, Hongyu Scharfe, Curt Int J Neonatal Screen Article MDPI 2020-03-03 /pmc/articles/PMC7080200/ /pubmed/32190768 http://dx.doi.org/10.3390/ijns6010016 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Reducing False-Positive Results in Newborn Screening Using Machine Learning
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7080200/
https://www.ncbi.nlm.nih.gov/pubmed/32190768
http://dx.doi.org/10.3390/ijns6010016