Toward modernizing the systematic review pipeline in genetics: efficient updating via data mining

PURPOSE: The aim of this study was to demonstrate that modern data mining tools can be used as one step in reducing the labor necessary to produce and maintain systematic reviews. METHODS: We used four continuously updated, manually curated resources that summarize MEDLINE-indexed articles in entire...

Bibliographic Details
Main authors: Wallace, Byron C., Small, Kevin, Brodley, Carla E., Lau, Joseph, Schmid, Christopher H., Bertram, Lars, Lill, Christina M., Cohen, Joshua T., Trikalinos, Thomas A.
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3908550/
https://www.ncbi.nlm.nih.gov/pubmed/22481134
http://dx.doi.org/10.1038/gim.2012.7
_version_ 1782301722575634432
author Wallace, Byron C.
Small, Kevin
Brodley, Carla E.
Lau, Joseph
Schmid, Christopher H.
Bertram, Lars
Lill, Christina M.
Cohen, Joshua T.
Trikalinos, Thomas A.
author_facet Wallace, Byron C.
Small, Kevin
Brodley, Carla E.
Lau, Joseph
Schmid, Christopher H.
Bertram, Lars
Lill, Christina M.
Cohen, Joshua T.
Trikalinos, Thomas A.
author_sort Wallace, Byron C.
collection PubMed
description PURPOSE: The aim of this study was to demonstrate that modern data mining tools can be used as one step in reducing the labor necessary to produce and maintain systematic reviews. METHODS: We used four continuously updated, manually curated resources that summarize MEDLINE-indexed articles in entire fields using systematic review methods (PDGene, AlzGene, and SzGene for genetic determinants of Parkinson disease, Alzheimer disease, and schizophrenia, respectively; and the Tufts Cost-Effectiveness Analysis (CEA) Registry for cost-effectiveness analyses). In each data set, we trained a classification model on citations screened up until 2009. We then evaluated the ability of the model to classify citations published in 2010 as “relevant” or “irrelevant” using human screening as the gold standard. RESULTS: Classification models did not miss any of the 104, 65, and 179 eligible citations in PDGene, AlzGene, and SzGene, respectively, and missed only 1 of 79 in the CEA Registry (100% sensitivity for the first three and 99% for the fourth). The respective specificities were 90, 93, 90, and 73%. Had the semiautomated system been used in 2010, a human would have needed to read only 605/5,616 citations to update the PDGene registry (11%) and 555/7,298 (8%), 717/5,381 (13%), and 334/1,015 (33%) for the other three databases. CONCLUSION: Data mining methodologies can reduce the burden of updating systematic reviews, without missing more papers than humans.
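The percentages in the RESULTS sentence above can be rechecked from the counts it quotes. The following is a minimal sketch, not the authors' code: the names `REPORTED`, `sensitivity`, `screening_fraction`, and `summarize` are hypothetical, and it assumes sensitivity is computed as (eligible - missed) / eligible and workload as citations read / total citations.

```python
# Counts quoted in the abstract for the 2010 update, per database:
# (eligible citations, eligible citations missed, citations a human reads, total citations)
REPORTED = {
    "PDGene": (104, 0, 605, 5616),
    "AlzGene": (65, 0, 555, 7298),
    "SzGene": (179, 0, 717, 5381),
    "CEA Registry": (79, 1, 334, 1015),
}


def sensitivity(eligible: int, missed: int) -> float:
    """Fraction of eligible citations that the classifier kept (assumed definition)."""
    return (eligible - missed) / eligible


def screening_fraction(read: int, total: int) -> float:
    """Fraction of all citations a human would still need to read."""
    return read / total


def summarize(reported=REPORTED) -> dict:
    """Return {database: (sensitivity %, screening workload %)}, rounded to whole percents."""
    return {
        name: (
            round(100 * sensitivity(eligible, missed)),
            round(100 * screening_fraction(read, total)),
        )
        for name, (eligible, missed, read, total) in reported.items()
    }


if __name__ == "__main__":
    for name, (sens, work) in summarize().items():
        print(f"{name}: sensitivity {sens}%, human reads {work}% of citations")
```

Under these assumptions the rounded values agree with the abstract: 100% sensitivity with 11%, 8%, and 13% of citations read for the three genetics databases, and 99% sensitivity with 33% read for the CEA Registry.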
format Online
Article
Text
id pubmed-3908550
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-3908550 2014-02-03 Toward modernizing the systematic review pipeline in genetics: efficient updating via data mining Wallace, Byron C. Small, Kevin Brodley, Carla E. Lau, Joseph Schmid, Christopher H. Bertram, Lars Lill, Christina M. Cohen, Joshua T. Trikalinos, Thomas A. Genet Med Original Research Article PURPOSE: The aim of this study was to demonstrate that modern data mining tools can be used as one step in reducing the labor necessary to produce and maintain systematic reviews. METHODS: We used four continuously updated, manually curated resources that summarize MEDLINE-indexed articles in entire fields using systematic review methods (PDGene, AlzGene, and SzGene for genetic determinants of Parkinson disease, Alzheimer disease, and schizophrenia, respectively; and the Tufts Cost-Effectiveness Analysis (CEA) Registry for cost-effectiveness analyses). In each data set, we trained a classification model on citations screened up until 2009. We then evaluated the ability of the model to classify citations published in 2010 as “relevant” or “irrelevant” using human screening as the gold standard. RESULTS: Classification models did not miss any of the 104, 65, and 179 eligible citations in PDGene, AlzGene, and SzGene, respectively, and missed only 1 of 79 in the CEA Registry (100% sensitivity for the first three and 99% for the fourth). The respective specificities were 90, 93, 90, and 73%. Had the semiautomated system been used in 2010, a human would have needed to read only 605/5,616 citations to update the PDGene registry (11%) and 555/7,298 (8%), 717/5,381 (13%), and 334/1,015 (33%) for the other three databases. CONCLUSION: Data mining methodologies can reduce the burden of updating systematic reviews, without missing more papers than humans.
Nature Publishing Group 2012-07 2012-04-05 /pmc/articles/PMC3908550/ /pubmed/22481134 http://dx.doi.org/10.1038/gim.2012.7 Text en Copyright © 2012 American College of Medical Genetics and Genomics http://creativecommons.org/licenses/by-nc-nd/3.0/ This work is licensed under the Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/
spellingShingle Original Research Article
Wallace, Byron C.
Small, Kevin
Brodley, Carla E.
Lau, Joseph
Schmid, Christopher H.
Bertram, Lars
Lill, Christina M.
Cohen, Joshua T.
Trikalinos, Thomas A.
Toward modernizing the systematic review pipeline in genetics: efficient updating via data mining
title Toward modernizing the systematic review pipeline in genetics: efficient updating via data mining
title_full Toward modernizing the systematic review pipeline in genetics: efficient updating via data mining
title_fullStr Toward modernizing the systematic review pipeline in genetics: efficient updating via data mining
title_full_unstemmed Toward modernizing the systematic review pipeline in genetics: efficient updating via data mining
title_short Toward modernizing the systematic review pipeline in genetics: efficient updating via data mining
title_sort toward modernizing the systematic review pipeline in genetics: efficient updating via data mining
topic Original Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3908550/
https://www.ncbi.nlm.nih.gov/pubmed/22481134
http://dx.doi.org/10.1038/gim.2012.7