A probabilistic automated tagger to identify human-related publications

The Medical Subject Heading ‘Humans’ is manually curated and indicates human-related studies within MEDLINE. However, newly published MEDLINE articles may take months to be indexed and non-MEDLINE articles lack consistent, transparent indexing of this feature. Therefore, for up to date and broad lit...

Bibliographic Details
Main Authors: Cohen, Aaron M, Dunivin, Zackary O, Smalheiser, Neil R
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6146117/
https://www.ncbi.nlm.nih.gov/pubmed/30184195
http://dx.doi.org/10.1093/database/bay079
author Cohen, Aaron M
Dunivin, Zackary O
Smalheiser, Neil R
collection PubMed
description The Medical Subject Heading ‘Humans’ is manually curated and indicates human-related studies within MEDLINE. However, newly published MEDLINE articles may take months to be indexed and non-MEDLINE articles lack consistent, transparent indexing of this feature. Therefore, for up-to-date and broad literature searches, there is a need for an independent automated system to identify whether a given publication is human-related, particularly when it lacks Medical Subject Headings. One million MEDLINE records published in 1987–2014 were randomly selected. Text-based features from the title, abstract, author name and journal fields were extracted. A linear support vector machine was trained to estimate the probability that a given article should be indexed as Humans and was evaluated on records from 2015 to 2016. Overall accuracy was high: area under the receiver operating characteristic curve = 0.976, F1 = 95% relative to MeSH indexing. Manual review of cases of extreme disagreement with MEDLINE showed 73.5% agreement with the automated prediction. We have tagged all articles indexed in PubMed with predictive scores and have made the information publicly available at http://arrowsmith.psych.uic.edu/evidence_based_medicine/index.html. We have also made available a web-based interface to allow users to obtain predictive scores for non-MEDLINE articles. This will assist in the triage of clinical evidence for writing systematic reviews.
format Online
Article
Text
id pubmed-6146117
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6146117 2018-09-25 A probabilistic automated tagger to identify human-related publications Cohen, Aaron M Dunivin, Zackary O Smalheiser, Neil R Database (Oxford) Original Article Oxford University Press 2018-09-13 /pmc/articles/PMC6146117/ /pubmed/30184195 http://dx.doi.org/10.1093/database/bay079 Text en © The Author(s) 2018. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title A probabilistic automated tagger to identify human-related publications
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6146117/
https://www.ncbi.nlm.nih.gov/pubmed/30184195
http://dx.doi.org/10.1093/database/bay079