A probabilistic automated tagger to identify human-related publications
Main authors: | Cohen, Aaron M; Dunivin, Zackary O; Smalheiser, Neil R |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2018 |
Subjects: | Original Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6146117/ https://www.ncbi.nlm.nih.gov/pubmed/30184195 http://dx.doi.org/10.1093/database/bay079 |
author | Cohen, Aaron M; Dunivin, Zackary O; Smalheiser, Neil R |
---|---|
collection | PubMed |
description | The Medical Subject Heading ‘Humans’ is manually curated and indicates human-related studies within MEDLINE. However, newly published MEDLINE articles may take months to be indexed, and non-MEDLINE articles lack consistent, transparent indexing of this feature. Therefore, for up-to-date and broad literature searches, there is a need for an independent automated system to identify whether a given publication is human-related, particularly when it lacks Medical Subject Headings. One million MEDLINE records published in 1987–2014 were randomly selected. Text-based features were extracted from the title, abstract, author name and journal fields. A linear support vector machine was trained to estimate the probability that a given article should be indexed as Humans and was evaluated on records from 2015 to 2016. Overall performance was high relative to MeSH indexing: area under the receiver operating characteristic curve = 0.976, F1 = 95%. Manual review of cases of extreme disagreement with MEDLINE showed 73.5% agreement with the automated prediction. We have tagged all articles indexed in PubMed with predictive scores and have made the information publicly available at http://arrowsmith.psych.uic.edu/evidence_based_medicine/index.html. We have also made available a web-based interface that allows users to obtain predictive scores for non-MEDLINE articles. This will assist in the triage of clinical evidence for writing systematic reviews. |
format | Online Article Text |
id | pubmed-6146117 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6146117 2018-09-25; Database (Oxford); Original Article; Oxford University Press 2018-09-13; /pmc/articles/PMC6146117/ /pubmed/30184195 http://dx.doi.org/10.1093/database/bay079; Text en; © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | A probabilistic automated tagger to identify human-related publications |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6146117/ https://www.ncbi.nlm.nih.gov/pubmed/30184195 http://dx.doi.org/10.1093/database/bay079 |
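The `description` field summarizes the method. Text features from the title, abstract, author and journal fields feed a linear support vector machine, whose output is turned into a probability of ‘Humans’ and scored against MeSH indexing with ROC AUC and F1. A minimal Python sketch of that kind of pipeline follows. It is not the authors' code: the field-prefixed unigram features, the Pegasos solver, Platt scaling as the probability step, every function name (`featurize`, `train_linear_svm`, `fit_platt`, `roc_auc`, `f1_score`) and the toy records are hypothetical choices made for illustration.

```python
import math
import random
import re

# Hypothetical sketch: feature design, solver and calibration are assumptions.
BIAS = "__bias__"  # constant feature acting as a (regularized) intercept

def featurize(record):
    """Field-prefixed unigram counts from title, abstract, journal, authors; L2-normalized."""
    feats = {}
    for field in ("title", "abstract", "journal"):
        for tok in re.findall(r"[a-z0-9]+", record.get(field, "").lower()):
            key = field + ":" + tok
            feats[key] = feats.get(key, 0.0) + 1.0
    for name in record.get("authors", []):
        key = "author:" + name.lower()
        feats[key] = feats.get(key, 0.0) + 1.0
    norm = math.sqrt(sum(v * v for v in feats.values())) or 1.0
    feats = {k: v / norm for k, v in feats.items()}
    feats[BIAS] = 1.0
    return feats

def svm_score(w, x):
    """Linear decision value w . x on sparse dicts; unseen features weigh 0."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())

def train_linear_svm(X, y, lam=0.1, epochs=50, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.
    y[i] is +1 for 'Humans' and -1 otherwise."""
    rng = random.Random(seed)
    w, order, t = {}, list(range(len(X))), 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * svm_score(w, X[i])
            for k in w:  # shrink step from the L2 regularizer: w *= 1 - 1/t
                w[k] *= 1.0 - eta * lam
            if margin < 1.0:  # hinge loss active: step toward y[i] * x
                for k, v in X[i].items():
                    w[k] = w.get(k, 0.0) + eta * y[i] * v
    return w

def sigmoid(z):
    if z >= 0:  # numerically stable logistic function
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_platt(scores, y, steps=2000):
    """Platt scaling: fit P(Humans | s) = sigmoid(a*s + b) by gradient descent
    on the log loss, using Platt's smoothed targets."""
    n, n_pos = len(y), sum(1 for v in y if v > 0)
    targets = [(n_pos + 1) / (n_pos + 2) if v > 0 else 1 / (n - n_pos + 2) for v in y]
    lr = 4.0 / (sum(s * s for s in scores) / n + 1.0)  # 1 / (upper bound on curvature)
    a, b = 1.0, 0.0
    for _ in range(steps):
        errs = [sigmoid(a * s + b) - tg for s, tg in zip(scores, targets)]
        a -= lr * sum(e * s for e, s in zip(errs, scores)) / n
        b -= lr * sum(errs) / n
    return a, b

def roc_auc(scores, labels):
    """Chance that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, lab in zip(scores, labels) if lab > 0]
    neg = [s for s, lab in zip(scores, labels) if lab <= 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def f1_score(predicted, labels):
    """F1 = 2TP / (2TP + FP + FN) for boolean predictions and +1/-1 labels."""
    tp = sum(1 for p, lab in zip(predicted, labels) if p and lab > 0)
    fp = sum(1 for p, lab in zip(predicted, labels) if p and lab <= 0)
    fn = sum(1 for p, lab in zip(predicted, labels) if not p and lab > 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Invented toy records: (title, abstract, journal, author, label); +1 = Humans.
TOY = [
    ("Statin therapy in elderly patients", "Randomized trial enrolling women and men.", "Clin Trials", "Lee A", 1),
    ("Depression screening of patients", "Cohort of adults in primary care.", "Clin Trials", "Diaz M", 1),
    ("Outcomes for surgical patients", "Hospital records from adults.", "Ann Surg", "Lee A", 1),
    ("Knockout mice show insulin resistance", "Murine liver tissue was examined.", "Exp Biol", "Kim J", -1),
    ("Aged mice behavior", "Murine models were compared.", "Exp Biol", "Park S", -1),
    ("Transgenic mice gene expression", "Rodent tissue was sampled.", "Lab Anim", "Kim J", -1),
]
records = [{"title": t, "abstract": ab, "journal": j, "authors": [au]} for t, ab, j, au, _ in TOY]
labels = [lab for *_, lab in TOY]
X = [featurize(r) for r in records]
w = train_linear_svm(X, labels)
scores = [svm_score(w, x) for x in X]
a, b = fit_platt(scores, labels)
probs = [sigmoid(a * s + b) for s in scores]
print("training AUC:", roc_auc(scores, labels))
print("training F1 at p >= 0.5:", f1_score([p >= 0.5 for p in probs], labels))
```

Two simplifications are worth flagging. Platt scaling is normally fit on held-out decision values rather than on the training scores reused here, and a corpus on the scale described (one million records) would call for an optimized linear SVM solver rather than this dict-based loop, which rescales every stored weight on each step.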