A semi-supervised approach using label propagation to support citation screening

Bibliographic Details
Main authors: Kontonatsios, Georgios; Brockmeier, Austin J.; Przybyła, Piotr; McNaught, John; Mu, Tingting; Goulermas, John Y.; Ananiadou, Sophia
Format: Online; Article; Text
Language: English
Published: Elsevier 2017
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5726085/
https://www.ncbi.nlm.nih.gov/pubmed/28648605
http://dx.doi.org/10.1016/j.jbi.2017.06.018
Collection: PubMed
Description: Citation screening, an integral process within systematic reviews that identifies citations relevant to the underlying research question, is a time-consuming and resource-intensive task. During screening, analysts manually label each citation to indicate whether it is eligible for inclusion in the review. Several recent studies have explored active learning for text classification to reduce the human workload involved in screening. However, existing approaches require a substantial number of manually labelled citations before the classifier achieves robust performance. In this paper, we propose a semi-supervised method that identifies relevant citations as early as possible in the screening process. It exploits the pairwise similarities between labelled and unlabelled citations to improve classification performance without additional manual labelling effort. Our approach is based on the hypothesis that similar citations share the same label (e.g., if one citation should be included, then similar citations should also be included). To calculate the similarity between labelled and unlabelled citations, we investigate two feature spaces: a bag-of-words representation and a spectral embedding derived from the bag-of-words. The semi-supervised method propagates the classification codes of manually labelled citations to neighbouring unlabelled citations in the feature space. The automatically labelled citations are then combined with the manually labelled citations to form an augmented training set. We evaluate the method on systematic reviews from the clinical and public health domains. The results show that our semi-supervised method with label propagation achieves statistically significant improvements over two state-of-the-art active learning approaches across both clinical and public health reviews.
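The abstract describes the propagation step only at a high level. The Python sketch below illustrates the general idea under stated assumptions and is not the authors' implementation: similarity is assumed to be cosine similarity, each unlabelled citation is assumed to take the label of its single most similar labelled citation, a hypothetical `threshold` decides which propagated labels are kept, and the spectral embedding is assumed to be a normalised spectral embedding of a cosine-similarity graph built over the bag-of-words vectors. All function names are hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of `a` and the rows of `b`."""
    a_unit = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), 1e-12)
    b_unit = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), 1e-12)
    return a_unit @ b_unit.T


def spectral_embedding(x: np.ndarray, n_components: int) -> np.ndarray:
    """Normalised spectral embedding of the rows of `x` (assumed construction:
    non-negative cosine-similarity graph with symmetric normalisation)."""
    w = np.clip(cosine_similarity_matrix(x, x), 0.0, None)
    np.fill_diagonal(w, 0.0)  # no self-loops
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(w.sum(axis=1), 1e-12))
    m = d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]  # D^-1/2 W D^-1/2
    eigvals, eigvecs = np.linalg.eigh(m)
    order = np.argsort(eigvals)[::-1]
    # Skip the leading eigenvector (for a connected graph it only reflects
    # node degrees) and keep the next n_components.
    return eigvecs[:, order[1:n_components + 1]]


def propagate_labels(x_labelled, y_labelled, x_unlabelled, threshold):
    """Give each unlabelled citation the label of its most similar labelled
    citation; keep only those whose best similarity reaches `threshold`."""
    sim = cosine_similarity_matrix(x_unlabelled, x_labelled)
    nearest = sim.argmax(axis=1)
    best = sim[np.arange(sim.shape[0]), nearest]
    keep = best >= threshold
    return keep, y_labelled[nearest[keep]]


def augmented_training_set(x_labelled, y_labelled, x_unlabelled, threshold=0.7):
    """Combine manually labelled citations with the automatically labelled ones."""
    keep, y_propagated = propagate_labels(x_labelled, y_labelled, x_unlabelled, threshold)
    x_aug = np.vstack([x_labelled, x_unlabelled[keep]])
    y_aug = np.concatenate([y_labelled, y_propagated])
    return x_aug, y_aug


if __name__ == "__main__":
    # Toy bag-of-words counts; label 1 = include, 0 = exclude.
    x_lab = np.array([[2, 1, 0, 0], [0, 0, 1, 2]], dtype=float)
    y_lab = np.array([1, 0])
    x_unl = np.array([[1, 1, 0, 0], [0, 0, 0, 3], [1, 0, 1, 0]], dtype=float)

    x_aug, y_aug = augmented_training_set(x_lab, y_lab, x_unl, threshold=0.7)
    print(x_aug.shape, y_aug)  # prints: (4, 4) [1 0 1 0]

    # The same propagation in a 2-D spectral embedding of all five citations.
    emb = spectral_embedding(np.vstack([x_lab, x_unl]), n_components=2)
    print(augmented_training_set(emb[:2], y_lab, emb[2:], threshold=0.7)[1])
```

In the toy example, the third unlabelled citation is dropped because its best cosine similarity (about 0.63) falls below the assumed threshold of 0.7. A 1-nearest-neighbour rule with a similarity cut-off is only one simple way to propagate labels; iterative graph-based schemes such as scikit-learn's `LabelPropagation` are a common alternative, and the record does not say which variant the paper uses.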
ID: pubmed-5726085
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: J Biomed Inform, Elsevier, 2017-08
Rights: © 2017 The Authors. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Topic: Article