A semi-supervised approach using label propagation to support citation screening
Citation screening, an integral process within systematic reviews that identifies citations relevant to the underlying research question, is a time-consuming and resource-intensive task. During the screening task, analysts manually assign a label to each citation, to designate whether a citation is...
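Main authors: | Kontonatsios, Georgios; Brockmeier, Austin J.; Przybyła, Piotr; McNaught, John; Mu, Tingting; Goulermas, John Y.; Ananiadou, Sophia |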
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5726085/ https://www.ncbi.nlm.nih.gov/pubmed/28648605 http://dx.doi.org/10.1016/j.jbi.2017.06.018 |
_version_ | 1783285668398497792 |
---|---|
author | Kontonatsios, Georgios Brockmeier, Austin J. Przybyła, Piotr McNaught, John Mu, Tingting Goulermas, John Y. Ananiadou, Sophia |
author_facet | Kontonatsios, Georgios Brockmeier, Austin J. Przybyła, Piotr McNaught, John Mu, Tingting Goulermas, John Y. Ananiadou, Sophia |
author_sort | Kontonatsios, Georgios |
collection | PubMed |
description | Citation screening, an integral process within systematic reviews that identifies citations relevant to the underlying research question, is a time-consuming and resource-intensive task. During the screening task, analysts manually assign a label to each citation, to designate whether a citation is eligible for inclusion in the review. Recently, several studies have explored the use of active learning in text classification to reduce the human workload involved in the screening task. However, existing approaches require a significant amount of manually labelled citations for the text classification to achieve a robust performance. In this paper, we propose a semi-supervised method that identifies relevant citations as early as possible in the screening process by exploiting the pairwise similarities between labelled and unlabelled citations to improve the classification performance without additional manual labelling effort. Our approach is based on the hypothesis that similar citations share the same label (e.g., if one citation should be included, then other similar citations should be included also). To calculate the similarity between labelled and unlabelled citations we investigate two different feature spaces, namely a bag-of-words and a spectral embedding based on the bag-of-words. The semi-supervised method propagates the classification codes of manually labelled citations to neighbouring unlabelled citations in the feature space. The automatically labelled citations are combined with the manually labelled citations to form an augmented training set. For evaluation purposes, we apply our method to reviews from clinical and public health. The results show that our semi-supervised method with label propagation achieves statistically significant improvements over two state-of-the-art active learning approaches across both clinical and public health reviews. |
format | Online Article Text |
id | pubmed-5726085 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-5726085 2017-12-18 A semi-supervised approach using label propagation to support citation screening Kontonatsios, Georgios Brockmeier, Austin J. Przybyła, Piotr McNaught, John Mu, Tingting Goulermas, John Y. Ananiadou, Sophia J Biomed Inform Article Citation screening, an integral process within systematic reviews that identifies citations relevant to the underlying research question, is a time-consuming and resource-intensive task. During the screening task, analysts manually assign a label to each citation, to designate whether a citation is eligible for inclusion in the review. Recently, several studies have explored the use of active learning in text classification to reduce the human workload involved in the screening task. However, existing approaches require a significant amount of manually labelled citations for the text classification to achieve a robust performance. In this paper, we propose a semi-supervised method that identifies relevant citations as early as possible in the screening process by exploiting the pairwise similarities between labelled and unlabelled citations to improve the classification performance without additional manual labelling effort. Our approach is based on the hypothesis that similar citations share the same label (e.g., if one citation should be included, then other similar citations should be included also). To calculate the similarity between labelled and unlabelled citations we investigate two different feature spaces, namely a bag-of-words and a spectral embedding based on the bag-of-words. The semi-supervised method propagates the classification codes of manually labelled citations to neighbouring unlabelled citations in the feature space. The automatically labelled citations are combined with the manually labelled citations to form an augmented training set. For evaluation purposes, we apply our method to reviews from clinical and public health. The results show that our semi-supervised method with label propagation achieves statistically significant improvements over two state-of-the-art active learning approaches across both clinical and public health reviews. Elsevier 2017-08 /pmc/articles/PMC5726085/ /pubmed/28648605 http://dx.doi.org/10.1016/j.jbi.2017.06.018 Text en © 2017 The Authors http://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Kontonatsios, Georgios Brockmeier, Austin J. Przybyła, Piotr McNaught, John Mu, Tingting Goulermas, John Y. Ananiadou, Sophia A semi-supervised approach using label propagation to support citation screening |
title | A semi-supervised approach using label propagation to support citation screening |
title_full | A semi-supervised approach using label propagation to support citation screening |
title_fullStr | A semi-supervised approach using label propagation to support citation screening |
title_full_unstemmed | A semi-supervised approach using label propagation to support citation screening |
title_short | A semi-supervised approach using label propagation to support citation screening |
title_sort | semi-supervised approach using label propagation to support citation screening |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5726085/ https://www.ncbi.nlm.nih.gov/pubmed/28648605 http://dx.doi.org/10.1016/j.jbi.2017.06.018 |
work_keys_str_mv | AT kontonatsiosgeorgios asemisupervisedapproachusinglabelpropagationtosupportcitationscreening AT brockmeieraustinj asemisupervisedapproachusinglabelpropagationtosupportcitationscreening AT przybyłapiotr asemisupervisedapproachusinglabelpropagationtosupportcitationscreening AT mcnaughtjohn asemisupervisedapproachusinglabelpropagationtosupportcitationscreening AT mutingting asemisupervisedapproachusinglabelpropagationtosupportcitationscreening AT goulermasjohny asemisupervisedapproachusinglabelpropagationtosupportcitationscreening AT ananiadousophia asemisupervisedapproachusinglabelpropagationtosupportcitationscreening AT kontonatsiosgeorgios semisupervisedapproachusinglabelpropagationtosupportcitationscreening AT brockmeieraustinj semisupervisedapproachusinglabelpropagationtosupportcitationscreening AT przybyłapiotr semisupervisedapproachusinglabelpropagationtosupportcitationscreening AT mcnaughtjohn semisupervisedapproachusinglabelpropagationtosupportcitationscreening AT mutingting semisupervisedapproachusinglabelpropagationtosupportcitationscreening AT goulermasjohny semisupervisedapproachusinglabelpropagationtosupportcitationscreening AT ananiadousophia semisupervisedapproachusinglabelpropagationtosupportcitationscreening |
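
The description field in this record outlines the idea of propagating labels from manually screened citations to similar unscreened ones in a bag-of-words space, then training on the augmented set. The following is a minimal sketch of that idea only, not the authors' implementation: it assumes a simplified one-step nearest-neighbour propagation with cosine similarity, and the function names, the `k` and `min_sim` parameters, and the toy citations are all hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Represent a citation as token counts (whitespace tokenisation is an assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def propagate_labels(texts, labels, k=1, min_sim=0.3):
    """Copy labels from manually labelled citations to similar unlabelled ones.

    labels holds 1 (include), 0 (exclude) or None (not yet screened).
    k and min_sim are illustrative knobs, not values from the paper.
    """
    vectors = [bag_of_words(t) for t in texts]
    labelled = [i for i, y in enumerate(labels) if y is not None]
    augmented = list(labels)
    for i, y in enumerate(labels):
        if y is not None:
            continue  # keep manual labels untouched
        # The k most similar manually labelled citations.
        nearest = sorted(
            ((cosine(vectors[i], vectors[j]), j) for j in labelled),
            reverse=True,
        )[:k]
        nearest = [(s, j) for s, j in nearest if s >= min_sim]
        # Similarity-weighted vote: positive favours include, negative exclude.
        score = sum(s if labels[j] == 1 else -s for s, j in nearest)
        if score != 0:
            augmented[i] = 1 if score > 0 else 0
    return augmented


texts = [
    "randomised trial of statin therapy for cardiovascular disease",
    "survey of hospital parking fees",
    "statin therapy trial in patients with cardiovascular disease",
    "parking fees at hospital sites survey",
    "quantum computing hardware overview",
]
labels = [1, 0, None, None, None]  # only the first two were screened manually

augmented = propagate_labels(texts, labels)
# Manual and automatically assigned labels together form the augmented training set.
training_set = [(t, y) for t, y in zip(texts, augmented) if y is not None]
print(augmented)
```

In this toy run the third and fourth citations inherit the label of their nearest screened neighbour, while the fifth stays unlabelled because nothing screened so far resembles it. A classifier would then be trained on `training_set` rather than on the two manual labels alone.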