
Supporting systematic reviews using LDA-based document representations

BACKGROUND: Identifying relevant studies for inclusion in a systematic review (i.e. screening) is a complex, laborious and expensive task. Recently, a number of studies have shown that the use of machine learning and text mining methods to automatically identify relevant studies has the potential to...


Bibliographic Details
Main Authors:	Mo, Yuanhan, Kontonatsios, Georgios, Ananiadou, Sophia
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2015
Subjects:	Methodology
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4662004/ https://www.ncbi.nlm.nih.gov/pubmed/26612232 http://dx.doi.org/10.1186/s13643-015-0117-0

_version_	1782403095179821056
author	Mo, Yuanhan Kontonatsios, Georgios Ananiadou, Sophia
author_facet	Mo, Yuanhan Kontonatsios, Georgios Ananiadou, Sophia
author_sort	Mo, Yuanhan
collection	PubMed
description	BACKGROUND: Identifying relevant studies for inclusion in a systematic review (i.e. screening) is a complex, laborious and expensive task. Recently, a number of studies have shown that the use of machine learning and text mining methods to automatically identify relevant studies has the potential to drastically decrease the workload involved in the screening phase. The vast majority of these machine learning methods exploit the same underlying principle, i.e. a study is modelled as a bag-of-words (BOW). METHODS: We explore the use of topic modelling methods to derive a more informative representation of studies. We apply Latent Dirichlet allocation (LDA), an unsupervised topic modelling approach, to automatically identify topics in a collection of studies. We then represent each study as a distribution over LDA topics. Additionally, we enrich topics derived using LDA with multi-word terms identified by an automatic term recognition (ATR) tool. For evaluation purposes, we carry out automatic identification of relevant studies using support vector machine (SVM)-based classifiers that employ both our novel topic-based representation and the BOW representation. RESULTS: Our results show that the SVM classifier is able to identify a greater number of relevant studies when using the LDA representation than when using the BOW representation. These observations hold for two systematic reviews in the clinical domain and three reviews in the social science domain. CONCLUSIONS: A topic-based feature representation of documents outperforms the BOW representation when applied to the task of automatic citation screening. The proposed term-enriched topics are more informative and less ambiguous to systematic reviewers. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13643-015-0117-0) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-4662004
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4662004 2015-11-28 Supporting systematic reviews using LDA-based document representations Mo, Yuanhan Kontonatsios, Georgios Ananiadou, Sophia Syst Rev Methodology BioMed Central 2015-11-26 /pmc/articles/PMC4662004/ /pubmed/26612232 http://dx.doi.org/10.1186/s13643-015-0117-0 Text en © Mo et al. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Methodology Mo, Yuanhan Kontonatsios, Georgios Ananiadou, Sophia Supporting systematic reviews using LDA-based document representations
title	Supporting systematic reviews using LDA-based document representations
title_sort	supporting systematic reviews using lda-based document representations
topic	Methodology
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4662004/ https://www.ncbi.nlm.nih.gov/pubmed/26612232 http://dx.doi.org/10.1186/s13643-015-0117-0
work_keys_str_mv	AT moyuanhan supportingsystematicreviewsusingldabaseddocumentrepresentations AT kontonatsiosgeorgios supportingsystematicreviewsusingldabaseddocumentrepresentations AT ananiadousophia supportingsystematicreviewsusingldabaseddocumentrepresentations
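
The two document representations compared in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not say which LDA software or settings were used, so the collapsed Gibbs sampler, the hyperparameters (alpha, beta, n_iter), the function names and the four toy documents below are all assumptions made for illustration. The ATR term enrichment and the SVM classification step are omitted.

```python
import random
from collections import Counter


def bag_of_words(tokens, vocab):
    """BOW baseline: one raw count per vocabulary word."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]


def lda_topic_distributions(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative settings, not the paper's).

    Returns one topic distribution per document; each row sums to 1.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]             # tokens of doc d in topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens of word v in topic k
    topic_total = [0] * n_topics                           # all tokens in topic k
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample the topic from its conditional given all other tokens.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    # Smoothed estimate of each document's topic distribution.
    return [
        [(n + alpha) / (len(doc) + n_topics * alpha) for n in row]
        for row, doc in zip(doc_topic, docs)
    ]


# Invented toy "abstracts", already tokenised.
docs = [
    "cancer screening trial patients cancer".split(),
    "patients trial randomised cancer treatment".split(),
    "school pupils education teaching school".split(),
    "teaching education pupils classroom school".split(),
]
vocab = sorted({w for doc in docs for w in doc})
bow = [bag_of_words(doc, vocab) for doc in docs]   # one dimension per word
theta = lda_topic_distributions(docs, n_topics=2)  # one dimension per topic
```

In this sketch each study becomes a short vector with one entry per topic instead of one entry per vocabulary word. Either matrix (bow or theta) could then be passed to an SVM classifier as the feature representation, which is the comparison the abstract describes.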
