
Improving the Performance of Text Categorization Models used for the Selection of High Quality Articles

OBJECTIVES: Machine learning systems can considerably reduce the time and effort experts need to perform new systematic reviews (SRs). This study investigates whether categorization models trained on a combination of included and commonly excluded articles can improve performance by i...


Bibliographic Details
Main Authors: Kim, Seunghee, Choi, Jinwook
Format: Online Article Text
Language: English
Published: Korean Society of Medical Informatics 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3324751/
https://www.ncbi.nlm.nih.gov/pubmed/22509470
http://dx.doi.org/10.4258/hir.2012.18.1.18
author Kim, Seunghee
Choi, Jinwook
collection PubMed
description OBJECTIVES: Machine learning systems can considerably reduce the time and effort experts need to perform new systematic reviews (SRs). This study investigates whether categorization models trained on a combination of included and commonly excluded articles can improve performance in identifying high-quality articles for new procedure or drug SRs. METHODS: Test collections were built using the annotated reference files from 19 procedure and 15 drug systematic reviews. The classification models, which used a support vector machine, were trained on the combined even data of the other topics, excluding the target topic. Models trained on the combination of included and commonly excluded articles were compared with models trained on the combination of included and excluded articles. Accuracy was used as the measure of comparison. RESULTS: On average, performance improved by about 15% in the procedure topics and 11% in the drug topics when the classification models were trained on the combination of included and commonly excluded articles. The system using the combination of included and commonly excluded articles performed better than the one using the combination of included and excluded articles in all of the procedure topics. CONCLUSIONS: Automatic classification of rigorous articles using machine learning can reduce the workload of experts performing systematic reviews when topic-specific data are scarce. The system is more effective when the combination of included and commonly excluded articles is used.
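The evaluation scheme described in METHODS (train on the pooled data of every other topic, test on the held-out topic, compare by accuracy) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the bag-of-words features, the Pegasos-style linear SVM trainer, the toy data, and all function names are assumptions introduced here. How "commonly excluded" articles are selected is not specified in this record, so the negative class is simply whatever the caller labels -1.

```python
import random
from collections import Counter


def featurize(text):
    """Bag-of-words counts over lowercase tokens (an assumed, simplistic feature set)."""
    return Counter(text.lower().split())


def train_linear_svm(examples, epochs=50, lam=0.01, seed=0):
    """Linear SVM without a bias term, fit by Pegasos-style subgradient descent
    on the hinge loss. `examples` is a list of (Counter, label), label in {+1, -1}."""
    rng = random.Random(seed)
    weights = {}
    order = list(examples)
    step = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for features, label in order:
            step += 1
            eta = 1.0 / (lam * step)
            # The margin is computed with the weights before this update.
            margin = label * sum(weights.get(f, 0.0) * v for f, v in features.items())
            shrink = 1.0 - eta * lam  # L2 regularization step
            for f in weights:
                weights[f] *= shrink
            if margin < 1.0:  # hinge loss is active for this example
                for f, v in features.items():
                    weights[f] = weights.get(f, 0.0) + eta * label * v
    return weights


def predict(weights, features):
    score = sum(weights.get(f, 0.0) * v for f, v in features.items())
    return 1 if score >= 0 else -1


def accuracy(gold, predicted):
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def training_pool(collections, target):
    """All labelled articles from every topic except the held-out target topic."""
    return [(featurize(text), label)
            for topic, docs in collections.items() if topic != target
            for text, label in docs]


def leave_one_topic_out(collections):
    """Train on the other topics, test on the held-out one; return accuracy per topic."""
    results = {}
    for target, docs in collections.items():
        weights = train_linear_svm(training_pool(collections, target))
        predicted = [predict(weights, featurize(text)) for text, _ in docs]
        results[target] = accuracy([label for _, label in docs], predicted)
    return results


# Hypothetical toy data: +1 = included article, -1 = excluded article.
TOY = {
    "topic_a": [("randomized controlled trial of stent", 1), ("case report of stent", -1)],
    "topic_b": [("randomized trial of bypass surgery", 1), ("editorial on bypass surgery", -1)],
    "topic_c": [("controlled trial of statin", 1), ("letter about statin", -1)],
}

if __name__ == "__main__":
    for topic, acc in leave_one_topic_out(TOY).items():
        print(topic, acc)
```

To compare the two training regimes mentioned in the abstract, the same function could be run twice, once with all excluded articles as the negative class and once with only the commonly excluded ones, and the per-topic accuracies compared.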
format Online
Article
Text
id pubmed-3324751
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Korean Society of Medical Informatics
record_format MEDLINE/PubMed
spelling pubmed-3324751 2012-04-16 Improving the Performance of Text Categorization Models used for the Selection of High Quality Articles Kim, Seunghee Choi, Jinwook Healthc Inform Res Original Article
Korean Society of Medical Informatics 2012-03 2012-03-31 /pmc/articles/PMC3324751/ /pubmed/22509470 http://dx.doi.org/10.4258/hir.2012.18.1.18 Text en © 2012 The Korean Society of Medical Informatics http://creativecommons.org/licenses/by-nc/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Improving the Performance of Text Categorization Models used for the Selection of High Quality Articles
topic Original Article