Improving the Performance of Text Categorization Models used for the Selection of High Quality Articles
OBJECTIVES: Machine learning systems can considerably reduce the time and effort needed by experts to perform new systematic reviews (SRs). This study investigates categorization models, which are trained on a combination of included and commonly excluded articles, which can improve performance by i...
Main authors: | Kim, Seunghee, Choi, Jinwook |
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
Korean Society of Medical Informatics
2012
|
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3324751/ https://www.ncbi.nlm.nih.gov/pubmed/22509470 http://dx.doi.org/10.4258/hir.2012.18.1.18 |
_version_ | 1782229350283739136 |
---|---|
author | Kim, Seunghee Choi, Jinwook |
author_facet | Kim, Seunghee Choi, Jinwook |
author_sort | Kim, Seunghee |
collection | PubMed |
description | OBJECTIVES: Machine learning systems can considerably reduce the time and effort experts need to perform new systematic reviews (SRs). This study investigates whether categorization models trained on a combination of included and commonly excluded articles can improve performance in identifying high quality articles for new procedure or drug SRs. METHODS: Test collections were built using the annotated reference files from 19 procedure and 15 drug systematic reviews. The classification models, using a support vector machine, were trained on the combined data of the other topics, excluding the target topic. Models trained on the combination of included and commonly excluded articles were compared with models trained on the combination of included and excluded articles. Accuracy was used as the measure of comparison. RESULTS: On average, performance improved by about 15% in the procedure topics and 11% in the drug topics when the classification models were trained on the combination of included and commonly excluded articles. The system using the combination of included and commonly excluded articles performed better than the one using the combination of included and excluded articles in all of the procedure topics. CONCLUSIONS: Automatic, rigorous article classification using machine learning can reduce the workload of experts performing systematic reviews when topic-specific data are scarce. The system is more effective, in particular, when the combination of included and commonly excluded articles is used. |
format | Online Article Text |
id | pubmed-3324751 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | Korean Society of Medical Informatics |
record_format | MEDLINE/PubMed |
spelling | pubmed-33247512012-04-16 Improving the Performance of Text Categorization Models used for the Selection of High Quality Articles Kim, Seunghee Choi, Jinwook Healthc Inform Res Original Article OBJECTIVES: Machine learning systems can considerably reduce the time and effort needed by experts to perform new systematic reviews (SRs). This study investigates categorization models, which are trained on a combination of included and commonly excluded articles, which can improve performance by identifying high quality articles for new procedures or drug SRs. METHODS: Test collections were built using the annotated reference files from 19 procedure and 15 drug systematic reviews. The classification models, using a support vector machine, were trained by the combined even data of other topics, excepting the desired topic. This approach was compared to the combination of included and commonly excluded articles with the combination of included and excluded articles. Accuracy was used for the measure of comparison. RESULTS: On average, the performance was improved by about 15% in the procedure topics and 11% in the drug topics when the classification models trained on the combination of articles included and commonly excluded, were used. The system using the combination of included and commonly excluded articles performed better than the combination of included and excluded articles in all of the procedure topics. CONCLUSIONS: Automatically rigorous article classification using machine learning can reduce the workload of experts when they perform systematic reviews when the topic-specific data are scarce. In particular, when the combination of included and commonly excluded articles is used, this system will be more effective. 
Korean Society of Medical Informatics 2012-03 2012-03-31 /pmc/articles/PMC3324751/ /pubmed/22509470 http://dx.doi.org/10.4258/hir.2012.18.1.18 Text en © 2012 The Korean Society of Medical Informatics http://creativecommons.org/licenses/by-nc/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Article Kim, Seunghee Choi, Jinwook Improving the Performance of Text Categorization Models used for the Selection of High Quality Articles |
title | Improving the Performance of Text Categorization Models used for the Selection of High Quality Articles |
title_full | Improving the Performance of Text Categorization Models used for the Selection of High Quality Articles |
title_fullStr | Improving the Performance of Text Categorization Models used for the Selection of High Quality Articles |
title_full_unstemmed | Improving the Performance of Text Categorization Models used for the Selection of High Quality Articles |
title_short | Improving the Performance of Text Categorization Models used for the Selection of High Quality Articles |
title_sort | improving the performance of text categorization models used for the selection of high quality articles |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3324751/ https://www.ncbi.nlm.nih.gov/pubmed/22509470 http://dx.doi.org/10.4258/hir.2012.18.1.18 |
work_keys_str_mv | AT kimseunghee improvingtheperformanceoftextcategorizationmodelsusedfortheselectionofhighqualityarticles AT choijinwook improvingtheperformanceoftextcategorizationmodelsusedfortheselectionofhighqualityarticles |
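
The evaluation protocol summarized in the description field (train on the data of all other topics, test on the held-out topic, compare by accuracy) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the study used a support vector machine, whereas `train_keyword_model` here is a deliberately simple stand-in classifier, and the function names and the toy collections are hypothetical.

```python
from collections import Counter


def leave_one_topic_out(collections):
    """Yield (held_out_topic, train_docs, test_docs) for every topic.

    `collections` maps a topic name to a list of (text, label) pairs,
    where label 1 means "included" and 0 means "excluded".
    """
    for held_out in collections:
        train = [doc
                 for topic, docs in collections.items()
                 if topic != held_out
                 for doc in docs]
        yield held_out, train, collections[held_out]


def accuracy(predicted, actual):
    """Fraction of predictions that match the reference labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def train_keyword_model(train_docs):
    """Stand-in classifier (NOT an SVM): words vote by class frequency."""
    weights = Counter()
    for text, label in train_docs:
        for word in set(text.lower().split()):
            weights[word] += 1 if label == 1 else -1

    def predict(text):
        score = sum(weights[w] for w in set(text.lower().split()))
        return 1 if score > 0 else 0

    return predict


def evaluate(collections):
    """Per-topic accuracy of models trained only on the other topics."""
    results = {}
    for topic, train, test in leave_one_topic_out(collections):
        predict = train_keyword_model(train)
        predicted = [predict(text) for text, _ in test]
        actual = [label for _, label in test]
        results[topic] = accuracy(predicted, actual)
    return results


# Hypothetical toy data, invented purely for illustration.
toy_collections = {
    "topic_a": [("randomized controlled trial of stents", 1),
                ("case report of stent fracture", 0)],
    "topic_b": [("randomized trial of statins", 1),
                ("editorial on statins", 0)],
    "topic_c": [("double blind randomized trial of bypass", 1),
                ("case report and letter", 0)],
}

results = evaluate(toy_collections)
print(results)
```

Comparing two training sets, as the study does, would amount to calling `evaluate` once per variant of the collections and comparing the per-topic accuracies; how "commonly excluded" articles are selected is not specified in this record and is therefore not modeled here.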