Large-scale online semantic indexing of biomedical articles via an ensemble of multi-label classification models

BACKGROUND: In this paper we present the approach that we employed to deal with large-scale multi-label semantic indexing of biomedical papers. This work was mainly implemented within the context of the BioASQ challenge (2013–2017), a challenge concerned with biomedical semantic indexing and questio...

Bibliographic Details
Main authors: Papanikolaou, Yannis; Tsoumakas, Grigorios; Laliotis, Manos; Markantonatos, Nikos; Vlahavas, Ioannis
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5610407/
https://www.ncbi.nlm.nih.gov/pubmed/28938902
http://dx.doi.org/10.1186/s13326-017-0150-0
_version_ 1783265772467912704
author Papanikolaou, Yannis
Tsoumakas, Grigorios
Laliotis, Manos
Markantonatos, Nikos
Vlahavas, Ioannis
author_sort Papanikolaou, Yannis
collection PubMed
description BACKGROUND: In this paper we present the approach that we employed to deal with large-scale multi-label semantic indexing of biomedical papers. This work was mainly implemented within the context of the BioASQ challenge (2013–2017), a challenge concerned with biomedical semantic indexing and question answering. METHODS: Our main contribution is a MUlti-Label Ensemble method (MULE) that incorporates a McNemar statistical significance test in order to validate the combination of the constituent machine learning algorithms. Secondary contributions include a study of the temporal aspects of the BioASQ corpus (the observations also apply to its superset, the PubMed article collection) and the proper parametrization of the algorithms used to deal with this challenging classification task. RESULTS: The ensemble method that we developed is compared to other approaches in experimental scenarios on subsets of the BioASQ corpus, with positive results. In our participation in the BioASQ challenge we took first place in 2013 and second place in each of the four following years, consistently outperforming MTI, the indexing system of the National Library of Medicine (NLM). CONCLUSIONS: The results of our experimental comparisons suggest that employing a statistical significance test to validate the ensemble method’s choices is the optimal approach for ensembling multi-label classifiers, especially in contexts with many rare labels.
format Online
Article
Text
id pubmed-5610407
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5610407 2017-10-10 Large-scale online semantic indexing of biomedical articles via an ensemble of multi-label classification models. Papanikolaou, Yannis; Tsoumakas, Grigorios; Laliotis, Manos; Markantonatos, Nikos; Vlahavas, Ioannis. J Biomed Semantics. Research.
BioMed Central 2017-09-22 /pmc/articles/PMC5610407/ /pubmed/28938902 http://dx.doi.org/10.1186/s13326-017-0150-0 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Large-scale online semantic indexing of biomedical articles via an ensemble of multi-label classification models
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5610407/
https://www.ncbi.nlm.nih.gov/pubmed/28938902
http://dx.doi.org/10.1186/s13326-017-0150-0