
Thesaurus-based word embeddings for automated biomedical literature classification

The special nature, volume and breadth of biomedical literature pose barriers for automated classification methods. On the other hand, manual indexing is time-consuming, costly and error-prone. We argue that current word embedding algorithms can be efficiently used to support the task of biomedi...


Bibliographic Details
Main authors: Koutsomitropoulos, Dimitrios A., Andriopoulos, Andreas D.
Format: Online Article Text
Language: English
Published: Springer London 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8111057/
https://www.ncbi.nlm.nih.gov/pubmed/33994670
http://dx.doi.org/10.1007/s00521-021-06053-z
author Koutsomitropoulos, Dimitrios A.
Andriopoulos, Andreas D.
collection PubMed
description The special nature, volume and breadth of biomedical literature pose barriers for automated classification methods. On the other hand, manual indexing is time-consuming, costly and error-prone. We argue that current word embedding algorithms can be efficiently used to support the task of biomedical text classification even in a multilabel setting with many distinct labels. The ontology representation of Medical Subject Headings provides machine-readable labels and specifies the dimensionality of the problem space. Both deep and shallow network approaches are implemented. Predictions are determined by the similarity between features extracted from contextualized representations of abstracts and headings. The addition of a separate classifier for transfer learning is also proposed and evaluated. Large datasets of biomedical citations are harvested for their metadata and used for training and testing. These automated approaches are still far from entirely substituting for human experts, yet they can be useful as a mechanism for validation and recommendation. Dataset balancing, distributed processing and training parallelization on GPUs all play an important part in the effectiveness and performance of the proposed methods.
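The description says that predictions are determined by the similarity between features of abstracts and features of headings. The following is a minimal sketch of that general idea only, not the authors' implementation: it assumes that each abstract and each heading has already been turned into a fixed-length vector, and it uses cosine similarity with a cutoff. The function names, the `threshold` and `top_k` parameters, and the toy three-dimensional vectors are all hypothetical and chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_headings(abstract_vec, heading_vecs, threshold=0.5, top_k=3):
    """Return up to top_k (heading, score) pairs scoring at least threshold, best first.

    threshold and top_k are illustrative assumptions, not values from the article.
    """
    scored = [(name, cosine(abstract_vec, vec)) for name, vec in heading_vecs.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy vectors invented for this example; real embeddings would come from a trained model.
heading_vecs = {
    "Neoplasms": [0.9, 0.1, 0.0],
    "Neural Networks, Computer": [0.1, 0.9, 0.2],
    "Vocabulary, Controlled": [0.0, 0.3, 0.9],
}
abstract_vec = [0.2, 0.8, 0.5]

predictions = predict_headings(abstract_vec, heading_vecs)
for name, score in predictions:
    print(f"{name}: {score:.3f}")
```

Because several headings can clear the cutoff at once, a scheme like this naturally yields a multilabel prediction, and the ranked scores can serve as recommendations for a human indexer to validate.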
format Online
Article
Text
id pubmed-8111057
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Springer London
record_format MEDLINE/PubMed
spelling pubmed-8111057 2021-05-11 Thesaurus-based word embeddings for automated biomedical literature classification Koutsomitropoulos, Dimitrios A. Andriopoulos, Andreas D. Neural Comput Appl Special issue on Advances of Neural Computing phasing challenges in the era of 4th industrial revolution Springer London 2021-05-11 2022 /pmc/articles/PMC8111057/ /pubmed/33994670 http://dx.doi.org/10.1007/s00521-021-06053-z Text en © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2021 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source.
These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title Thesaurus-based word embeddings for automated biomedical literature classification
topic Special issue on Advances of Neural Computing phasing challenges in the era of 4th industrial revolution