
Thesaurus-based word embeddings for automated biomedical literature classification

The special nature, volume and broadness of biomedical literature pose barriers for automated classification methods. On the other hand, manual indexing is time-consuming, costly and error-prone. We argue that current word embedding algorithms can be efficiently used to support the task of biomedi...


Bibliographic Details
Main Authors:	Koutsomitropoulos, Dimitrios A., Andriopoulos, Andreas D.
Format:	Online Article Text
Language:	English
Published:	Springer London 2021
Subjects:	Special issue on Advances of Neural Computing phasing challenges in the era of 4th industrial revolution
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8111057/ https://www.ncbi.nlm.nih.gov/pubmed/33994670 http://dx.doi.org/10.1007/s00521-021-06053-z

_version_	1783690422558654464
author	Koutsomitropoulos, Dimitrios A. Andriopoulos, Andreas D.
author_facet	Koutsomitropoulos, Dimitrios A. Andriopoulos, Andreas D.
author_sort	Koutsomitropoulos, Dimitrios A.
collection	PubMed
description	The special nature, volume and broadness of biomedical literature pose barriers for automated classification methods. On the other hand, manual indexing is time-consuming, costly and error-prone. We argue that current word embedding algorithms can be efficiently used to support the task of biomedical text classification even in a multilabel setting, with many distinct labels. The ontology representation of Medical Subject Headings provides machine-readable labels and specifies the dimensionality of the problem space. Both deep and shallow network approaches are implemented. Predictions are determined by the similarity between extracted features from contextualized representations of abstracts and headings. The addition of a separate classifier for transfer learning is also proposed and evaluated. Large datasets of biomedical citations are harvested for their metadata and used for training and testing. These automated approaches are still far from entirely substituting human experts, yet they can be useful as a mechanism for validation and recommendation. Dataset balancing, distributed processing and training parallelization on GPUs all play an important part in the effectiveness and performance of the proposed methods.
format	Online Article Text
id	pubmed-8111057
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Springer London
record_format	MEDLINE/PubMed
spelling	pubmed-8111057 2021-05-11 Thesaurus-based word embeddings for automated biomedical literature classification Koutsomitropoulos, Dimitrios A. Andriopoulos, Andreas D. Neural Comput Appl Special issue on Advances of Neural Computing phasing challenges in the era of 4th industrial revolution The special nature, volume and broadness of biomedical literature pose barriers for automated classification methods. On the other hand, manual indexing is time-consuming, costly and error-prone. We argue that current word embedding algorithms can be efficiently used to support the task of biomedical text classification even in a multilabel setting, with many distinct labels. The ontology representation of Medical Subject Headings provides machine-readable labels and specifies the dimensionality of the problem space. Both deep and shallow network approaches are implemented. Predictions are determined by the similarity between extracted features from contextualized representations of abstracts and headings. The addition of a separate classifier for transfer learning is also proposed and evaluated. Large datasets of biomedical citations are harvested for their metadata and used for training and testing. These automated approaches are still far from entirely substituting human experts, yet they can be useful as a mechanism for validation and recommendation. Dataset balancing, distributed processing and training parallelization on GPUs all play an important part in the effectiveness and performance of the proposed methods. Springer London 2021-05-11 2022 /pmc/articles/PMC8111057/ /pubmed/33994670 http://dx.doi.org/10.1007/s00521-021-06053-z Text en © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2021 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle	Special issue on Advances of Neural Computing phasing challenges in the era of 4th industrial revolution Koutsomitropoulos, Dimitrios A. Andriopoulos, Andreas D. Thesaurus-based word embeddings for automated biomedical literature classification
title	Thesaurus-based word embeddings for automated biomedical literature classification
title_full	Thesaurus-based word embeddings for automated biomedical literature classification
title_fullStr	Thesaurus-based word embeddings for automated biomedical literature classification
title_full_unstemmed	Thesaurus-based word embeddings for automated biomedical literature classification
title_short	Thesaurus-based word embeddings for automated biomedical literature classification
title_sort	thesaurus-based word embeddings for automated biomedical literature classification
topic	Special issue on Advances of Neural Computing phasing challenges in the era of 4th industrial revolution
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8111057/ https://www.ncbi.nlm.nih.gov/pubmed/33994670 http://dx.doi.org/10.1007/s00521-021-06053-z
work_keys_str_mv	AT koutsomitropoulosdimitriosa thesaurusbasedwordembeddingsforautomatedbiomedicalliteratureclassification AT andriopoulosandreasd thesaurusbasedwordembeddingsforautomatedbiomedicalliteratureclassification
