ParaBTM: A Parallel Processing Framework for Biomedical Text Mining on Supercomputers

A prevailing way of extracting valuable information from biomedical literature is to apply text mining methods on unstructured texts. However, the massive amount of literature that needs to be analyzed poses a big data challenge to the processing efficiency of text mining. In this paper, we address...

Bibliographic Details
Main Authors: Xing, Yuting; Wu, Chengkun; Yang, Xi; Wang, Wei; Zhu, En; Yin, Jianping
Format: Online Article Text
Language: English
Published: MDPI 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6099625/
https://www.ncbi.nlm.nih.gov/pubmed/29702574
http://dx.doi.org/10.3390/molecules23051028
_version_ 1783348708755111936
author Xing, Yuting
Wu, Chengkun
Yang, Xi
Wang, Wei
Zhu, En
Yin, Jianping
author_facet Xing, Yuting
Wu, Chengkun
Yang, Xi
Wang, Wei
Zhu, En
Yin, Jianping
author_sort Xing, Yuting
collection PubMed
description A prevailing way of extracting valuable information from biomedical literature is to apply text mining methods on unstructured texts. However, the massive amount of literature that needs to be analyzed poses a big data challenge to the processing efficiency of text mining. In this paper, we address this challenge by introducing parallel processing on a supercomputer. We developed paraBTM, a runnable framework that enables parallel text mining on the Tianhe-2 supercomputer. It employs a low-cost yet effective load balancing strategy to maximize the efficiency of parallel processing. We evaluated the performance of paraBTM on several datasets, utilizing three types of named entity recognition tasks as demonstration. Results show that, in most cases, the processing efficiency can be greatly improved with parallel processing, and the proposed load balancing strategy is simple and effective. In addition, our framework can be readily applied to other tasks of biomedical text mining besides NER.
format Online
Article
Text
id pubmed-6099625
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-60996252018-11-13 ParaBTM: A Parallel Processing Framework for Biomedical Text Mining on Supercomputers Xing, Yuting Wu, Chengkun Yang, Xi Wang, Wei Zhu, En Yin, Jianping Molecules Article A prevailing way of extracting valuable information from biomedical literature is to apply text mining methods on unstructured texts. However, the massive amount of literature that needs to be analyzed poses a big data challenge to the processing efficiency of text mining. In this paper, we address this challenge by introducing parallel processing on a supercomputer. We developed paraBTM, a runnable framework that enables parallel text mining on the Tianhe-2 supercomputer. It employs a low-cost yet effective load balancing strategy to maximize the efficiency of parallel processing. We evaluated the performance of paraBTM on several datasets, utilizing three types of named entity recognition tasks as demonstration. Results show that, in most cases, the processing efficiency can be greatly improved with parallel processing, and the proposed load balancing strategy is simple and effective. In addition, our framework can be readily applied to other tasks of biomedical text mining besides NER. MDPI 2018-04-27 /pmc/articles/PMC6099625/ /pubmed/29702574 http://dx.doi.org/10.3390/molecules23051028 Text en © 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Xing, Yuting
Wu, Chengkun
Yang, Xi
Wang, Wei
Zhu, En
Yin, Jianping
ParaBTM: A Parallel Processing Framework for Biomedical Text Mining on Supercomputers
title ParaBTM: A Parallel Processing Framework for Biomedical Text Mining on Supercomputers
title_full ParaBTM: A Parallel Processing Framework for Biomedical Text Mining on Supercomputers
title_fullStr ParaBTM: A Parallel Processing Framework for Biomedical Text Mining on Supercomputers
title_full_unstemmed ParaBTM: A Parallel Processing Framework for Biomedical Text Mining on Supercomputers
title_short ParaBTM: A Parallel Processing Framework for Biomedical Text Mining on Supercomputers
title_sort parabtm: a parallel processing framework for biomedical text mining on supercomputers
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6099625/
https://www.ncbi.nlm.nih.gov/pubmed/29702574
http://dx.doi.org/10.3390/molecules23051028