
MarkerGenie: an NLP-enabled text-mining system for biomedical entity relation extraction

MOTIVATION: Natural language processing (NLP) tasks aim to convert unstructured text data (e.g. articles or dialogues) into structured information. In recent years, we have witnessed fundamental advances in NLP techniques, which have been widely used in many applications such as financial text mining, n...


Bibliographic Details
Main Authors: Gu, Wenhao; Yang, Xiao; Yang, Minhao; Han, Kun; Pan, Wenying; Zhu, Zexuan
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9710573/
https://www.ncbi.nlm.nih.gov/pubmed/36699388
http://dx.doi.org/10.1093/bioadv/vbac035
author Gu, Wenhao
Yang, Xiao
Yang, Minhao
Han, Kun
Pan, Wenying
Zhu, Zexuan
collection PubMed
description MOTIVATION: Natural language processing (NLP) tasks aim to convert unstructured text data (e.g. articles or dialogues) into structured information. In recent years, we have witnessed fundamental advances in NLP techniques, which have been widely used in many applications such as financial text mining, news recommendation and machine translation. However, their application in the biomedical space remains challenging due to a lack of labeled data and to ambiguities and inconsistencies in biological terminology. In biomedical marker discovery studies, tools that rely on NLP models to automatically and accurately extract relations between biomedical entities are valuable because they can survey the available literature more thoroughly, and hence give a less biased result than manual curation. In addition, the speed of machine reading helps to orient research and development quickly. RESULTS: To address these needs, we integrated automatic training data labeling, rule-based biological terminology cleaning and a more accurate NLP model for binary associative and multi-relation prediction into the MarkerGenie program. We demonstrated the effectiveness of the proposed methods in identifying relations between biomedical entities on various benchmark datasets and case studies. AVAILABILITY AND IMPLEMENTATION: MarkerGenie is available at https://www.genegeniedx.com/markergenie/. Data for model training and evaluation, term lists of biomedical entities, details of the case studies and all trained models are provided at https://drive.google.com/drive/folders/14RypiIfIr3W_K-mNIAx9BNtObHSZoAyn?usp=sharing. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online.
format Online
Article
Text
id pubmed-9710573
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9710573 2023-01-24 MarkerGenie: an NLP-enabled text-mining system for biomedical entity relation extraction Gu, Wenhao Yang, Xiao Yang, Minhao Han, Kun Pan, Wenying Zhu, Zexuan Bioinform Adv Original Paper
Oxford University Press 2022-05-13 /pmc/articles/PMC9710573/ /pubmed/36699388 http://dx.doi.org/10.1093/bioadv/vbac035 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title MarkerGenie: an NLP-enabled text-mining system for biomedical entity relation extraction
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9710573/
https://www.ncbi.nlm.nih.gov/pubmed/36699388
http://dx.doi.org/10.1093/bioadv/vbac035