MarkerGenie: an NLP-enabled text-mining system for biomedical entity relation extraction
MOTIVATION: Natural language processing (NLP) tasks aim to convert unstructured text data (e.g. articles or dialogues) to structured information. In recent years, we have witnessed fundamental advances in NLP techniques, which have been widely used in many applications such as financial text mining, n...
Main authors: | Gu, Wenhao; Yang, Xiao; Yang, Minhao; Han, Kun; Pan, Wenying; Zhu, Zexuan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9710573/ https://www.ncbi.nlm.nih.gov/pubmed/36699388 http://dx.doi.org/10.1093/bioadv/vbac035 |
_version_ | 1784841396286390272 |
---|---|
author | Gu, Wenhao Yang, Xiao Yang, Minhao Han, Kun Pan, Wenying Zhu, Zexuan |
author_facet | Gu, Wenhao Yang, Xiao Yang, Minhao Han, Kun Pan, Wenying Zhu, Zexuan |
author_sort | Gu, Wenhao |
collection | PubMed |
description | MOTIVATION: Natural language processing (NLP) tasks aim to convert unstructured text data (e.g. articles or dialogues) to structured information. In recent years, we have witnessed fundamental advances in NLP techniques, which have been widely used in many applications such as financial text mining, news recommendation and machine translation. However, their application in the biomedical space remains challenging due to a lack of labeled data and the ambiguities and inconsistencies of biological terminology. In biomedical marker discovery studies, tools that rely on NLP models to automatically and accurately extract relations between biomedical entities are valuable as they can provide a more thorough survey of all available literature, hence providing a less biased result compared to manual curation. In addition, the speed of machine reading helps to quickly orient research and development. RESULTS: To address the aforementioned needs, we developed automatic training data labeling, rule-based biological terminology cleaning and a more accurate NLP model for binary associative and multi-relation prediction, and integrated them into the MarkerGenie program. We demonstrated the effectiveness of the proposed methods in identifying relations between biomedical entities on various benchmark datasets and case studies. AVAILABILITY AND IMPLEMENTATION: MarkerGenie is available at https://www.genegeniedx.com/markergenie/. Data for model training and evaluation, term lists of biomedical entities, details of the case studies and all trained models are provided at https://drive.google.com/drive/folders/14RypiIfIr3W_K-mNIAx9BNtObHSZoAyn?usp=sharing. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online. |
format | Online Article Text |
id | pubmed-9710573 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9710573 2023-01-24 MarkerGenie: an NLP-enabled text-mining system for biomedical entity relation extraction Gu, Wenhao Yang, Xiao Yang, Minhao Han, Kun Pan, Wenying Zhu, Zexuan Bioinform Adv Original Paper Oxford University Press 2022-05-13 /pmc/articles/PMC9710573/ /pubmed/36699388 http://dx.doi.org/10.1093/bioadv/vbac035 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Paper Gu, Wenhao Yang, Xiao Yang, Minhao Han, Kun Pan, Wenying Zhu, Zexuan MarkerGenie: an NLP-enabled text-mining system for biomedical entity relation extraction |
title | MarkerGenie: an NLP-enabled text-mining system for biomedical entity relation extraction |
title_sort | markergenie: an nlp-enabled text-mining system for biomedical entity relation extraction |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9710573/ https://www.ncbi.nlm.nih.gov/pubmed/36699388 http://dx.doi.org/10.1093/bioadv/vbac035 |