RENET2: high-performance full-text gene–disease relation extraction with iterative training data expansion
Main authors: | Su, Junhao; Wu, Ye; Ting, Hing-Fung; Lam, Tak-Wah; Luo, Ruibang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8256824/ https://www.ncbi.nlm.nih.gov/pubmed/34235433 http://dx.doi.org/10.1093/nargab/lqab062 |
_version_ | 1783718174953308160 |
---|---|
author | Su, Junhao; Wu, Ye; Ting, Hing-Fung; Lam, Tak-Wah; Luo, Ruibang |
collection | PubMed |
description | Relation extraction (RE) is a fundamental task for extracting gene–disease associations from biomedical text. Many state-of-the-art tools have limited capacity, as they can extract gene–disease associations only from single sentences or abstract texts. A few studies have explored extracting gene–disease associations from full-text articles, but there is still considerable room for improvement. In this work, we propose RENET2, a deep learning-based RE method, which implements Section Filtering and ambiguous relation modeling to extract gene–disease associations from full-text articles. We designed a novel iterative training data expansion strategy to build an annotated full-text dataset and address the scarcity of labels for full-text articles. In our experiments, RENET2 achieved an F1-score of 72.13% for extracting gene–disease associations from an annotated full-text dataset, which was 27.22, 30.30, 29.24 and 23.87% higher than BeFree, DTMiner, BioBERT and RENET, respectively. We applied RENET2 to (i) ∼1.89M full-text articles from PubMed Central and found ∼3.72M gene–disease associations; and (ii) the LitCovid articles and ranked the top 15 proteins associated with COVID-19, supported by recent articles. RENET2 is an efficient and accurate method for full-text gene–disease association extraction. The source code, manually curated abstract/full-text training data, and results of RENET2 are available on GitHub. |
format | Online Article Text |
id | pubmed-8256824 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8256824 2021-07-06 RENET2: high-performance full-text gene–disease relation extraction with iterative training data expansion Su, Junhao Wu, Ye Ting, Hing-Fung Lam, Tak-Wah Luo, Ruibang NAR Genom Bioinform Methart Oxford University Press 2021-07-05 /pmc/articles/PMC8256824/ /pubmed/34235433 http://dx.doi.org/10.1093/nargab/lqab062 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | RENET2: high-performance full-text gene–disease relation extraction with iterative training data expansion |
topic | Methart |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8256824/ https://www.ncbi.nlm.nih.gov/pubmed/34235433 http://dx.doi.org/10.1093/nargab/lqab062 |
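The record describes RENET2's iterative training data expansion only at the level of the abstract. As a rough, hypothetical illustration of the general idea, and not the authors' actual procedure, the Python sketch below shows a generic self-training loop with a review step. In each round, a model is retrained on the current labels. Its confident predictions on unlabeled candidate gene–disease pairs are then passed to a reviewer, and accepted labels are added to the training set. The loop stops when a round adds nothing. The names `expand_training_data`, `train`, `curate`, `threshold` and `max_rounds`, the 0.9 confidence cut-off, and the toy scorer are all assumptions made for this example.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical candidate: a (gene, disease) pair found in one article.
Pair = Tuple[str, str]
Scorer = Callable[[Pair], float]


def expand_training_data(
    labeled: Dict[Pair, int],
    unlabeled: List[Pair],
    train: Callable[[Dict[Pair, int]], Scorer],
    curate: Callable[[Pair, float], Optional[int]],
    threshold: float = 0.9,
    max_rounds: int = 3,
) -> Dict[Pair, int]:
    """Generic self-training loop with a review step (illustrative only)."""
    data = dict(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = train(data)  # retrain on the current labeled set
        accepted: Dict[Pair, int] = {}
        for pair in pool:
            score = model(pair)
            # Only confident predictions (in either direction) go to review.
            if score >= threshold or score <= 1 - threshold:
                label = curate(pair, score)  # None means the reviewer rejected it
                if label is not None:
                    accepted[pair] = label
        if not accepted:  # nothing new was added, so stop early
            break
        data.update(accepted)
        pool = [p for p in pool if p not in accepted]
    return data


if __name__ == "__main__":
    # Toy scorer (hypothetical): confident only for genes already labeled positive.
    def toy_train(data: Dict[Pair, int]) -> Scorer:
        positive_genes = {gene for (gene, _), y in data.items() if y == 1}
        return lambda pair: 0.95 if pair[0] in positive_genes else 0.5

    seed = {("BRCA1", "breast cancer"): 1}
    candidates = [("BRCA1", "ovarian cancer"), ("TP53", "asthma")]
    result = expand_training_data(
        seed, candidates, toy_train, lambda pair, s: 1 if s >= 0.5 else 0
    )
    print(result)
```

The `curate` callback stands in for a manual review step. Replacing it with a function that always accepts the model's label would reduce the sketch to plain pseudo-labeling, which is more prone to reinforcing the model's own errors.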