BRONCO: Biomedical entity Relation ONcology COrpus for extracting gene-variant-disease-drug relations

Comprehensive knowledge of genomic variants in a biological context is key for precision medicine. As next-generation sequencing technologies improve, the amount of literature containing genomic variant data, such as new functions or related phenotypes, rapidly increases. Because numerous articles a...

Bibliographic Details
Main authors: Lee, Kyubum, Lee, Sunwon, Park, Sungjoon, Kim, Sunkyu, Kim, Suhkyung, Choi, Kwanghun, Tan, Aik Choon, Kang, Jaewoo
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4830473/
https://www.ncbi.nlm.nih.gov/pubmed/27074804
http://dx.doi.org/10.1093/database/baw043
_version_ 1782426901951807488
author Lee, Kyubum
Lee, Sunwon
Park, Sungjoon
Kim, Sunkyu
Kim, Suhkyung
Choi, Kwanghun
Tan, Aik Choon
Kang, Jaewoo
collection PubMed
description Comprehensive knowledge of genomic variants in a biological context is key for precision medicine. As next-generation sequencing technologies improve, the amount of literature containing genomic variant data, such as new functions or related phenotypes, rapidly increases. Because numerous articles are published every day, it is almost impossible to manually curate all the variant information from the literature. Many researchers focus on creating an improved automated biomedical natural language processing (BioNLP) method that extracts useful variants and their functional information from the literature. However, there is no gold-standard data set that contains texts annotated with variants and their related functions. To overcome these limitations, we introduce a Biomedical entity Relation ONcology COrpus (BRONCO) that contains more than 400 variants and their relations with genes, diseases, drugs and cell lines in the context of cancer and anti-tumor drug screening research. The variants and their relations were manually extracted from 108 full-text articles. BRONCO can be utilized to evaluate and train new methods used for extracting biomedical entity relations from full-text publications, and thus be a valuable resource to the biomedical text mining research community. Using BRONCO, we quantitatively and qualitatively evaluated the performance of three state-of-the-art BioNLP methods. We also identified their shortcomings, and suggested remedies for each method. We implemented post-processing modules for the three BioNLP methods, which improved their performance. Database URL: http://infos.korea.ac.kr/bronco
format Online
Article
Text
id pubmed-4830473
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4830473 2016-04-14 Database (Oxford) Original Article Oxford University Press 2016-04-13 /pmc/articles/PMC4830473/ /pubmed/27074804 http://dx.doi.org/10.1093/database/baw043 Text en © The Author(s) 2016. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title BRONCO: Biomedical entity Relation ONcology COrpus for extracting gene-variant-disease-drug relations
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4830473/
https://www.ncbi.nlm.nih.gov/pubmed/27074804
http://dx.doi.org/10.1093/database/baw043