Text-mining clinically relevant cancer biomarkers for curation into the CIViC database

BACKGROUND: Precision oncology involves analysis of individual cancer samples to understand the genes and pathways involved in the development and progression of a cancer. To improve patient care, knowledge of diagnostic, prognostic, predisposing, and drug response markers is essential. Several know...

Bibliographic Details
Main Authors: Lever, Jake, Jones, Martin R., Danos, Arpad M., Krysiak, Kilannin, Bonakdar, Melika, Grewal, Jasleen K., Culibrk, Luka, Griffith, Obi L., Griffith, Malachi, Jones, Steven J. M.
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6891984/
https://www.ncbi.nlm.nih.gov/pubmed/31796060
http://dx.doi.org/10.1186/s13073-019-0686-y
_version_ 1783475938928885760
author Lever, Jake
Jones, Martin R.
Danos, Arpad M.
Krysiak, Kilannin
Bonakdar, Melika
Grewal, Jasleen K.
Culibrk, Luka
Griffith, Obi L.
Griffith, Malachi
Jones, Steven J. M.
author_facet Lever, Jake
Jones, Martin R.
Danos, Arpad M.
Krysiak, Kilannin
Bonakdar, Melika
Grewal, Jasleen K.
Culibrk, Luka
Griffith, Obi L.
Griffith, Malachi
Jones, Steven J. M.
author_sort Lever, Jake
collection PubMed
description BACKGROUND: Precision oncology involves analysis of individual cancer samples to understand the genes and pathways involved in the development and progression of a cancer. To improve patient care, knowledge of diagnostic, prognostic, predisposing, and drug response markers is essential. Several knowledgebases have been created by different groups to collate evidence for these associations. These include the open-access Clinical Interpretation of Variants in Cancer (CIViC) knowledgebase. These databases rely on time-consuming manual curation from skilled experts who read and interpret the relevant biomedical literature. METHODS: To aid in this curation and provide the greatest coverage for these databases, particularly CIViC, we propose the use of text mining approaches to extract these clinically relevant biomarkers from all available published literature. To this end, a group of cancer genomics experts annotated sentences that discussed biomarkers with their clinical associations and achieved good inter-annotator agreement. We then used a supervised learning approach to construct the CIViCmine knowledgebase. RESULTS: We extracted 121,589 relevant sentences from PubMed abstracts and PubMed Central Open Access full-text papers. CIViCmine contains over 87,412 biomarkers associated with 8035 genes, 337 drugs, and 572 cancer types, representing 25,818 abstracts and 39,795 full-text publications. CONCLUSIONS: Through integration with CIViC, we provide a prioritized list of curatable clinically relevant cancer biomarkers as well as a resource that is valuable to other knowledgebases and precision cancer analysts in general. All data is publicly available and distributed with a Creative Commons Zero license. The CIViCmine knowledgebase is available at http://bionlp.bcgsc.ca/civicmine/.
format Online
Article
Text
id pubmed-6891984
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6891984 2019-12-11 Text-mining clinically relevant cancer biomarkers for curation into the CIViC database Lever, Jake Jones, Martin R. Danos, Arpad M. Krysiak, Kilannin Bonakdar, Melika Grewal, Jasleen K. Culibrk, Luka Griffith, Obi L. Griffith, Malachi Jones, Steven J. M. Genome Med Research BACKGROUND: Precision oncology involves analysis of individual cancer samples to understand the genes and pathways involved in the development and progression of a cancer. To improve patient care, knowledge of diagnostic, prognostic, predisposing, and drug response markers is essential. Several knowledgebases have been created by different groups to collate evidence for these associations. These include the open-access Clinical Interpretation of Variants in Cancer (CIViC) knowledgebase. These databases rely on time-consuming manual curation from skilled experts who read and interpret the relevant biomedical literature. METHODS: To aid in this curation and provide the greatest coverage for these databases, particularly CIViC, we propose the use of text mining approaches to extract these clinically relevant biomarkers from all available published literature. To this end, a group of cancer genomics experts annotated sentences that discussed biomarkers with their clinical associations and achieved good inter-annotator agreement. We then used a supervised learning approach to construct the CIViCmine knowledgebase. RESULTS: We extracted 121,589 relevant sentences from PubMed abstracts and PubMed Central Open Access full-text papers. CIViCmine contains over 87,412 biomarkers associated with 8035 genes, 337 drugs, and 572 cancer types, representing 25,818 abstracts and 39,795 full-text publications. CONCLUSIONS: Through integration with CIViC, we provide a prioritized list of curatable clinically relevant cancer biomarkers as well as a resource that is valuable to other knowledgebases and precision cancer analysts in general.
All data is publicly available and distributed with a Creative Commons Zero license. The CIViCmine knowledgebase is available at http://bionlp.bcgsc.ca/civicmine/. BioMed Central 2019-12-03 /pmc/articles/PMC6891984/ /pubmed/31796060 http://dx.doi.org/10.1186/s13073-019-0686-y Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Lever, Jake
Jones, Martin R.
Danos, Arpad M.
Krysiak, Kilannin
Bonakdar, Melika
Grewal, Jasleen K.
Culibrk, Luka
Griffith, Obi L.
Griffith, Malachi
Jones, Steven J. M.
Text-mining clinically relevant cancer biomarkers for curation into the CIViC database
title Text-mining clinically relevant cancer biomarkers for curation into the CIViC database
title_full Text-mining clinically relevant cancer biomarkers for curation into the CIViC database
title_fullStr Text-mining clinically relevant cancer biomarkers for curation into the CIViC database
title_full_unstemmed Text-mining clinically relevant cancer biomarkers for curation into the CIViC database
title_short Text-mining clinically relevant cancer biomarkers for curation into the CIViC database
title_sort text-mining clinically relevant cancer biomarkers for curation into the civic database
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6891984/
https://www.ncbi.nlm.nih.gov/pubmed/31796060
http://dx.doi.org/10.1186/s13073-019-0686-y
work_keys_str_mv AT leverjake textminingclinicallyrelevantcancerbiomarkersforcurationintothecivicdatabase
AT jonesmartinr textminingclinicallyrelevantcancerbiomarkersforcurationintothecivicdatabase
AT danosarpadm textminingclinicallyrelevantcancerbiomarkersforcurationintothecivicdatabase
AT krysiakkilannin textminingclinicallyrelevantcancerbiomarkersforcurationintothecivicdatabase
AT bonakdarmelika textminingclinicallyrelevantcancerbiomarkersforcurationintothecivicdatabase
AT grewaljasleenk textminingclinicallyrelevantcancerbiomarkersforcurationintothecivicdatabase
AT culibrkluka textminingclinicallyrelevantcancerbiomarkersforcurationintothecivicdatabase
AT griffithobil textminingclinicallyrelevantcancerbiomarkersforcurationintothecivicdatabase
AT griffithmalachi textminingclinicallyrelevantcancerbiomarkersforcurationintothecivicdatabase
AT jonesstevenjm textminingclinicallyrelevantcancerbiomarkersforcurationintothecivicdatabase