
Contextualizing Genes by Using Text-Mined Co-Occurrence Features for Cancer Gene Panel Discovery


Bibliographic Details
Main authors:	Chen, Hui-O; Lin, Peng-Chan; Liu, Chen-Ruei; Wang, Chi-Shiang; Chiang, Jung-Hsien
Format:	Online article (text)
Language:	English
Published:	Frontiers Media S.A., 2021
Subjects:	Genetics
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8573063/ https://www.ncbi.nlm.nih.gov/pubmed/34759963 http://dx.doi.org/10.3389/fgene.2021.771435

Abstract
Developing a biomedically explainable and validatable text-mining pipeline can aid cancer gene panel discovery. We created a pipeline that contextualizes genes by using text-mined co-occurrence features, applying Biomedical Natural Language Processing (BioNLP) techniques to mine the literature for the cancer gene panel. From the literature we built a 4,679 × 4,630 gene term-feature matrix. The EGFR L858R and T790M variants and the BRAF V600E variant are important mutation term features in the text-mining results, and all three occur frequently in cancer. We validated the cancer gene panel against the mutational landscape of different cancer types: the cosine similarity between gene frequencies obtained by text mining and those obtained statistically from clinical sequencing data is 80.8%. Across the machine learning models tested, the best accuracies for predicting the genes of two gene panels, MSK-IMPACT (Memorial Sloan Kettering-Integrated Mutation Profiling of Actionable Cancer Targets) and the Oncomine cancer gene panel, were 0.959 and 0.989, respectively. Receiver operating characteristic (ROC) curve analysis confirmed that the neural network model had better predictive performance than the other models (area under the ROC curve (AUC) = 0.992). Text-mined co-occurrence features can contextualize each gene. We believe this approach can be used to evaluate several existing gene panels, and we show that part of a gene panel set can be used to predict the remaining genes for cancer discovery.
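The abstract does not describe how the co-occurrence features or the gene-frequency comparison are implemented. The Python sketch below illustrates both ideas under stated assumptions: a toy corpus whose gene and term mentions are already tagged, co-occurrence counted at the sentence level as raw counts, text-mined gene frequency taken as each gene's total co-occurrence count, and invented clinical mutation counts. The names `SENTENCES`, `cooccurrence_matrix`, and `cosine_similarity`, and all numbers, are hypothetical. This is not the authors' pipeline.

```python
import math
from collections import Counter

# Hypothetical toy corpus. Each entry is one sentence whose gene and term
# mentions are assumed to have been recognized already by an upstream
# BioNLP tagger (not shown). Real input would come from the literature.
SENTENCES = [
    ({"EGFR"}, {"L858R", "lung cancer"}),
    ({"EGFR"}, {"T790M", "lung cancer"}),
    ({"BRAF"}, {"V600E", "melanoma"}),
    ({"BRAF", "KRAS"}, {"colorectal cancer"}),
]


def cooccurrence_matrix(sentences):
    """Build a gene x term matrix of sentence-level co-occurrence counts."""
    counts = {}
    for genes, terms in sentences:
        for gene in genes:
            # Each term in the sentence is counted once for every gene in it.
            counts.setdefault(gene, Counter()).update(terms)
    genes = sorted(counts)
    terms = sorted({term for row in counts.values() for term in row})
    # Counter returns 0 for terms a gene never co-occurred with.
    matrix = [[counts[gene][term] for term in terms] for gene in genes]
    return genes, terms, matrix


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


genes, terms, matrix = cooccurrence_matrix(SENTENCES)

# Each matrix row is the feature vector that "contextualizes" one gene.
for gene, row in zip(genes, matrix):
    print(gene, row)

# Assumption: text-mined gene frequency = total co-occurrence count per gene.
text_mined_freq = [sum(row) for row in matrix]
# Hypothetical clinical mutation counts for the same genes (BRAF, EGFR, KRAS).
clinical_freq = [12, 30, 25]
print("cosine similarity:", round(cosine_similarity(text_mined_freq, clinical_freq), 3))
```

Raw counts are used here only for simplicity; a real pipeline would likely normalize or weight the matrix before using its rows as features for the panel-prediction models.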

Journal:	Front Genet
Publication date:	2021-10-25
Source:	National Center for Biotechnology Information (MEDLINE/PubMed), record pubmed-8573063
Rights:	Copyright © 2021 Chen, Lin, Liu, Wang and Chiang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.