Sequence Neighborhoods Enable Reliable Prediction of Pathogenic Mutations in Cancer Genomes

SIMPLE SUMMARY: Cancer is caused by the accumulation of somatic mutations, some of which are responsible for the disease’s progression (drivers) while others are functionally neutral (passengers). Although several methods have been developed to distinguish between the two classes of mutations, very...

Bibliographic Details
Main Authors:	Banerjee, Shayantan; Raman, Karthik; Ravindran, Balaraman
Format:	Online Article Text
Language:	English
Published:	MDPI 2021
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8156421/ https://www.ncbi.nlm.nih.gov/pubmed/34068918 http://dx.doi.org/10.3390/cancers13102366

author	Banerjee, Shayantan Raman, Karthik Ravindran, Balaraman
collection	PubMed
description	SIMPLE SUMMARY: Cancer is caused by the accumulation of somatic mutations, some of which are responsible for the disease’s progression (drivers) while others are functionally neutral (passengers). Although several methods have been developed to distinguish between the two classes of mutations, very few have concentrated on using the neighborhood nucleotide sequences as potential discrimination features. In this study, we show that driver mutations’ neighborhood is significantly different from that of passengers. We further develop a novel machine learning tool, NBDriver, which is highly efficient at identifying pathogenic variants from multiple independent test datasets. Efficient and accurate identification of novel pathogenic variants from sequenced cancer genomes would help facilitate more effective therapies tailored to patients’ mutational profiles. ABSTRACT: Identifying cancer-causing mutations from sequenced cancer genomes holds much promise for targeted therapy and precision medicine. “Driver” mutations are primarily responsible for cancer progression, while “passengers” are functionally neutral. Although several computational approaches have been developed for distinguishing between driver and passenger mutations, very few have concentrated on using the raw nucleotide sequences surrounding a particular mutation as potential features for building predictive models. Using experimentally validated cancer mutation data in this study, we explored various string-based feature representation techniques to incorporate information on the neighborhood bases immediately 5′ and 3′ from each mutated position. Density estimation methods showed significant distributional differences between the neighborhood bases surrounding driver and passenger mutations. Binary classification models derived using repeated cross-validation experiments provided comparable performances across all window sizes. Integrating sequence features derived from raw nucleotide sequences with other genomic, structural, and evolutionary features resulted in the development of a pan-cancer mutation effect prediction tool, NBDriver, which was highly efficient in identifying pathogenic variants from five independent validation datasets. An ensemble predictor obtained by combining the predictions from NBDriver with three other commonly used driver prediction tools (FATHMM (cancer), CONDEL, and MutationTaster) significantly outperformed existing pan-cancer models in prioritizing a literature-curated list of driver and passenger mutations. Using the list of true positive mutation predictions derived from NBDriver, we identified a list of 138 known driver genes with functional evidence from various sources. Overall, our study underscores the efficacy of using raw nucleotide sequences as features to distinguish between driver and passenger mutations from sequenced cancer genomes.
format	Online Article Text
id	pubmed-8156421
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-8156421 2021-05-28 Cancers (Basel) Article MDPI 2021-05-14 /pmc/articles/PMC8156421/ /pubmed/34068918 http://dx.doi.org/10.3390/cancers13102366 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Sequence Neighborhoods Enable Reliable Prediction of Pathogenic Mutations in Cancer Genomes
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8156421/ https://www.ncbi.nlm.nih.gov/pubmed/34068918 http://dx.doi.org/10.3390/cancers13102366