Prediction of risk-associated genes and high-risk liver cancer patients from their mutation profile: benchmarking of mutation calling techniques

Identification of somatic mutations with high precision is one of the major challenges in the prediction of high-risk liver cancer patients. In the past, a number of mutation-calling techniques have been developed, including MuTect2, MuSE, Varscan2, and SomaticSniper. In this study, an attempt has b...

Bibliographic Details
Main authors: Patiyal, Sumeet; Dhall, Anjali; Raghava, Gajendra P S
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9204470/
https://www.ncbi.nlm.nih.gov/pubmed/35734767
http://dx.doi.org/10.1093/biomethods/bpac012
_version_ 1784728934021070848
author Patiyal, Sumeet
Dhall, Anjali
Raghava, Gajendra P S
author_facet Patiyal, Sumeet
Dhall, Anjali
Raghava, Gajendra P S
author_sort Patiyal, Sumeet
collection PubMed
description Identification of somatic mutations with high precision is one of the major challenges in the prediction of high-risk liver cancer patients. In the past, a number of mutation-calling techniques have been developed, including MuTect2, MuSE, Varscan2, and SomaticSniper. In this study, an attempt has been made to benchmark the potential of these techniques in predicting prognostic biomarkers for liver cancer. Initially, we extracted somatic mutations in liver cancer patients using Variant Call Format (VCF) and Mutation Annotation Format (MAF) files from The Cancer Genome Atlas. In terms of size, the MAF files are 42 times smaller than the VCF files and contain only high-quality somatic mutations. Furthermore, machine learning-based models have been developed for predicting high-risk cancer patients using mutations obtained from the different techniques. The performance of the different techniques and data files has been compared based on their potential to discriminate between high- and low-risk liver cancer patients. Based on correlation analysis, we selected 80 genes with a significant negative correlation with the overall survival of liver cancer patients. The univariate survival analysis revealed the prognostic role of highly mutated genes. Single gene-based analysis showed that the MuTect2-based MAF file achieved a maximum hazard ratio for LAMC3 (HR_LAMC3) of 9.25 with a P-value of 1.78E-06. Further, we developed various prediction models using the top-10 risk-associated genes for each technique. Our results indicate that MuTect2-based VCF files outperform all other methods, with a maximum Area Under the Receiver-Operating Characteristic curve of 0.765 and HR = 4.50 (P-value = 3.83E-15). Overall, VCF files generated using the MuTect2 technique perform better than those from the other mutation-calling techniques for the prediction of high-risk liver cancer patients. We hope that our findings will provide a useful and comprehensive comparison of various mutation-calling techniques for the prognostic analysis of cancer patients. In order to serve the scientific community, we have provided a Python-based pipeline to develop prediction models using mutation profiles (VCF/MAF) of cancer patients. It is available on GitHub at https://github.com/raghavagps/mutation_bench.
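The description above evaluates models built on a small set of risk-associated genes by their Area Under the Receiver-Operating Characteristic curve. The following is a minimal sketch of that idea, not the authors' pipeline: it assumes a patient's risk score is simply the number of selected genes that are mutated, and it computes the AUROC by pairwise comparison of high-risk and low-risk patients. The function names, the patient identifiers, and the gene names other than LAMC3 are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def risk_scores(mutations: Dict[str, Set[str]],
                risk_genes: Sequence[str]) -> Dict[str, int]:
    """Score each patient by how many of the selected risk genes are mutated.

    This counting rule is an assumption made for illustration only.
    """
    selected = set(risk_genes)
    return {patient: len(mutated & selected)
            for patient, mutated in mutations.items()}


def auroc(scores: List[float], labels: List[int]) -> float:
    """AUROC as the probability that a high-risk patient (label 1)
    scores above a low-risk patient (label 0); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one high-risk and one low-risk patient")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): the set of mutated genes per patient.
toy_mutations = {
    "P1": {"LAMC3", "GENE_B"},
    "P2": {"LAMC3"},
    "P3": {"GENE_C"},
    "P4": set(),
}
toy_labels = {"P1": 1, "P2": 0, "P3": 1, "P4": 0}  # 1 = high risk

toy_scores = risk_scores(toy_mutations, ["LAMC3", "GENE_B"])
patients = sorted(toy_mutations)
toy_auc = auroc([toy_scores[p] for p in patients],
                [toy_labels[p] for p in patients])
print(f"toy AUROC: {toy_auc:.3f}")
```

On real data, the per-patient gene sets would come from parsed VCF or MAF files, and the labels from a survival-based split into high- and low-risk groups; both steps are outside this sketch.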
format Online
Article
Text
id pubmed-9204470
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9204470 2022-06-21 Prediction of risk-associated genes and high-risk liver cancer patients from their mutation profile: benchmarking of mutation calling techniques Patiyal, Sumeet Dhall, Anjali Raghava, Gajendra P S Biol Methods Protoc Methods Article Oxford University Press 2022-05-27 /pmc/articles/PMC9204470/ /pubmed/35734767 http://dx.doi.org/10.1093/biomethods/bpac012 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Prediction of risk-associated genes and high-risk liver cancer patients from their mutation profile: benchmarking of mutation calling techniques
topic Methods Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9204470/
https://www.ncbi.nlm.nih.gov/pubmed/35734767
http://dx.doi.org/10.1093/biomethods/bpac012