
Comprehensive and realistic simulation of tumour genomic sequencing data

Accurate identification of somatic mutations and allele frequencies in cancer has critical research and clinical applications. Several computational tools have been developed for this purpose but, in the absence of comprehensive ‘ground truth’ data, assessing the accuracy of these methods is challen...


Bibliographic Details
Main authors: O’Sullivan, Brian; Seoighe, Cathal
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects: Cancer Genomics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10516706/
https://www.ncbi.nlm.nih.gov/pubmed/37746635
http://dx.doi.org/10.1093/narcan/zcad051
author O’Sullivan, Brian
Seoighe, Cathal
collection PubMed
description Accurate identification of somatic mutations and allele frequencies in cancer has critical research and clinical applications. Several computational tools have been developed for this purpose but, in the absence of comprehensive ‘ground truth’ data, assessing the accuracy of these methods is challenging. We created a computational framework to simulate tumour and matched normal sequencing data for which the source of all loci that contain non-reference bases is known, based on a phased, personalized genome. Unlike existing methods, we account for sampling errors inherent in the sequencing process. Using this framework, we assess accuracy and biases in inferred mutations and their frequencies in an established somatic mutation calling pipeline. We demonstrate bias in existing methods of mutant allele frequency estimation and show, for the first time, the observed mutation frequency spectrum corresponding to a theoretical model of tumour evolution. We highlight the impact of quality filters on detection sensitivity of clinically actionable variants and provide definitive assessment of false positive and false negative mutation calls. Our simulation framework provides an improved means to assess the accuracy of somatic mutation calling pipelines and a detailed picture of the effects of technical parameters and experimental factors on somatic mutation calling in cancer samples.
format Online
Article
Text
id pubmed-10516706
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10516706 2023-09-23 Comprehensive and realistic simulation of tumour genomic sequencing data O’Sullivan, Brian Seoighe, Cathal NAR Cancer Cancer Genomics Oxford University Press 2023-09-22 /pmc/articles/PMC10516706/ /pubmed/37746635 http://dx.doi.org/10.1093/narcan/zcad051 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of NAR Cancer.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Comprehensive and realistic simulation of tumour genomic sequencing data
topic Cancer Genomics