Comprehensive and realistic simulation of tumour genomic sequencing data
Main authors: | O’Sullivan, Brian; Seoighe, Cathal |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2023 |
Subjects: | Cancer Genomics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10516706/ https://www.ncbi.nlm.nih.gov/pubmed/37746635 http://dx.doi.org/10.1093/narcan/zcad051 |
author | O’Sullivan, Brian; Seoighe, Cathal |
---|---|
collection | PubMed |
description | Accurate identification of somatic mutations and allele frequencies in cancer has critical research and clinical applications. Several computational tools have been developed for this purpose but, in the absence of comprehensive ‘ground truth’ data, assessing the accuracy of these methods is challenging. We created a computational framework to simulate tumour and matched normal sequencing data for which the source of all loci that contain non-reference bases is known, based on a phased, personalized genome. Unlike existing methods, we account for sampling errors inherent in the sequencing process. Using this framework, we assess accuracy and biases in inferred mutations and their frequencies in an established somatic mutation calling pipeline. We demonstrate bias in existing methods of mutant allele frequency estimation and show, for the first time, the observed mutation frequency spectrum corresponding to a theoretical model of tumour evolution. We highlight the impact of quality filters on detection sensitivity of clinically actionable variants and provide definitive assessment of false positive and false negative mutation calls. Our simulation framework provides an improved means to assess the accuracy of somatic mutation calling pipelines and a detailed picture of the effects of technical parameters and experimental factors on somatic mutation calling in cancer samples. |
id | pubmed-10516706 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
source | NAR Cancer (Cancer Genomics), Oxford University Press, published 2023-09-22 |
rights | © The Author(s) 2023. Published by Oxford University Press on behalf of NAR Cancer. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Comprehensive and realistic simulation of tumour genomic sequencing data |
topic | Cancer Genomics |