
A database of simulated tumor genomes towards accurate detection of somatic small variants in cancer

Bibliographic Details
Main authors: Meng, Jing; Chen, Yi-Ping Phoebe
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6116990/
https://www.ncbi.nlm.nih.gov/pubmed/30161165
http://dx.doi.org/10.1371/journal.pone.0202982
_version_ 1783351679752601600
author Meng, Jing
Chen, Yi-Ping Phoebe
collection PubMed
description Somatic mutations promote the transformation of normal cells into cancer cells. Accurate identification of such mutations facilitates cancer diagnosis and treatment, but biological and technical noise, including intra-tumor heterogeneity, sample contamination, and uncertainties in base sequencing and read alignment, poses a major challenge to somatic mutation discovery. A number of callers have been developed to predict somatic mutations from paired tumor/normal or unpaired tumor sequencing data. However, the small number of currently available, experimentally validated somatic sites limits the evaluation, and in turn the improvement, of these callers. Fortunately, the genome of NIST reference material NA12878 has been well characterized, with publicly available high-confidence genotype calls, and biological and technical noise can be modeled computationally in terms of the number of sub-clones, the variant allele frequencies (VAFs), and the sequencing and mapping qualities. We used BAMSurgeon to create simulated tumors by introducing somatic small variants (SNVs and small indels) into homozygous reference (wildtype) sites of NA12878. We generated 135 simulated tumors from five pre-tumors/normals. These simulated tumors vary in sequencing and subsequent mapping error profiles, read length, the number of sub-clones, the VAF, the mutation frequency across the genome, and the genomic context. Furthermore, the pure tumor/normal pairs can be mixed at desired ratios within each pair to simulate sample contamination. This database (15 terabytes in total) can be used to benchmark somatic small variant callers and guide their improvement.
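The mixing step mentioned in the description can be illustrated with a little arithmetic. The sketch below is not the authors' procedure; it only shows, under stated assumptions, how one might work out what share of reads to draw from a pure tumor and its matched normal to reach a desired tumor purity, and what VAF to expect afterwards. The function names, the coverage values, and the assumptions that the variant is absent from the normal and that copy number is the same in both samples are all hypothetical.

```python
def mixing_fractions(tumor_cov, normal_cov, purity, target_cov):
    """Share of reads to sample from each file to reach a target purity.

    tumor_cov, normal_cov: mean coverage of the pure tumor and the normal
    purity: desired fraction of reads coming from the tumor (0 to 1)
    target_cov: desired mean coverage of the mixed sample
    All names and inputs are illustrative assumptions.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    tumor_frac = purity * target_cov / tumor_cov
    normal_frac = (1.0 - purity) * target_cov / normal_cov
    if tumor_frac > 1.0 or normal_frac > 1.0:
        raise ValueError("target coverage too high for the available reads")
    return tumor_frac, normal_frac


def expected_vaf(pure_vaf, purity):
    """Expected VAF after mixing, assuming the variant is absent from the
    normal and both samples have the same copy number at the site."""
    return pure_vaf * purity


# Example with made-up numbers: 60x tumor, 60x normal, 70% purity, 30x mix.
t_frac, n_frac = mixing_fractions(60.0, 60.0, 0.7, 30.0)
print(f"sample {t_frac:.2f} of tumor reads and {n_frac:.2f} of normal reads")
print(f"a VAF of 0.50 in the pure tumor becomes about {expected_vaf(0.5, 0.7):.2f}")
```

The two fractions could then be handed to any read-subsampling tool before merging the sampled reads; the choice of tool is left open here.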
format Online
Article
Text
id pubmed-6116990
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6116990 2018-09-16 A database of simulated tumor genomes towards accurate detection of somatic small variants in cancer Meng, Jing Chen, Yi-Ping Phoebe PLoS One Research Article
Public Library of Science 2018-08-30 /pmc/articles/PMC6116990/ /pubmed/30161165 http://dx.doi.org/10.1371/journal.pone.0202982 Text en © 2018 Meng, Chen http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title A database of simulated tumor genomes towards accurate detection of somatic small variants in cancer
topic Research Article