
Bamgineer: Introduction of simulated allele-specific copy number variants into exome and targeted sequence data sets

Somatic copy number variations (CNVs) play a crucial role in the development of many human cancers. The broad availability of next-generation sequencing data has enabled the development of algorithms that computationally infer CNV profiles from a variety of data types, including exome and targeted sequence...


Bibliographic Details
Main authors: Samadian, Soroush; Bruce, Jeff P.; Pugh, Trevor J.
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5891060/
https://www.ncbi.nlm.nih.gov/pubmed/29590101
http://dx.doi.org/10.1371/journal.pcbi.1006080
author Samadian, Soroush
Bruce, Jeff P.
Pugh, Trevor J.
collection PubMed
description Somatic copy number variations (CNVs) play a crucial role in the development of many human cancers. The broad availability of next-generation sequencing data has enabled the development of algorithms that computationally infer CNV profiles from a variety of data types, including exome and targeted sequence data, currently the most prevalent types of cancer genomics data. However, systematic evaluation and comparison of these tools remains challenging due to a lack of ground-truth reference sets. To address this need, we have developed Bamgineer, a tool written in Python that introduces user-defined, haplotype-phased, allele-specific copy number events into an existing Binary Alignment Map (BAM) file, with a focus on targeted and exome sequencing experiments. As input, the tool requires a read alignment file (BAM format), lists of non-overlapping genome coordinates for the introduction of gains and losses (BED file), and an optional file defining known haplotypes (VCF format). To improve runtime performance, Bamgineer introduces the desired CNVs in parallel, using queuing and parallel processing on a local machine or on a high-performance computing cluster. As proof of principle, we applied Bamgineer to a single high-coverage (mean: 220X) exome sequence file from a blood sample to simulate copy number profiles of 3 exemplar tumors from each of 10 tumor types at 5 tumor cellularity levels (20–100%, 150 BAM files in total). To demonstrate feasibility beyond exome data, we introduced read alignments to a targeted 5-gene cell-free DNA sequencing library to simulate EGFR amplifications at frequencies consistent with circulating tumor DNA (10, 1, 0.1 and 0.01%) while retaining the multimodal insert size distribution of the original data. We expect Bamgineer to be of use for the development and systematic benchmarking of CNV calling algorithms by users working with locally generated data for a variety of applications. The source code is freely available at http://github.com/pughlab/bamgineer.
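The arithmetic behind the proof-of-principle design in the description (3 exemplar tumors from each of 10 tumor types at 5 cellularity levels, 150 BAM files) can be sketched in a few lines of Python. This is only an illustration, not Bamgineer code: the evenly spaced cellularity values are an assumption (the record gives only the 20–100% range), the function names are hypothetical, and the mixing formula is the standard linear tumor/normal model rather than anything the record attributes to the tool.

```python
from itertools import product

# From the description: 10 tumor types, 3 exemplar tumors per type.
N_TUMOR_TYPES = 10
N_EXEMPLARS_PER_TYPE = 3

# Assumption: five evenly spaced levels spanning the stated 20-100% range.
CELLULARITY_LEVELS = [0.2, 0.4, 0.6, 0.8, 1.0]


def simulation_grid():
    """Enumerate (tumor_type, exemplar, cellularity) triples, one per simulated BAM."""
    return list(
        product(range(N_TUMOR_TYPES), range(N_EXEMPLARS_PER_TYPE), CELLULARITY_LEVELS)
    )


def expected_mean_copy_number(tumor_copies, cellularity, normal_copies=2):
    """Average copies per cell in a tumor/normal mixture (standard linear mixing model)."""
    return cellularity * tumor_copies + (1.0 - cellularity) * normal_copies


grid = simulation_grid()
print(len(grid))  # 150, matching the total stated in the description
print(expected_mean_copy_number(tumor_copies=3, cellularity=0.2))
```

Under this model a single-copy gain at 20% cellularity raises the average copy number only from 2 to about 2.2, which suggests why low-cellularity simulations make demanding benchmarks for CNV callers.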
format Online
Article
Text
id pubmed-5891060
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5891060 2018-04-20
Bamgineer: Introduction of simulated allele-specific copy number variants into exome and targeted sequence data sets
Samadian, Soroush; Bruce, Jeff P.; Pugh, Trevor J.
PLoS Comput Biol
Research Article
Public Library of Science 2018-03-28
/pmc/articles/PMC5891060/ /pubmed/29590101 http://dx.doi.org/10.1371/journal.pcbi.1006080
Text en
© 2018 Samadian et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Bamgineer: Introduction of simulated allele-specific copy number variants into exome and targeted sequence data sets
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5891060/
https://www.ncbi.nlm.nih.gov/pubmed/29590101
http://dx.doi.org/10.1371/journal.pcbi.1006080