Bamgineer: Introduction of simulated allele-specific copy number variants into exome and targeted sequence data sets
Somatic copy number variations (CNVs) play a crucial role in development of many human cancers. The broad availability of next-generation sequencing data has enabled the development of algorithms to computationally infer CNV profiles from a variety of data types including exome and targeted sequence...
Main authors: | Samadian, Soroush; Bruce, Jeff P.; Pugh, Trevor J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5891060/ https://www.ncbi.nlm.nih.gov/pubmed/29590101 http://dx.doi.org/10.1371/journal.pcbi.1006080 |
_version_ | 1783312958849286144 |
---|---|
author | Samadian, Soroush Bruce, Jeff P. Pugh, Trevor J. |
author_facet | Samadian, Soroush Bruce, Jeff P. Pugh, Trevor J. |
author_sort | Samadian, Soroush |
collection | PubMed |
description | Somatic copy number variations (CNVs) play a crucial role in the development of many human cancers. The broad availability of next-generation sequencing data has enabled the development of algorithms to computationally infer CNV profiles from a variety of data types, including exome and targeted sequence data, currently the most prevalent types of cancer genomics data. However, systematic evaluation and comparison of these tools remains challenging due to a lack of ground truth reference sets. To address this need, we have developed Bamgineer, a tool written in Python to introduce user-defined haplotype-phased allele-specific copy number events into an existing Binary Alignment Map (BAM) file, with a focus on targeted and exome sequencing experiments. As input, this tool requires a read alignment file (BAM format), lists of non-overlapping genome coordinates for introduction of gains and losses (BED file), and an optional file defining known haplotypes (VCF format). To improve runtime performance, Bamgineer introduces the desired CNVs using queuing and parallel processing on a local machine or on a high-performance computing cluster. As proof-of-principle, we applied Bamgineer to a single high-coverage (mean: 220X) exome sequence file from a blood sample to simulate copy number profiles of 3 exemplar tumors from each of 10 tumor types at 5 tumor cellularity levels (20–100%, 150 BAM files in total). To demonstrate feasibility beyond exome data, we introduced read alignments to a targeted 5-gene cell-free DNA sequencing library to simulate EGFR amplifications at frequencies consistent with circulating tumor DNA (10, 1, 0.1 and 0.01%) while retaining the multimodal insert size distribution of the original data. We expect Bamgineer to be of use for development and systematic benchmarking of CNV calling algorithms by users working with locally generated data for a variety of applications. The source code is freely available at http://github.com/pughlab/bamgineer. |
format | Online Article Text |
id | pubmed-5891060 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-58910602018-04-20 Bamgineer: Introduction of simulated allele-specific copy number variants into exome and targeted sequence data sets Samadian, Soroush Bruce, Jeff P. Pugh, Trevor J. PLoS Comput Biol Research Article Somatic copy number variations (CNVs) play a crucial role in the development of many human cancers. The broad availability of next-generation sequencing data has enabled the development of algorithms to computationally infer CNV profiles from a variety of data types, including exome and targeted sequence data, currently the most prevalent types of cancer genomics data. However, systematic evaluation and comparison of these tools remains challenging due to a lack of ground truth reference sets. To address this need, we have developed Bamgineer, a tool written in Python to introduce user-defined haplotype-phased allele-specific copy number events into an existing Binary Alignment Map (BAM) file, with a focus on targeted and exome sequencing experiments. As input, this tool requires a read alignment file (BAM format), lists of non-overlapping genome coordinates for introduction of gains and losses (BED file), and an optional file defining known haplotypes (VCF format). To improve runtime performance, Bamgineer introduces the desired CNVs using queuing and parallel processing on a local machine or on a high-performance computing cluster. As proof-of-principle, we applied Bamgineer to a single high-coverage (mean: 220X) exome sequence file from a blood sample to simulate copy number profiles of 3 exemplar tumors from each of 10 tumor types at 5 tumor cellularity levels (20–100%, 150 BAM files in total). To demonstrate feasibility beyond exome data, we introduced read alignments to a targeted 5-gene cell-free DNA sequencing library to simulate EGFR amplifications at frequencies consistent with circulating tumor DNA (10, 1, 0.1 and 0.01%) while retaining the multimodal insert size distribution of the original data. We expect Bamgineer to be of use for development and systematic benchmarking of CNV calling algorithms by users working with locally generated data for a variety of applications. The source code is freely available at http://github.com/pughlab/bamgineer. Public Library of Science 2018-03-28 /pmc/articles/PMC5891060/ /pubmed/29590101 http://dx.doi.org/10.1371/journal.pcbi.1006080 Text en © 2018 Samadian et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Samadian, Soroush Bruce, Jeff P. Pugh, Trevor J. Bamgineer: Introduction of simulated allele-specific copy number variants into exome and targeted sequence data sets |
title | Bamgineer: Introduction of simulated allele-specific copy number variants into exome and targeted sequence data sets |
title_full | Bamgineer: Introduction of simulated allele-specific copy number variants into exome and targeted sequence data sets |
title_fullStr | Bamgineer: Introduction of simulated allele-specific copy number variants into exome and targeted sequence data sets |
title_full_unstemmed | Bamgineer: Introduction of simulated allele-specific copy number variants into exome and targeted sequence data sets |
title_short | Bamgineer: Introduction of simulated allele-specific copy number variants into exome and targeted sequence data sets |
title_sort | bamgineer: introduction of simulated allele-specific copy number variants into exome and targeted sequence data sets |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5891060/ https://www.ncbi.nlm.nih.gov/pubmed/29590101 http://dx.doi.org/10.1371/journal.pcbi.1006080 |
work_keys_str_mv | AT samadiansoroush bamgineerintroductionofsimulatedallelespecificcopynumbervariantsintoexomeandtargetedsequencedatasets AT brucejeffp bamgineerintroductionofsimulatedallelespecificcopynumbervariantsintoexomeandtargetedsequencedatasets AT pughtrevorj bamgineerintroductionofsimulatedallelespecificcopynumbervariantsintoexomeandtargetedsequencedatasets |
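
The description states that the gain and loss regions must be supplied as lists of non-overlapping genome coordinates in a BED file. The following is a minimal sketch of how a user might check that requirement before running any simulation; it is not part of Bamgineer, and the function name `find_overlaps` is hypothetical. It assumes plain BED-style lines with chromosome, start, and end in the first three whitespace-separated columns, and treats intervals as half-open, as in the BED convention.

```python
from collections import defaultdict


def find_overlaps(bed_lines):
    """Return (chrom, interval_a, interval_b) for every overlapping pair found.

    Hypothetical helper for illustration only; not Bamgineer's own code.
    """
    by_chrom = defaultdict(list)
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and common BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        by_chrom[chrom].append((int(start), int(end)))

    overlaps = []
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        prev = intervals[0]
        for cur in intervals[1:]:
            # Half-open intervals: [100, 200) and [200, 300) do not overlap.
            if cur[0] < prev[1]:
                overlaps.append((chrom, prev, cur))
            # Compare later intervals against the one reaching furthest right.
            if cur[1] > prev[1]:
                prev = cur
    return overlaps


if __name__ == "__main__":
    regions = ["chr1\t100\t200", "chr1\t150\t250", "chr2\t10\t20"]
    for chrom, a, b in find_overlaps(regions):
        print(f"{chrom}: {a} overlaps {b}")
```

An empty result means the region list satisfies the non-overlap requirement described above; any reported pair would need to be merged or split by the user before use.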