SVEngine: an efficient and versatile simulator of genome structural variations with features of cancer clonal evolution

Bibliographic Details
Main authors: Xia, Li Charlie; Ai, Dongmei; Lee, Hojoon; Andor, Noemi; Li, Chao; Zhang, Nancy R; Ji, Hanlee P
Format: Online article (text)
Language: English
Published: Oxford University Press 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6057526/
https://www.ncbi.nlm.nih.gov/pubmed/29982625
http://dx.doi.org/10.1093/gigascience/giy081
Collection: PubMed
Description:
BACKGROUND: Simulating genome sequence data with variant features facilitates the development and benchmarking of structural variant analysis programs. However, only a few data simulators generate structural variants in silico, and even fewer provide variants with different allelic fractions and haplotypes.

FINDINGS: We developed SVEngine, an open-source tool to address this need. SVEngine simulates next-generation sequencing data with embedded structural variations. As input, SVEngine takes template haploid sequences (FASTA) together with an external variant file, a variant distribution file, and/or a clonal phylogeny tree file (NEWICK). It then simulates and outputs sequence contigs (FASTA), sequence reads (FASTQ), and/or post-alignment files (BAM). All of these files contain the desired variants and are accompanied by BED files that record the ground truth. SVEngine's flexible design lets one specify the size, position, and allelic fraction of deletions, insertions, duplications, inversions, and translocations. SVEngine also simulates sequence data that replicate the characteristics of a sequencing library with mixed sizes of DNA insert molecules. To reduce simulation time, SVEngine is highly parallelized.

CONCLUSIONS: We demonstrated SVEngine's versatile features and its improved runtime relative to other available simulators. SVEngine's features include the simulation of locus-specific variant frequencies designed to mimic the phylogeny of cancer clonal evolution. We validated SVEngine's accuracy by simulating genome-wide structural variants of NA12878 and of a heterogeneous cancer genome. For representative variant types, our evaluation checked sequence-mapping features such as coverage change, read clipping, insert-size shift, and neighboring hanging read pairs. The structural variant callers Lumpy and Manta and the tumor heterogeneity estimator THetA2 performed realistically on the simulated data. SVEngine is implemented as a standard Python package and is freely available for academic use.
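The abstract describes embedding structural variants into haploid template sequences and reporting the ground truth in BED files, with a chosen allelic fraction per variant. The Python sketch below illustrates that general idea only. It assumes non-overlapping variants on a single contig and 0-based, half-open coordinates, and it omits translocations (which join two contigs) and read simulation. The names `Variant`, `apply_variants`, `to_bed`, and `split_depth` are hypothetical; they are not SVEngine's actual API or file format.

```python
"""Hypothetical sketch: embed non-overlapping structural variants into a
haploid template and emit BED-style ground truth. Not SVEngine's API."""
from dataclasses import dataclass

# Lookup table used to reverse-complement inverted segments.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


@dataclass(frozen=True)
class Variant:
    kind: str             # "DEL", "INS", "INV" or "DUP" (tandem duplication)
    start: int            # 0-based, inclusive
    end: int              # 0-based, exclusive (end == start for "INS")
    allele_frac: float    # target fraction of reads carrying the variant
    insert_seq: str = ""  # inserted bases, used only by "INS"


def _by_position(v: Variant) -> tuple[int, int]:
    return (v.start, v.end)


def apply_variants(template: str, variants: list[Variant]) -> str:
    """Return the variant haplotype derived from `template`."""
    ordered = sorted(variants, key=_by_position)
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt.start < prev.end:
            raise ValueError(f"overlapping variants: {prev} and {nxt}")
    seq = template
    # Edit right to left so coordinates of the remaining variants stay valid.
    for v in reversed(ordered):
        left, mid, right = seq[: v.start], seq[v.start : v.end], seq[v.end :]
        if v.kind == "DEL":
            mid = ""
        elif v.kind == "INS":
            mid = v.insert_seq + mid
        elif v.kind == "INV":
            mid = mid.translate(_COMPLEMENT)[::-1]
        elif v.kind == "DUP":
            mid = mid * 2
        else:
            raise ValueError(f"unsupported variant type: {v.kind}")
        seq = left + mid + right
    return seq


def to_bed(chrom: str, variants: list[Variant]) -> list[str]:
    """Ground-truth lines: chrom, start, end, type, allelic fraction."""
    return [
        f"{chrom}\t{v.start}\t{v.end}\t{v.kind}\t{v.allele_frac:g}"
        for v in sorted(variants, key=_by_position)
    ]


def split_depth(depth: float, allele_frac: float) -> tuple[float, float]:
    """Depth to draw from the variant and reference haplotypes at one locus."""
    if not 0.0 <= allele_frac <= 1.0:
        raise ValueError("allele_frac must be in [0, 1]")
    return depth * allele_frac, depth * (1.0 - allele_frac)


if __name__ == "__main__":
    variants = [
        Variant("DEL", 2, 4, 0.5),
        Variant("INV", 5, 8, 1.0),
        Variant("INS", 9, 9, 0.25, "TT"),
    ]
    print(apply_variants("ACGTACGTAC", variants))  # ACAACGATTC
    print("\n".join(to_bed("chr1", variants)))
    print(split_depth(30, 0.25))  # (7.5, 22.5)
```

Applying edits from the rightmost variant leftward avoids recomputing offsets after each length-changing edit. In this sketch, a per-locus allelic fraction is approximated by drawing a fraction `allele_frac` of the local read depth from the variant haplotype and the remainder from the unmodified template, as `split_depth` shows. SVEngine's own mechanism may differ.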
Record ID: pubmed-6057526
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Gigascience (Technical Note). Published by Oxford University Press, 2018-07-05.
Rights: © The Author(s) 2018. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.