
A bioinformatic pipeline for simulating viral integration data

Viral integration is a complex biological process, so a reference integration dataset with known properties is useful both as a comparison for experimental data and as a benchmark for computational tools that detect integration. To generate these data, we developed a pipeline...


Bibliographic Details
Main authors: Scott, Suzanne; Grigson, Susanna; Hartkopf, Felix; Hallwirth, Claus V.; Alexander, Ian E.; Bauer, Denis C.; Wilson, Laurence O.W.
Format: Online Article Text
Language: English
Published: Elsevier 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9046613/
https://www.ncbi.nlm.nih.gov/pubmed/35496474
http://dx.doi.org/10.1016/j.dib.2022.108161
author Scott, Suzanne
Grigson, Susanna
Hartkopf, Felix
Hallwirth, Claus V.
Alexander, Ian E.
Bauer, Denis C.
Wilson, Laurence O.W.
author_sort Scott, Suzanne
collection PubMed
description Viral integration is a complex biological process, so a reference integration dataset with known properties is useful both as a comparison for experimental data and as a benchmark for computational tools that detect integration. To generate such data, we developed a pipeline for simulating integrations of a viral or vector genome into a host genome. Our method reproduces the more complex characteristics of vector and viral integration, including integration of sub-genomic fragments, structural variation of the integrated genomes, and deletions from the host genome at the integration site. Our method [1] takes the form of a Snakemake [2] pipeline built around a Python [3] script that uses the Biopython [4] module to simulate integrations of a viral reference into a host reference. This produces a reference containing integrations, from which sequencing reads are simulated using ART [5]. A second Python script then annotates the IDs of the reads crossing integration junctions to produce the final output: the simulated reads and a table giving the location of each integration and the reads crossing each integration junction. To illustrate our method, we provide simulated reads and integration locations, as well as the code required to simulate integrations using any virus and host reference. This simulation method was used to investigate the performance of viral integration tools in our research [6].
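The core idea of the description can be illustrated with a minimal Python sketch. This is not the authors' code: it uses plain strings in place of Biopython sequence objects, hand-made read coordinates in place of ART output, and it omits structural variation of the integrated genome. All names (`Integration`, `simulate_integration`, `reads_crossing`, `max_deletion`, `min_overlap`) are hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Integration:
    host_start: int      # position in the original host where the insert begins
    host_deleted: int    # number of host bases deleted at the integration site
    virus_start: int     # start of the integrated sub-genomic fragment (0-based)
    virus_stop: int      # end of the fragment (exclusive)
    left_junction: int   # host/virus boundary in the integrated reference
    right_junction: int  # virus/host boundary in the integrated reference


def simulate_integration(host, virus, rng, max_deletion=5):
    """Insert a random sub-genomic fragment of `virus` into `host`.

    Assumes len(virus) >= 2 and len(host) > max_deletion + 1.
    """
    # Pick a non-empty fragment of the viral genome.
    virus_start = rng.randrange(0, len(virus) - 1)
    virus_stop = rng.randrange(virus_start + 1, len(virus) + 1)
    fragment = virus[virus_start:virus_stop]

    # Pick an integration site and a (possibly zero-length) host deletion,
    # leaving at least one host base on each side of the insert.
    deleted = rng.randint(0, max_deletion)
    host_start = rng.randrange(1, len(host) - deleted)

    integrated = host[:host_start] + fragment + host[host_start + deleted:]
    record = Integration(
        host_start=host_start,
        host_deleted=deleted,
        virus_start=virus_start,
        virus_stop=virus_stop,
        left_junction=host_start,
        right_junction=host_start + len(fragment),
    )
    return integrated, record


def reads_crossing(junction, reads, min_overlap=1):
    """Return IDs of reads covering at least `min_overlap` bases on each side
    of `junction`. `reads` maps read ID -> (start, stop), half-open, in
    integrated-reference coordinates."""
    return [
        read_id
        for read_id, (start, stop) in reads.items()
        if start + min_overlap <= junction <= stop - min_overlap
    ]


# Toy usage with random sequences and tiled 10-base "reads".
rng = random.Random(0)
host = "".join(rng.choice("ACGT") for _ in range(60))
virus = "".join(rng.choice("ACGT") for _ in range(20))
integrated, record = simulate_integration(host, virus, rng)

reads = {
    f"read{i}": (start, start + 10)
    for i, start in enumerate(range(0, len(integrated) - 9, 5))
}
print(record)
print("left junction reads:", reads_crossing(record.left_junction, reads))
print("right junction reads:", reads_crossing(record.right_junction, reads))
```

Recording the junction coordinates at the moment the integration is made is what lets the later annotation step reduce to a simple interval check per read.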
format Online
Article
Text
id pubmed-9046613
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-9046613 2022-04-29 A bioinformatic pipeline for simulating viral integration data Scott, Suzanne Grigson, Susanna Hartkopf, Felix Hallwirth, Claus V. Alexander, Ian E. Bauer, Denis C. Wilson, Laurence O.W. Data Brief Data Article Elsevier 2022-04-10 /pmc/articles/PMC9046613/ /pubmed/35496474 http://dx.doi.org/10.1016/j.dib.2022.108161 Text en Crown Copyright © 2022 Published by Elsevier Inc.
https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title A bioinformatic pipeline for simulating viral integration data
topic Data Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9046613/
https://www.ncbi.nlm.nih.gov/pubmed/35496474
http://dx.doi.org/10.1016/j.dib.2022.108161