A bioinformatic pipeline for simulating viral integration data
Viral integration is a complex biological process, and it is useful to have a reference integration dataset with known properties to compare experimental data against, or for comparing with the results from computational tools that detect integration. To generate these data, we developed a pipeline...
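Main authors: | Scott, Suzanne; Grigson, Susanna; Hartkopf, Felix; Hallwirth, Claus V.; Alexander, Ian E.; Bauer, Denis C.; Wilson, Laurence O.W. |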
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2022 |
Subjects: | Data Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9046613/ https://www.ncbi.nlm.nih.gov/pubmed/35496474 http://dx.doi.org/10.1016/j.dib.2022.108161 |
_version_ | 1784695546983743488 |
---|---|
author | Scott, Suzanne; Grigson, Susanna; Hartkopf, Felix; Hallwirth, Claus V.; Alexander, Ian E.; Bauer, Denis C.; Wilson, Laurence O.W. |
author_sort | Scott, Suzanne |
collection | PubMed |
description | Viral integration is a complex biological process, and a reference integration dataset with known properties is useful both for comparison with experimental data and for evaluating the results of computational tools that detect integration. To generate these data, we developed a pipeline for simulating integrations of a viral or vector genome into a host genome. Our method reproduces more complex characteristics of vector and viral integration, including integration of sub-genomic fragments, structural variation of the integrated genomes, and deletions from the host genome at the integration site. Our method [1] takes the form of a Snakemake [2] pipeline, consisting of a Python [3] script using the Biopython [4] module that simulates integrations of a viral reference into a host reference. This produces a reference containing integrations, from which sequencing reads are simulated using ART [5]. The IDs of the reads crossing integration junctions are then annotated using another Python script to produce the final output, which consists of the simulated reads and a table of the integration locations and the reads crossing each integration junction. To illustrate our method, we provide simulated reads and integration locations, as well as the code required to simulate integrations using any virus and host reference. This simulation method was used to investigate the performance of viral integration tools in our research [6]. |
format | Online Article Text |
id | pubmed-9046613 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-9046613 2022-04-29 A bioinformatic pipeline for simulating viral integration data Scott, Suzanne; Grigson, Susanna; Hartkopf, Felix; Hallwirth, Claus V.; Alexander, Ian E.; Bauer, Denis C.; Wilson, Laurence O.W. Data Brief Data Article Elsevier 2022-04-10 /pmc/articles/PMC9046613/ /pubmed/35496474 http://dx.doi.org/10.1016/j.dib.2022.108161 Text en Crown Copyright © 2022 Published by Elsevier Inc. https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
title | A bioinformatic pipeline for simulating viral integration data |
topic | Data Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9046613/ https://www.ncbi.nlm.nih.gov/pubmed/35496474 http://dx.doi.org/10.1016/j.dib.2022.108161 |
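
The description above outlines three steps: integrate a viral fragment into a host reference, simulate reads from the result, and record which reads cross each integration junction. The following is a minimal sketch of that idea, not the authors' implementation. It uses plain Python strings rather than Biopython, and it replaces ART with a trivial error-free tiling of fixed-length reads. All names here (`Integration`, `integrate`, `tile_reads`, `reads_crossing`, `max_deletion`) are hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Integration:
    host_pos: int        # 0-based host position where the fragment is inserted
    host_deleted: int    # host bases removed at the integration site
    virus_start: int     # start of the viral fragment (0-based, inclusive)
    virus_stop: int      # end of the viral fragment (exclusive)
    left_junction: int   # host/virus boundary in the new reference
    right_junction: int  # virus/host boundary in the new reference


def integrate(host, virus, rng, max_deletion=5):
    """Insert a random sub-genomic fragment of `virus` into `host`.

    Assumption: a junction coordinate j marks the boundary between
    bases j-1 and j of the returned reference.
    """
    if len(virus) < 2 or len(host) < max_deletion + 2:
        raise ValueError("sequences too short for this sketch")
    # Choose a non-empty fragment of the viral genome.
    virus_start = rng.randrange(0, len(virus) - 1)
    virus_stop = rng.randrange(virus_start + 1, len(virus) + 1)
    fragment = virus[virus_start:virus_stop]
    # Choose how many host bases are lost, and where, keeping at least
    # one host base on each side of the integration.
    host_deleted = rng.randint(0, max_deletion)
    host_pos = rng.randrange(1, len(host) - host_deleted)
    new_ref = host[:host_pos] + fragment + host[host_pos + host_deleted:]
    record = Integration(
        host_pos=host_pos,
        host_deleted=host_deleted,
        virus_start=virus_start,
        virus_stop=virus_stop,
        left_junction=host_pos,
        right_junction=host_pos + len(fragment),
    )
    return new_ref, record


def tile_reads(reference, read_len, step):
    """Stand-in for a read simulator: error-free reads at a fixed step."""
    reads = {}
    starts = range(0, len(reference) - read_len + 1, step)
    for i, start in enumerate(starts):
        reads[f"read{i}"] = (start, start + read_len)
    return reads


def reads_crossing(reads, junction):
    """IDs of reads with at least one base on each side of `junction`."""
    return [rid for rid, (start, end) in reads.items() if start < junction < end]
```

A possible use, under the same assumptions: call `integrate(host, virus, random.Random(0))`, pass the returned reference to `tile_reads`, then call `reads_crossing` once for `left_junction` and once for `right_junction` to build a table of junction-spanning read IDs.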