
ViR: a tool to solve intrasample variability in the prediction of viral integration sites using whole genome sequencing data

BACKGROUND: Several bioinformatics pipelines have been developed to detect sequences from viruses that integrate into the human genome because of the health relevance of these integrations, such as in the persistence of viral infection and/or in generating genotoxic effects, often progressing into c...


Bibliographic Details
Main authors: Pischedda, Elisa, Crava, Cristina, Carlassara, Martina, Zucca, Susanna, Gasmi, Leila, Bonizzoni, Mariangela
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7863434/
https://www.ncbi.nlm.nih.gov/pubmed/33541262
http://dx.doi.org/10.1186/s12859-021-03980-5
_version_ 1783647493534253056
author Pischedda, Elisa
Crava, Cristina
Carlassara, Martina
Zucca, Susanna
Gasmi, Leila
Bonizzoni, Mariangela
author_facet Pischedda, Elisa
Crava, Cristina
Carlassara, Martina
Zucca, Susanna
Gasmi, Leila
Bonizzoni, Mariangela
author_sort Pischedda, Elisa
collection PubMed
description BACKGROUND: Several bioinformatics pipelines have been developed to detect sequences from viruses that integrate into the human genome because of the health relevance of these integrations, such as in the persistence of viral infection and/or in generating genotoxic effects, often progressing into cancer. Recent genomics and metagenomics analyses have shown that viruses also integrate into the genomes of non-model organisms (e.g., arthropods, fish, plants, vertebrates). However, studies of endogenous viral elements (EVEs) in non-model organisms have rarely gone beyond their characterization from reference genome assemblies. In non-model organisms, we lack a thorough understanding of how widespread EVEs are and of their biological relevance, apart from sporadic cases which nevertheless point to significant roles of EVEs in immunity and regulation of expression. The combination of repetitive DNA, duplications and/or assembly fragmentation in a genome sequence with intrasample variability in whole-genome sequencing (WGS) data can cause misalignments when mapping data to a genome assembly. This phenomenon hinders our ability to properly identify integration sites. RESULTS: To fill this gap, we developed ViR, a pipeline which resolves the dispersion of reads due to intrasample variability in sequencing data from both single and pooled DNA samples, thus improving the detection of integration sites. We tested ViR on both in silico and real sequencing data from a non-model organism, the arboviral vector Aedes albopictus. Potential viral integrations predicted by ViR were molecularly validated, supporting the accuracy of ViR results. CONCLUSION: ViR will open new avenues to explore the biology of EVEs, especially in non-model organisms. Importantly, while we developed ViR with the identification of EVEs in mind, its application can be extended to detect any lateral transfer event, provided that an ad hoc sequence to interrogate is supplied.
format Online
Article
Text
id pubmed-7863434
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7863434 2021-02-05 ViR: a tool to solve intrasample variability in the prediction of viral integration sites using whole genome sequencing data Pischedda, Elisa Crava, Cristina Carlassara, Martina Zucca, Susanna Gasmi, Leila Bonizzoni, Mariangela BMC Bioinformatics Methodology Article BioMed Central 2021-02-04 /pmc/articles/PMC7863434/ /pubmed/33541262 http://dx.doi.org/10.1186/s12859-021-03980-5 Text en © The Author(s) 2021 Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Methodology Article
Pischedda, Elisa
Crava, Cristina
Carlassara, Martina
Zucca, Susanna
Gasmi, Leila
Bonizzoni, Mariangela
ViR: a tool to solve intrasample variability in the prediction of viral integration sites using whole genome sequencing data
title ViR: a tool to solve intrasample variability in the prediction of viral integration sites using whole genome sequencing data
title_full ViR: a tool to solve intrasample variability in the prediction of viral integration sites using whole genome sequencing data
title_fullStr ViR: a tool to solve intrasample variability in the prediction of viral integration sites using whole genome sequencing data
title_full_unstemmed ViR: a tool to solve intrasample variability in the prediction of viral integration sites using whole genome sequencing data
title_short ViR: a tool to solve intrasample variability in the prediction of viral integration sites using whole genome sequencing data
title_sort vir: a tool to solve intrasample variability in the prediction of viral integration sites using whole genome sequencing data
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7863434/
https://www.ncbi.nlm.nih.gov/pubmed/33541262
http://dx.doi.org/10.1186/s12859-021-03980-5
work_keys_str_mv AT pischeddaelisa viratooltosolveintrasamplevariabilityinthepredictionofviralintegrationsitesusingwholegenomesequencingdata
AT cravacristina viratooltosolveintrasamplevariabilityinthepredictionofviralintegrationsitesusingwholegenomesequencingdata
AT carlassaramartina viratooltosolveintrasamplevariabilityinthepredictionofviralintegrationsitesusingwholegenomesequencingdata
AT zuccasusanna viratooltosolveintrasamplevariabilityinthepredictionofviralintegrationsitesusingwholegenomesequencingdata
AT gasmileila viratooltosolveintrasamplevariabilityinthepredictionofviralintegrationsitesusingwholegenomesequencingdata
AT bonizzonimariangela viratooltosolveintrasamplevariabilityinthepredictionofviralintegrationsitesusingwholegenomesequencingdata