
VISPA2: a scalable pipeline for high-throughput identification and annotation of vector integration sites

BACKGROUND: Bioinformatics tools designed to identify lentiviral or retroviral vector insertion sites in the genome of host cells are used to address the safety and long-term efficacy of hematopoietic stem cell gene therapy applications and to study the clonal dynamics of hematopoietic reconstitutio...


Bibliographic Details
Main authors: Spinozzi, Giulio, Calabria, Andrea, Brasca, Stefano, Beretta, Stefano, Merelli, Ivan, Milanesi, Luciano, Montini, Eugenio
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5702242/
https://www.ncbi.nlm.nih.gov/pubmed/29178837
http://dx.doi.org/10.1186/s12859-017-1937-9
author Spinozzi, Giulio
Calabria, Andrea
Brasca, Stefano
Beretta, Stefano
Merelli, Ivan
Milanesi, Luciano
Montini, Eugenio
collection PubMed
description BACKGROUND: Bioinformatics tools designed to identify lentiviral or retroviral vector insertion sites in the genome of host cells are used to address the safety and long-term efficacy of hematopoietic stem cell gene therapy applications and to study the clonal dynamics of hematopoietic reconstitution. The increasing number of gene therapy clinical trials, combined with the increasing amount of Next Generation Sequencing data aimed at identifying integration sites, requires highly accurate and efficient computational software able to correctly process “big data” in a reasonable computational time. RESULTS: Here we present VISPA2 (Vector Integration Site Parallel Analysis, version 2), the latest optimized computational pipeline for integration site identification and analysis, with the following features: (1) the sequence analysis for integration site processing is fully compliant with paired-end reads and includes a sequence quality filter before and after the alignment on the target genome; (2) a heuristic algorithm that reduces false-positive integration sites at the nucleotide level, limiting the impact of Polymerase Chain Reaction or trimming/alignment artifacts; (3) a classification and annotation module for integration sites; (4) a user-friendly web interface as a researcher front-end to perform integration site analyses without computational skills; (5) a speedup of all steps through parallelization (Hadoop-free). CONCLUSIONS: We tested the performance of VISPA2 using simulated and real datasets of lentiviral vector integration sites, previously obtained from patients enrolled in a hematopoietic stem cell gene therapy clinical trial, and compared the results with other preexisting tools for integration site analysis. On the computational side, VISPA2 showed a > 6-fold speedup and improved precision and recall metrics (1 and 0.97 respectively) compared to previously developed computational pipelines.
These results indicate that VISPA2 is a fast, reliable and user-friendly tool for integration site analysis, which allows gene therapy integration data to be handled in a cost- and time-effective fashion. Moreover, the web access of VISPA2 (http://openserver.itb.cnr.it/vispa/) gives researchers easy access to a complex analytical tool. We released the source code of VISPA2 in a public repository (https://bitbucket.org/andreacalabria/vispa2). ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1937-9) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5702242
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5702242 2017-12-04 VISPA2: a scalable pipeline for high-throughput identification and annotation of vector integration sites Spinozzi, Giulio Calabria, Andrea Brasca, Stefano Beretta, Stefano Merelli, Ivan Milanesi, Luciano Montini, Eugenio BMC Bioinformatics Software
BioMed Central 2017-11-25 /pmc/articles/PMC5702242/ /pubmed/29178837 http://dx.doi.org/10.1186/s12859-017-1937-9 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title VISPA2: a scalable pipeline for high-throughput identification and annotation of vector integration sites
topic Software