Cargando…

SV-AUTOPILOT: optimized, automated construction of structural variation discovery and benchmarking pipelines

BACKGROUND: Many tools exist to predict structural variants (SVs), utilizing a variety of algorithms. However, they have largely been developed and tested on human germline or somatic (e.g. cancer) variation. It seems appropriate to exploit this wealth of technology available for humans also for oth...

Descripción completa

Detalles Bibliográficos
Autores principales: Leung, Wai Yi, Marschall, Tobias, Paudel, Yogesh, Falquet, Laurent, Mei, Hailiang, Schönhuth, Alexander, Maoz (Moss), Tiffanie Yael
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2015
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4520269/
https://www.ncbi.nlm.nih.gov/pubmed/25887570
http://dx.doi.org/10.1186/s12864-015-1376-9
Descripción
Sumario:BACKGROUND: Many tools exist to predict structural variants (SVs), utilizing a variety of algorithms. However, they have largely been developed and tested on human germline or somatic (e.g. cancer) variation. It seems appropriate to exploit this wealth of technology available for humans also for other species. Objectives of this work included: a. Creating an automated, standardized pipeline for SV prediction. b. Identifying the best tool(s) for SV prediction through benchmarking. c. Providing a statistically sound method for merging SV calls. RESULTS: The SV-AUTOPILOT meta-tool platform is an automated pipeline for standardization of SV prediction and SV tool development in paired-end next-generation sequencing (NGS) analysis. SV-AUTOPILOT comes in the form of a virtual machine, which includes all datasets, tools and algorithms presented here. The virtual machine easily allows one to add, replace and update genomes, SV callers and post-processing routines and therefore provides an easy, out-of-the-box environment for complex SV discovery tasks. SV-AUTOPILOT was used to make a direct comparison between 7 popular SV tools on the Arabidopsis thaliana genome using the Landsberg (Ler) ecotype as a standardized dataset. Recall and precision measurements suggest that Pindel and Clever were the most adaptable to this dataset across all size ranges while Delly performed well for SVs larger than 250 nucleotides. A novel, statistically-sound merging process, which can control the false discovery rate, reduced the false positive rate on the Arabidopsis benchmark dataset used here by >60%. CONCLUSION: SV-AUTOPILOT provides a meta-tool platform for future SV tool development and the benchmarking of tools on other genomes using a standardized pipeline. It optimizes detection of SVs in non-human genomes using statistically robust merging. The benchmarking in this study has demonstrated the power of 7 different SV tools for analyzing different size classes and types of structural variants. The optional merge feature enriches the call set and reduces false positives providing added benefit to researchers planning to validate SVs. SV-AUTOPILOT is a powerful, new meta-tool for biologists as well as SV tool developers. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-1376-9) contains supplementary material, which is available to authorized users.