SV-AUTOPILOT: optimized, automated construction of structural variation discovery and benchmarking pipelines

BACKGROUND: Many tools exist to predict structural variants (SVs), utilizing a variety of algorithms. However, they have largely been developed and tested on human germline or somatic (e.g. cancer) variation. It seems appropriate to exploit this wealth of technology available for humans also for oth...

Bibliographic Details
Main authors: Leung, Wai Yi; Marschall, Tobias; Paudel, Yogesh; Falquet, Laurent; Mei, Hailiang; Schönhuth, Alexander; Maoz (Moss), Tiffanie Yael
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4520269/
https://www.ncbi.nlm.nih.gov/pubmed/25887570
http://dx.doi.org/10.1186/s12864-015-1376-9
author Leung, Wai Yi
Marschall, Tobias
Paudel, Yogesh
Falquet, Laurent
Mei, Hailiang
Schönhuth, Alexander
Maoz (Moss), Tiffanie Yael
collection PubMed
description BACKGROUND: Many tools exist to predict structural variants (SVs), using a variety of algorithms. However, they have largely been developed and tested on human germline or somatic (e.g. cancer) variation. It seems appropriate to extend this wealth of technology available for humans to other species. The objectives of this work were: (a) creating an automated, standardized pipeline for SV prediction; (b) identifying the best tool(s) for SV prediction through benchmarking; and (c) providing a statistically sound method for merging SV calls.
RESULTS: The SV-AUTOPILOT meta-tool platform is an automated pipeline for standardization of SV prediction and SV tool development in paired-end next-generation sequencing (NGS) analysis. SV-AUTOPILOT comes in the form of a virtual machine, which includes all datasets, tools and algorithms presented here. The virtual machine makes it easy to add, replace and update genomes, SV callers and post-processing routines, and therefore provides an out-of-the-box environment for complex SV discovery tasks. SV-AUTOPILOT was used to make a direct comparison between 7 popular SV tools on the Arabidopsis thaliana genome using the Landsberg (Ler) ecotype as a standardized dataset. Recall and precision measurements suggest that Pindel and Clever were the most adaptable to this dataset across all size ranges, while Delly performed well for SVs larger than 250 nucleotides. A novel, statistically sound merging process, which can control the false discovery rate, reduced the false positive rate on the Arabidopsis benchmark dataset used here by >60%.
CONCLUSION: SV-AUTOPILOT provides a meta-tool platform for future SV tool development and for benchmarking tools on other genomes using a standardized pipeline. It optimizes detection of SVs in non-human genomes using statistically robust merging. The benchmarking in this study has demonstrated the power of 7 different SV tools for analyzing different size classes and types of structural variants. The optional merge feature enriches the call set and reduces false positives, providing added benefit to researchers planning to validate SVs. SV-AUTOPILOT is a powerful new meta-tool for biologists as well as SV tool developers.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-1376-9) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4520269
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4520269 2015-07-31 SV-AUTOPILOT: optimized, automated construction of structural variation discovery and benchmarking pipelines. BMC Genomics, Research Article. BioMed Central 2015-03-25 /pmc/articles/PMC4520269/ /pubmed/25887570 http://dx.doi.org/10.1186/s12864-015-1376-9 Text en © Leung et al. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title SV-AUTOPILOT: optimized, automated construction of structural variation discovery and benchmarking pipelines
topic Research Article