SV-AUTOPILOT: optimized, automated construction of structural variation discovery and benchmarking pipelines
BACKGROUND: Many tools exist to predict structural variants (SVs), utilizing a variety of algorithms. However, they have largely been developed and tested on human germline or somatic (e.g. cancer) variation. It seems appropriate to exploit this wealth of technology available for humans also for oth...
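Main authors: | Leung, Wai Yi; Marschall, Tobias; Paudel, Yogesh; Falquet, Laurent; Mei, Hailiang; Schönhuth, Alexander; Maoz (Moss), Tiffanie Yael |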
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
BioMed Central
2015
|
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4520269/ https://www.ncbi.nlm.nih.gov/pubmed/25887570 http://dx.doi.org/10.1186/s12864-015-1376-9 |
_version_ | 1782383638291152896 |
---|---|
author | Leung, Wai Yi Marschall, Tobias Paudel, Yogesh Falquet, Laurent Mei, Hailiang Schönhuth, Alexander Maoz (Moss), Tiffanie Yael |
author_sort | Leung, Wai Yi |
collection | PubMed |
description | BACKGROUND: Many tools exist to predict structural variants (SVs), utilizing a variety of algorithms. However, they have largely been developed and tested on human germline or somatic (e.g. cancer) variation. It seems appropriate to exploit this wealth of technology available for humans also for other species. Objectives of this work included: a. Creating an automated, standardized pipeline for SV prediction. b. Identifying the best tool(s) for SV prediction through benchmarking. c. Providing a statistically sound method for merging SV calls. RESULTS: The SV-AUTOPILOT meta-tool platform is an automated pipeline for standardization of SV prediction and SV tool development in paired-end next-generation sequencing (NGS) analysis. SV-AUTOPILOT comes in the form of a virtual machine, which includes all datasets, tools and algorithms presented here. The virtual machine easily allows one to add, replace and update genomes, SV callers and post-processing routines and therefore provides an easy, out-of-the-box environment for complex SV discovery tasks. SV-AUTOPILOT was used to make a direct comparison between 7 popular SV tools on the Arabidopsis thaliana genome using the Landsberg (Ler) ecotype as a standardized dataset. Recall and precision measurements suggest that Pindel and Clever were the most adaptable to this dataset across all size ranges while Delly performed well for SVs larger than 250 nucleotides. A novel, statistically-sound merging process, which can control the false discovery rate, reduced the false positive rate on the Arabidopsis benchmark dataset used here by >60%. CONCLUSION: SV-AUTOPILOT provides a meta-tool platform for future SV tool development and the benchmarking of tools on other genomes using a standardized pipeline. It optimizes detection of SVs in non-human genomes using statistically robust merging. The benchmarking in this study has demonstrated the power of 7 different SV tools for analyzing different size classes and types of structural variants. The optional merge feature enriches the call set and reduces false positives providing added benefit to researchers planning to validate SVs. SV-AUTOPILOT is a powerful, new meta-tool for biologists as well as SV tool developers. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-1376-9) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4520269 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4520269 2015-07-31 SV-AUTOPILOT: optimized, automated construction of structural variation discovery and benchmarking pipelines Leung, Wai Yi Marschall, Tobias Paudel, Yogesh Falquet, Laurent Mei, Hailiang Schönhuth, Alexander Maoz (Moss), Tiffanie Yael BMC Genomics Research Article BACKGROUND: Many tools exist to predict structural variants (SVs), utilizing a variety of algorithms. However, they have largely been developed and tested on human germline or somatic (e.g. cancer) variation. It seems appropriate to exploit this wealth of technology available for humans also for other species. Objectives of this work included: a. Creating an automated, standardized pipeline for SV prediction. b. Identifying the best tool(s) for SV prediction through benchmarking. c. Providing a statistically sound method for merging SV calls. RESULTS: The SV-AUTOPILOT meta-tool platform is an automated pipeline for standardization of SV prediction and SV tool development in paired-end next-generation sequencing (NGS) analysis. SV-AUTOPILOT comes in the form of a virtual machine, which includes all datasets, tools and algorithms presented here. The virtual machine easily allows one to add, replace and update genomes, SV callers and post-processing routines and therefore provides an easy, out-of-the-box environment for complex SV discovery tasks. SV-AUTOPILOT was used to make a direct comparison between 7 popular SV tools on the Arabidopsis thaliana genome using the Landsberg (Ler) ecotype as a standardized dataset. Recall and precision measurements suggest that Pindel and Clever were the most adaptable to this dataset across all size ranges while Delly performed well for SVs larger than 250 nucleotides. A novel, statistically-sound merging process, which can control the false discovery rate, reduced the false positive rate on the Arabidopsis benchmark dataset used here by >60%. CONCLUSION: SV-AUTOPILOT provides a meta-tool platform for future SV tool development and the benchmarking of tools on other genomes using a standardized pipeline. It optimizes detection of SVs in non-human genomes using statistically robust merging. The benchmarking in this study has demonstrated the power of 7 different SV tools for analyzing different size classes and types of structural variants. The optional merge feature enriches the call set and reduces false positives providing added benefit to researchers planning to validate SVs. SV-AUTOPILOT is a powerful, new meta-tool for biologists as well as SV tool developers. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-1376-9) contains supplementary material, which is available to authorized users. BioMed Central 2015-03-25 /pmc/articles/PMC4520269/ /pubmed/25887570 http://dx.doi.org/10.1186/s12864-015-1376-9 Text en © Leung et al. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | SV-AUTOPILOT: optimized, automated construction of structural variation discovery and benchmarking pipelines |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4520269/ https://www.ncbi.nlm.nih.gov/pubmed/25887570 http://dx.doi.org/10.1186/s12864-015-1376-9 |