WEP: a high-performance analysis pipeline for whole-exome data
BACKGROUND: The advent of massively parallel sequencing technologies (Next Generation Sequencing, NGS) profoundly modified the landscape of human genetics. In particular, Whole Exome Sequencing (WES) is the NGS branch that focuses on the exonic regions of the eukaryotic genomes; exomes are ideal to...
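Main authors: | D'Antonio, Mattia; D'Onorio De Meo, Paolo; Paoletti, Daniele; Elmi, Berardino; Pallocca, Matteo; Sanna, Nico; Picardi, Ernesto; Pesole, Graziano; Castrignanò, Tiziana |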
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2013 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3633005/ https://www.ncbi.nlm.nih.gov/pubmed/23815231 http://dx.doi.org/10.1186/1471-2105-14-S7-S11 |
_version_ | 1782266927101509632 |
---|---|
author | D'Antonio, Mattia; D'Onorio De Meo, Paolo; Paoletti, Daniele; Elmi, Berardino; Pallocca, Matteo; Sanna, Nico; Picardi, Ernesto; Pesole, Graziano; Castrignanò, Tiziana |
author_facet | D'Antonio, Mattia; D'Onorio De Meo, Paolo; Paoletti, Daniele; Elmi, Berardino; Pallocca, Matteo; Sanna, Nico; Picardi, Ernesto; Pesole, Graziano; Castrignanò, Tiziana |
author_sort | D'Antonio, Mattia |
collection | PubMed |
description | BACKGROUND: The advent of massively parallel sequencing technologies (Next Generation Sequencing, NGS) has profoundly modified the landscape of human genetics. In particular, Whole Exome Sequencing (WES) is the branch of NGS that focuses on the exonic regions of eukaryotic genomes; exomes are ideal for understanding high-penetrance allelic variation and its relationship to phenotype. A complete WES analysis involves several steps, which need to be suitably designed and arranged into an efficient pipeline. Managing an NGS analysis pipeline and the huge amount of data it produces requires non-trivial IT skills and computational power. RESULTS: Our web resource WEP (Whole-Exome sequencing Pipeline web tool) performs a complete WES pipeline and provides easy access through its interface to intermediate and final results. The WEP pipeline is composed of several steps: 1) verification of input integrity and quality checks, read trimming and filtering; 2) gapped alignment; 3) BAM conversion, sorting and indexing; 4) duplicate removal; 5) alignment optimization around insertion/deletion (indel) positions; 6) recalibration of quality scores; 7) single nucleotide and deletion/insertion polymorphism (SNP and DIP) variant calling; 8) variant annotation; 9) result storage in custom databases to allow cross-linking, intersections, statistics and more. To overcome the challenge of managing large amounts of data and to maximize the biological information extracted from them, our tool restricts the number of final results by filtering data with customizable thresholds, facilitating the identification of functionally significant variants. Default threshold values, tuned according to the most common literature published in recent years, are also provided when the analysis completes. CONCLUSIONS: Through our tool a user can perform the whole analysis on both paired-end and single-end data without knowing the underlying hardware and software architecture. The tool provides easy and intuitive data submission and a user-friendly web interface for annotated variant visualization. Users without IT expertise can access, through WEP, the most up-to-date and tested WES algorithms, tuned to maximize the quality of called variants while minimizing artifacts and false positives. The web tool is available at the following web address: http://www.caspur.it/wep |
format | Online Article Text |
id | pubmed-3633005 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3633005 2013-04-25 WEP: a high-performance analysis pipeline for whole-exome data D'Antonio, Mattia; D'Onorio De Meo, Paolo; Paoletti, Daniele; Elmi, Berardino; Pallocca, Matteo; Sanna, Nico; Picardi, Ernesto; Pesole, Graziano; Castrignanò, Tiziana BMC Bioinformatics Research BioMed Central 2013-04-22 /pmc/articles/PMC3633005/ /pubmed/23815231 http://dx.doi.org/10.1186/1471-2105-14-S7-S11 Text en Copyright © 2013 D'Antonio et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research D'Antonio, Mattia; D'Onorio De Meo, Paolo; Paoletti, Daniele; Elmi, Berardino; Pallocca, Matteo; Sanna, Nico; Picardi, Ernesto; Pesole, Graziano; Castrignanò, Tiziana WEP: a high-performance analysis pipeline for whole-exome data |
title | WEP: a high-performance analysis pipeline for whole-exome data |
title_full | WEP: a high-performance analysis pipeline for whole-exome data |
title_fullStr | WEP: a high-performance analysis pipeline for whole-exome data |
title_full_unstemmed | WEP: a high-performance analysis pipeline for whole-exome data |
title_short | WEP: a high-performance analysis pipeline for whole-exome data |
title_sort | wep: a high-performance analysis pipeline for whole-exome data |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3633005/ https://www.ncbi.nlm.nih.gov/pubmed/23815231 http://dx.doi.org/10.1186/1471-2105-14-S7-S11 |
work_keys_str_mv | AT dantoniomattia wepahighperformanceanalysispipelineforwholeexomedata AT donoriodemeopaolo wepahighperformanceanalysispipelineforwholeexomedata AT paolettidaniele wepahighperformanceanalysispipelineforwholeexomedata AT elmiberardino wepahighperformanceanalysispipelineforwholeexomedata AT palloccamatteo wepahighperformanceanalysispipelineforwholeexomedata AT sannanico wepahighperformanceanalysispipelineforwholeexomedata AT picardiernesto wepahighperformanceanalysispipelineforwholeexomedata AT pesolegraziano wepahighperformanceanalysispipelineforwholeexomedata AT castrignanotiziana wepahighperformanceanalysispipelineforwholeexomedata |
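
The abstract in this record states that WEP restricts its final results by filtering with customizable thresholds and that default values are supplied. The record does not say which metrics or cutoffs are used, so the following Python fragment is only a minimal sketch of that general idea. The `Variant` fields, the threshold names, and the default values (`min_quality` of 30, `min_depth` of 10) are assumptions made for illustration and are not taken from WEP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A called variant with two illustrative (hypothetical) quality metrics."""
    chrom: str
    pos: int
    ref: str
    alt: str
    quality: float  # call quality score (assumed metric)
    depth: int      # read depth at the site (assumed metric)


# Hypothetical defaults for illustration only; not WEP's published values.
DEFAULT_THRESHOLDS = {"min_quality": 30.0, "min_depth": 10}


def filter_variants(variants, thresholds=None):
    """Keep variants that meet every threshold; user values override defaults."""
    limits = {**DEFAULT_THRESHOLDS, **(thresholds or {})}
    return [
        v for v in variants
        if v.quality >= limits["min_quality"] and v.depth >= limits["min_depth"]
    ]


variants = [
    Variant("chr1", 1000, "A", "G", quality=50.0, depth=25),
    Variant("chr1", 2000, "C", "T", quality=12.0, depth=40),
    Variant("chr2", 3000, "G", "GA", quality=45.0, depth=6),
]

kept_default = filter_variants(variants)                     # defaults only
kept_relaxed = filter_variants(variants, {"min_depth": 5})   # user override

print(len(kept_default), len(kept_relaxed))
```

Merging the user-supplied dictionary over the defaults lets a caller change one cutoff without restating the others, which matches the "customizable thresholds with defaults" behaviour described in the abstract.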