SeqTailor: a user-friendly webserver for the extraction of DNA or protein sequences from next-generation sequencing data

Human whole-genome-sequencing reveals about 4 000 000 genomic variants per individual. These data are mostly stored as VCF-format files. Although many variant analysis methods accept VCF as input, many other tools require DNA or protein sequences, particularly for splicing prediction, sequence align...

Bibliographic Details
Main Authors: Zhang, Peng; Boisson, Bertrand; Stenson, Peter D; Cooper, David N; Casanova, Jean-Laurent; Abel, Laurent; Itan, Yuval
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6602489/
https://www.ncbi.nlm.nih.gov/pubmed/31045209
http://dx.doi.org/10.1093/nar/gkz326
_version_ 1783431387930427392
author Zhang, Peng
Boisson, Bertrand
Stenson, Peter D
Cooper, David N
Casanova, Jean-Laurent
Abel, Laurent
Itan, Yuval
author_facet Zhang, Peng
Boisson, Bertrand
Stenson, Peter D
Cooper, David N
Casanova, Jean-Laurent
Abel, Laurent
Itan, Yuval
author_sort Zhang, Peng
collection PubMed
description Human whole-genome-sequencing reveals about 4 000 000 genomic variants per individual. These data are mostly stored as VCF-format files. Although many variant analysis methods accept VCF as input, many other tools require DNA or protein sequences, particularly for splicing prediction, sequence alignment, phylogenetic analysis, and structure prediction. However, there is no existing webserver capable of extracting DNA/protein sequences for genomic variants from VCF files in a user-friendly and efficient manner. We developed the SeqTailor webserver to bridge this gap, by enabling rapid extraction of (i) DNA sequences around genomic variants, with customizable window sizes and options to annotate the splice sites closest to the variants and to consider the neighboring variants within the window; and (ii) protein sequences encoded by the DNA sequences around genomic variants, with built-in SnpEff annotator and customizable window sizes. SeqTailor supports 11 species, including: human (GRCh37/GRCh38), chimpanzee, mouse, rat, cow, chicken, lizard, zebrafish, fruitfly, Arabidopsis and rice. Standalone programs are provided for command-line-based needs. SeqTailor streamlines the sequence extraction process, and accelerates the analysis of genomic variants with software requiring DNA/protein sequences. It will facilitate the study of genomic variation, by increasing the feasibility of sequence-based analysis and prediction. The SeqTailor webserver is freely available at http://shiva.rockefeller.edu/SeqTailor/.
format Online
Article
Text
id pubmed-6602489
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-66024892019-07-05 SeqTailor: a user-friendly webserver for the extraction of DNA or protein sequences from next-generation sequencing data Zhang, Peng Boisson, Bertrand Stenson, Peter D Cooper, David N Casanova, Jean-Laurent Abel, Laurent Itan, Yuval Nucleic Acids Res Web Server Issue Human whole-genome-sequencing reveals about 4 000 000 genomic variants per individual. These data are mostly stored as VCF-format files. Although many variant analysis methods accept VCF as input, many other tools require DNA or protein sequences, particularly for splicing prediction, sequence alignment, phylogenetic analysis, and structure prediction. However, there is no existing webserver capable of extracting DNA/protein sequences for genomic variants from VCF files in a user-friendly and efficient manner. We developed the SeqTailor webserver to bridge this gap, by enabling rapid extraction of (i) DNA sequences around genomic variants, with customizable window sizes and options to annotate the splice sites closest to the variants and to consider the neighboring variants within the window; and (ii) protein sequences encoded by the DNA sequences around genomic variants, with built-in SnpEff annotator and customizable window sizes. SeqTailor supports 11 species, including: human (GRCh37/GRCh38), chimpanzee, mouse, rat, cow, chicken, lizard, zebrafish, fruitfly, Arabidopsis and rice. Standalone programs are provided for command-line-based needs. SeqTailor streamlines the sequence extraction process, and accelerates the analysis of genomic variants with software requiring DNA/protein sequences. It will facilitate the study of genomic variation, by increasing the feasibility of sequence-based analysis and prediction. The SeqTailor webserver is freely available at http://shiva.rockefeller.edu/SeqTailor/. Oxford University Press 2019-07-02 2019-05-02 /pmc/articles/PMC6602489/ /pubmed/31045209 http://dx.doi.org/10.1093/nar/gkz326 Text en © The Author(s) 2019. 
Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Web Server Issue
Zhang, Peng
Boisson, Bertrand
Stenson, Peter D
Cooper, David N
Casanova, Jean-Laurent
Abel, Laurent
Itan, Yuval
SeqTailor: a user-friendly webserver for the extraction of DNA or protein sequences from next-generation sequencing data
title SeqTailor: a user-friendly webserver for the extraction of DNA or protein sequences from next-generation sequencing data
title_full SeqTailor: a user-friendly webserver for the extraction of DNA or protein sequences from next-generation sequencing data
title_fullStr SeqTailor: a user-friendly webserver for the extraction of DNA or protein sequences from next-generation sequencing data
title_full_unstemmed SeqTailor: a user-friendly webserver for the extraction of DNA or protein sequences from next-generation sequencing data
title_short SeqTailor: a user-friendly webserver for the extraction of DNA or protein sequences from next-generation sequencing data
title_sort seqtailor: a user-friendly webserver for the extraction of dna or protein sequences from next-generation sequencing data
topic Web Server Issue
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6602489/
https://www.ncbi.nlm.nih.gov/pubmed/31045209
http://dx.doi.org/10.1093/nar/gkz326
work_keys_str_mv AT zhangpeng seqtailorauserfriendlywebserverfortheextractionofdnaorproteinsequencesfromnextgenerationsequencingdata
AT boissonbertrand seqtailorauserfriendlywebserverfortheextractionofdnaorproteinsequencesfromnextgenerationsequencingdata
AT stensonpeterd seqtailorauserfriendlywebserverfortheextractionofdnaorproteinsequencesfromnextgenerationsequencingdata
AT cooperdavidn seqtailorauserfriendlywebserverfortheextractionofdnaorproteinsequencesfromnextgenerationsequencingdata
AT casanovajeanlaurent seqtailorauserfriendlywebserverfortheextractionofdnaorproteinsequencesfromnextgenerationsequencingdata
AT abellaurent seqtailorauserfriendlywebserverfortheextractionofdnaorproteinsequencesfromnextgenerationsequencingdata
AT itanyuval seqtailorauserfriendlywebserverfortheextractionofdnaorproteinsequencesfromnextgenerationsequencingdata