
VNTRseek—a computational tool to detect tandem repeat variants in high-throughput sequencing data

DNA tandem repeats (TRs) are ubiquitous genomic features which consist of two or more adjacent copies of an underlying pattern sequence. The copies may be identical or approximate. Variable number of tandem repeats or VNTRs are polymorphic TR loci in which the number of pattern copies is variable. I...


Bibliographic Details
Main authors: Gelfand, Yevgeniy; Hernandez, Yozen; Loving, Joshua; Benson, Gary
Format: Online Article Text
Language: English
Published: Oxford University Press 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4132751/
https://www.ncbi.nlm.nih.gov/pubmed/25056320
http://dx.doi.org/10.1093/nar/gku642
author Gelfand, Yevgeniy
Hernandez, Yozen
Loving, Joshua
Benson, Gary
collection PubMed
description DNA tandem repeats (TRs) are ubiquitous genomic features which consist of two or more adjacent copies of an underlying pattern sequence. The copies may be identical or approximate. Variable number of tandem repeats or VNTRs are polymorphic TR loci in which the number of pattern copies is variable. In this paper we describe VNTRseek, our software for discovery of minisatellite VNTRs (pattern size ≥ 7 nucleotides) using whole genome sequencing data. VNTRseek maps sequencing reads to a set of reference TRs and then identifies putative VNTRs based on a discrepancy between the copy number of a reference and its mapped reads. VNTRseek was used to analyze the Watson and Khoisan genomes (454 technology) and two 1000 Genomes family trios (Illumina). In the Watson genome, we identified 752 VNTRs with pattern sizes ranging from 7 to 84 nt. In the Khoisan genome, we identified 2572 VNTRs with pattern sizes ranging from 7 to 105 nt. In the trios, we identified between 2660 and 3822 VNTRs per individual and found nearly 100% consistency with Mendelian inheritance. VNTRseek is, to the best of our knowledge, the first software for genome-wide detection of minisatellite VNTRs. It is available at http://orca.bu.edu/vntrseek/.
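The detection idea in the description (compare the copy number of a reference TR with the copy numbers seen in reads mapped to it) can be illustrated with a minimal Python sketch. This is not VNTRseek's implementation: the function names `count_tandem_copies` and `putative_vntr_alleles` and the `min_support` parameter are hypothetical, and the sketch counts only exact pattern copies, whereas the description notes that copies may be approximate.

```python
from collections import Counter


def count_tandem_copies(sequence: str, pattern: str) -> int:
    """Return the longest run of exact, adjacent copies of `pattern` in `sequence`.

    Simplification: real tandem repeats may contain approximate copies;
    this hypothetical helper only counts exact ones.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    best = 0
    step = len(pattern)
    for start in range(len(sequence)):
        copies = 0
        pos = start
        # Extend the run while another full copy follows immediately.
        while sequence.startswith(pattern, pos):
            copies += 1
            pos += step
        best = max(best, copies)
    return best


def putative_vntr_alleles(
    reference_copies: int, read_copies: list[int], min_support: int = 2
) -> list[int]:
    """Return copy numbers that differ from the reference and are seen in
    at least `min_support` mapped reads (an assumed, illustrative threshold)."""
    support = Counter(read_copies)
    return sorted(
        copies
        for copies, n_reads in support.items()
        if copies != reference_copies and n_reads >= min_support
    )


# Toy data: a 7 nt pattern (the minisatellite lower bound named in the description).
pattern = "ACGTACG"
reference = "TT" + pattern * 3 + "GG"
reads = ["TT" + pattern * 3 + "GG", "TT" + pattern * 4 + "GG", "T" + pattern * 4 + "G"]

ref_copies = count_tandem_copies(reference, pattern)
read_copy_numbers = [count_tandem_copies(read, pattern) for read in reads]
alleles = putative_vntr_alleles(ref_copies, read_copy_numbers)
```

With this toy data the reference carries three copies and two reads carry four, so the locus would be flagged with a four-copy allele. Requiring support from more than one read is one plausible way to reduce the influence of sequencing errors; the actual criteria used by VNTRseek are described in the article itself.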
format Online
Article
Text
id pubmed-4132751
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4132751 2014-12-01 VNTRseek—a computational tool to detect tandem repeat variants in high-throughput sequencing data Gelfand, Yevgeniy Hernandez, Yozen Loving, Joshua Benson, Gary Nucleic Acids Res Computational Biology Oxford University Press 2014-08-18 2014-07-23 /pmc/articles/PMC4132751/ /pubmed/25056320 http://dx.doi.org/10.1093/nar/gku642 Text en © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title VNTRseek—a computational tool to detect tandem repeat variants in high-throughput sequencing data
topic Computational Biology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4132751/
https://www.ncbi.nlm.nih.gov/pubmed/25056320
http://dx.doi.org/10.1093/nar/gku642