Vindel: a simple pipeline for checking indel redundancy

Bibliographic Details
Main authors: Li, Zhiyi; Wu, Xiaowei; He, Bin; Zhang, Liqing
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4245841/
https://www.ncbi.nlm.nih.gov/pubmed/25407965
http://dx.doi.org/10.1186/s12859-014-0359-1
_version_ 1782346434561966080
author Li, Zhiyi
Wu, Xiaowei
He, Bin
Zhang, Liqing
collection PubMed
description BACKGROUND: With the advance of next-generation sequencing (NGS) technologies, a large number of insertion and deletion (indel) variants have been identified in human populations. Despite much research into variant calling, a non-negligible proportion of the identified indel variants may be false positives caused by sequencing errors, artifacts of ambiguous alignments, and annotation errors.
RESULTS: In this paper, we examine indel redundancy in dbSNP, one of the central databases for indel variants, and develop a standalone computational pipeline, dubbed Vindel, to detect redundant indels. The pipeline first uses indel position information to form candidate redundant groups, then applies each indel to the reference genome to generate the corresponding indel variant substring. Finally, the indel variant substrings within each candidate redundant group are compared pairwise to identify redundant indels. We applied the pipeline to the human indels in dbSNP. It identified approximately 8% redundancy among insertions, 12% among deletions, and 10% overall for insertions and deletions combined. These numbers are largely consistent across all human autosomes. We also investigated the indel size distribution and the distribution of distances between adjacent indels to better understand the mechanisms that generate indel variants.
CONCLUSIONS: Vindel, a simple yet effective computational pipeline, can be used to check whether a set of indels is redundant with respect to those already in a database of interest, such as NCBI’s dbSNP. Of the approximately 5.9 million indels we examined, nearly 0.6 million are redundant, revealing a serious limitation in current indel annotation. Summary statistics show that the pipeline detects indel redundancy consistently across all 22 autosomes. Apart from the standalone Vindel pipeline, the indel redundancy check algorithm is also implemented in the web server http://bioinformatics.cs.vt.edu/zhanglab/indelRedundant.php.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-014-0359-1) contains supplementary material, which is available to authorized users.
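The three steps named in the abstract (group candidates by position, apply each indel to the reference, compare the resulting substrings pairwise) can be illustrated with a minimal Python sketch. This is not the Vindel implementation. The `Indel` record, the function names, the grouping rule (same type, same length, positions within `max_gap` of the previous indel), and the `flank` window are all assumptions made for illustration; the abstract states only that position information is used to form candidate groups.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Indel:
    """Hypothetical record; Vindel's real input format is not shown in the abstract."""
    name: str
    kind: str  # "ins" or "del"
    pos: int   # 0-based: insertion point, or index of the first deleted base
    seq: str   # inserted or deleted bases


def apply_indel(ref, indel, start, end):
    """Return the window ref[start:end] after applying a single indel to it."""
    if indel.kind == "ins":
        return ref[start:indel.pos] + indel.seq + ref[indel.pos:end]
    if indel.kind == "del":
        stop = indel.pos + len(indel.seq)
        if ref[indel.pos:stop] != indel.seq:
            raise ValueError(f"{indel.name}: deleted bases do not match the reference")
        return ref[start:indel.pos] + ref[stop:end]
    raise ValueError(f"{indel.name}: unknown indel kind {indel.kind!r}")


def candidate_groups(indels, max_gap):
    """Assumed rule: same kind, same length, each within max_gap of the previous one."""
    groups, current = [], []
    for v in sorted(indels, key=lambda v: (v.kind, len(v.seq), v.pos)):
        if (current
                and (v.kind, len(v.seq)) == (current[-1].kind, len(current[-1].seq))
                and v.pos - current[-1].pos <= max_gap):
            current.append(v)
        else:
            if len(current) > 1:
                groups.append(current)
            current = [v]
    if len(current) > 1:
        groups.append(current)
    return groups


def redundant_pairs(ref, indels, max_gap=10, flank=5):
    """Compare variant substrings pairwise within each candidate group."""
    pairs = []
    for group in candidate_groups(indels, max_gap):
        # One shared window per group so the variant substrings are comparable.
        start = max(0, min(v.pos for v in group) - flank)
        end = min(len(ref), max(v.pos + len(v.seq) for v in group) + flank)
        variant = {v.name: apply_indel(ref, v, start, end) for v in group}
        for a, b in combinations(group, 2):
            if variant[a.name] == variant[b.name]:
                pairs.append((a.name, b.name))
    return pairs


# Toy reference with a CA repeat, where shifted indels yield the same sequence.
REF = "GGCACACATT"
INDELS = [
    Indel("d1", "del", 2, "CA"),
    Indel("d2", "del", 3, "AC"),
    Indel("d3", "del", 7, "AT"),
    Indel("i1", "ins", 2, "CA"),
    Indel("i2", "ins", 3, "AC"),
]
print(redundant_pairs(REF, INDELS))  # [('d1', 'd2'), ('i1', 'i2')]
```

In the toy example, `d1` and `d2` both turn the reference into `GGCACATT`, so they are reported as redundant even though their positions and deleted bases differ, while `d3` produces a different sequence. Comparing the mutated sequence rather than the raw annotation is what catches the ambiguity that repeats introduce.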
format Online
Article
Text
id pubmed-4245841
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4245841 2014-11-28 Vindel: a simple pipeline for checking indel redundancy. Li, Zhiyi; Wu, Xiaowei; He, Bin; Zhang, Liqing. BMC Bioinformatics. Research Article. BioMed Central 2014-11-19 /pmc/articles/PMC4245841/ /pubmed/25407965 http://dx.doi.org/10.1186/s12859-014-0359-1 Text en © Li et al.; licensee BioMed Central Ltd. 2014 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Vindel: a simple pipeline for checking indel redundancy
topic Research Article