Analysis of optimal alignments unfolds aligners’ bias in existing variant profiles

Efforts such as the International HapMap Project and the 1000 Genomes Project resulted in a catalog of millions of single-nucleotide and insertion/deletion (INDEL) variants in the human population. Viewed as a reference of existing variants, this resource commonly serves as a gold standard for studying and...

Bibliographic Details
Main authors: Tran, Quang; Gao, Shanshan; Phan, Vinhthuy
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5073887/
https://www.ncbi.nlm.nih.gov/pubmed/27766935
http://dx.doi.org/10.1186/s12859-016-1216-1
author Tran, Quang
Gao, Shanshan
Phan, Vinhthuy
collection PubMed
description Efforts such as the International HapMap Project and the 1000 Genomes Project resulted in a catalog of millions of single-nucleotide and insertion/deletion (INDEL) variants in the human population. Viewed as a reference of existing variants, this resource commonly serves as a gold standard for studying and developing methods to detect genetic variants. Our analysis revealed that this reference contained thousands of INDELs that were constructed in a biased manner. This bias occurred at the level of aligning short reads to reference genomes to detect variants. The bias is caused by the existence of many theoretically optimal alignments between the reference genome and reads containing alternative alleles at those INDEL locations. We examined several popular aligners and showed that these aligners could be divided into groups whose alignments yielded INDELs that agreed strongly or disagreed strongly with reported INDELs. This finding suggests that the agreement or disagreement between the aligners’ called INDELs and the reported INDELs is merely a result of the arbitrary selection of one of the optimal alignments. The existence of bias in INDEL calling might have a serious influence on downstream analyses. As such, our finding suggests that this phenomenon should be further addressed.
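The point about many theoretically optimal alignments can be illustrated with a toy example. The sketch below is not the authors' method or any aligner's algorithm; the sequences and the helper names `equivalent_deletions` and `gapped_read` are hypothetical and chosen only for illustration. It assumes a read that differs from the reference by a single deletion inside a repeat, and lists every gap placement that reproduces the read exactly. Each placement has zero mismatches and one gap of the same length, so under a typical scoring scheme all of them would score the same.

```python
from typing import List


def equivalent_deletions(ref: str, read: str) -> List[int]:
    """Return every 0-based start position in `ref` at which deleting
    len(ref) - len(read) consecutive bases yields exactly `read`."""
    k = len(ref) - len(read)
    if k <= 0:
        raise ValueError("read must be shorter than ref for a deletion")
    return [p for p in range(len(ref) - k + 1) if ref[:p] + ref[p + k:] == read]


def gapped_read(read: str, start: int, length: int) -> str:
    """Render the read with a gap of `length` dashes inserted at `start`."""
    return read[:start] + "-" * length + read[start:]


if __name__ == "__main__":
    # Hypothetical toy sequences: the read lacks one T from a run of four.
    ref = "GATTTTCA"
    read = "GATTTCA"
    k = len(ref) - len(read)
    for p in equivalent_deletions(ref, read):
        print(f"deletion at {p}:")
        print("  ref :", ref)
        print("  read:", gapped_read(read, p, k))
```

In this toy case the deletion can be placed at any of four positions. An aligner that reports the leftmost placement and one that reports the rightmost would call the same deletion at different coordinates, which is the kind of arbitrary choice the abstract describes.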
format Online
Article
Text
id pubmed-5073887
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5073887 2016-10-26 Analysis of optimal alignments unfolds aligners’ bias in existing variant profiles Tran, Quang Gao, Shanshan Phan, Vinhthuy BMC Bioinformatics Proceedings
BioMed Central 2016-10-06 /pmc/articles/PMC5073887/ /pubmed/27766935 http://dx.doi.org/10.1186/s12859-016-1216-1 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Analysis of optimal alignments unfolds aligners’ bias in existing variant profiles
topic Proceedings