Analysis of optimal alignments unfolds aligners’ bias in existing variant profiles
Efforts such as the International HapMap Project and the 1000 Genomes Project resulted in a catalog of millions of single-nucleotide and insertion/deletion (INDEL) variants in the human population. Viewed as a reference of existing variants, this resource commonly serves as a gold standard for studying and...
Main authors: | Tran, Quang; Gao, Shanshan; Phan, Vinhthuy |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | Proceedings |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5073887/ https://www.ncbi.nlm.nih.gov/pubmed/27766935 http://dx.doi.org/10.1186/s12859-016-1216-1 |
_version_ | 1782461650484330496 |
---|---|
author | Tran, Quang; Gao, Shanshan; Phan, Vinhthuy |
collection | PubMed |
description | Efforts such as the International HapMap Project and the 1000 Genomes Project resulted in a catalog of millions of single-nucleotide and insertion/deletion (INDEL) variants in the human population. Viewed as a reference of existing variants, this resource commonly serves as a gold standard for studying and developing methods to detect genetic variants. Our analysis revealed that this reference contained thousands of INDELs that were constructed in a biased manner. This bias occurred at the level of aligning short reads to reference genomes to detect variants. The bias is caused by the existence of many theoretically optimal alignments between the reference genome and reads containing alternative alleles at those INDEL locations. We examined several popular aligners and showed that these aligners could be divided into groups whose alignments yielded INDELs that agreed strongly or disagreed strongly with reported INDELs. This finding suggests that the agreement or disagreement between an aligner's called INDEL and the reported INDEL is merely a result of the arbitrary selection of one of the optimal alignments. The existence of bias in INDEL calling might have a serious influence on downstream analyses. As such, our finding suggests that this phenomenon should be further addressed. |
format | Online Article Text |
id | pubmed-5073887 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5073887 2016-10-26 Analysis of optimal alignments unfolds aligners’ bias in existing variant profiles Tran, Quang; Gao, Shanshan; Phan, Vinhthuy BMC Bioinformatics Proceedings BioMed Central 2016-10-06 /pmc/articles/PMC5073887/ /pubmed/27766935 http://dx.doi.org/10.1186/s12859-016-1216-1 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Analysis of optimal alignments unfolds aligners’ bias in existing variant profiles |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5073887/ https://www.ncbi.nlm.nih.gov/pubmed/27766935 http://dx.doi.org/10.1186/s12859-016-1216-1 |
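The description above attributes the bias to the existence of many equally optimal alignments at INDEL locations. A minimal sketch of that idea, assuming a toy reference with a short tandem repeat: deleting one repeat unit at several different positions yields the same read, so each placement corresponds to an alignment with a single gap of the same length and no mismatches, and hence the same score. The function names and the sequence are hypothetical illustrations, not taken from the article or from any aligner.

```python
def delete_at(reference: str, start: int, length: int) -> str:
    """Return the reference with `length` bases removed beginning at `start`."""
    return reference[:start] + reference[start + length:]


def equivalent_deletion_starts(reference: str, start: int, length: int) -> list[int]:
    """List every start position whose deletion of `length` bases
    produces the same sequence as the deletion at `start`."""
    target = delete_at(reference, start, length)
    return [
        s
        for s in range(len(reference) - length + 1)
        if delete_at(reference, s, length) == target
    ]


# Toy reference (an assumption for illustration): "AC" repeated three times
# between the flanking bases "GG" and "TT".
reference = "GGACACACTT"

# A read carrying a 2-base deletion inside the repeat.
read = delete_at(reference, 2, 2)

# Every position at which an aligner could place the same 2-base deletion.
starts = equivalent_deletion_starts(reference, 2, 2)
print(read, starts)
```

In this toy case the deletion can be reported at any of five positions, so which one appears in a variant call depends on how the aligner breaks the tie (for example, left-most versus right-most placement) rather than on the data.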