The missing indels: an estimate of indel variation in a human genome and analysis of factors that impede detection
With the development of High-Throughput Sequencing (HTS) thousands of human genomes have now been sequenced. Whenever different studies analyze the same genome they usually agree on the amount of single-nucleotide polymorphisms, but differ dramatically on the number of insertion and deletion variant...
Main authors: | Jiang, Yue; Turinsky, Andrei L.; Brudno, Michael |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2015 |
Subjects: | Computational Biology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4551921/ https://www.ncbi.nlm.nih.gov/pubmed/26130710 http://dx.doi.org/10.1093/nar/gkv677 |
_version_ | 1782387646873468928 |
---|---|
author | Jiang, Yue; Turinsky, Andrei L.; Brudno, Michael |
author_sort | Jiang, Yue |
collection | PubMed |
description | With the development of High-Throughput Sequencing (HTS) thousands of human genomes have now been sequenced. Whenever different studies analyze the same genome they usually agree on the amount of single-nucleotide polymorphisms, but differ dramatically on the number of insertion and deletion variants (indels). Furthermore, there is evidence that indels are often severely under-reported. In this manuscript we derive the total number of indel variants in a human genome by combining data from different sequencing technologies, while assessing the indel detection accuracy. Our estimate of approximately 1 million indels in a Yoruban genome is much higher than the results reported in several recent HTS studies. We identify two key sources of difficulties in indel detection: the insufficient coverage, read length or alignment quality; and the presence of repeats, including short interspersed elements and homopolymers/dimers. We quantify the effect of these factors on indel detection. The quality of sequencing data plays a major role in improving indel detection by HTS methods. However, many indels exist in long homopolymers and repeats, where their detection is severely impeded. The true number of indel events is likely even higher than our current estimates, and new techniques and technologies will be required to detect them. |
format | Online Article Text |
id | pubmed-4551921 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-4551921 2015-08-28 The missing indels: an estimate of indel variation in a human genome and analysis of factors that impede detection Jiang, Yue; Turinsky, Andrei L.; Brudno, Michael Nucleic Acids Res Computational Biology Oxford University Press 2015-09-03 2015-06-30 /pmc/articles/PMC4551921/ /pubmed/26130710 http://dx.doi.org/10.1093/nar/gkv677 Text en © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | The missing indels: an estimate of indel variation in a human genome and analysis of factors that impede detection |
topic | Computational Biology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4551921/ https://www.ncbi.nlm.nih.gov/pubmed/26130710 http://dx.doi.org/10.1093/nar/gkv677 |