
The missing indels: an estimate of indel variation in a human genome and analysis of factors that impede detection


Bibliographic Details
Main Authors: Jiang, Yue; Turinsky, Andrei L.; Brudno, Michael
Format: Online Article Text
Language: English
Published: Oxford University Press 2015
Subjects: Computational Biology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4551921/
https://www.ncbi.nlm.nih.gov/pubmed/26130710
http://dx.doi.org/10.1093/nar/gkv677
author Jiang, Yue
Turinsky, Andrei L.
Brudno, Michael
collection PubMed
description With the development of High-Throughput Sequencing (HTS) thousands of human genomes have now been sequenced. Whenever different studies analyze the same genome they usually agree on the amount of single-nucleotide polymorphisms, but differ dramatically on the number of insertion and deletion variants (indels). Furthermore, there is evidence that indels are often severely under-reported. In this manuscript we derive the total number of indel variants in a human genome by combining data from different sequencing technologies, while assessing the indel detection accuracy. Our estimate of approximately 1 million indels in a Yoruban genome is much higher than the results reported in several recent HTS studies. We identify two key sources of difficulties in indel detection: the insufficient coverage, read length or alignment quality; and the presence of repeats, including short interspersed elements and homopolymers/dimers. We quantify the effect of these factors on indel detection. The quality of sequencing data plays a major role in improving indel detection by HTS methods. However, many indels exist in long homopolymers and repeats, where their detection is severely impeded. The true number of indel events is likely even higher than our current estimates, and new techniques and technologies will be required to detect them.
format Online
Article
Text
id pubmed-4551921
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4551921 2015-08-28 Nucleic Acids Res Computational Biology Oxford University Press 2015-09-03 2015-06-30 /pmc/articles/PMC4551921/ /pubmed/26130710 http://dx.doi.org/10.1093/nar/gkv677 Text en © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title The missing indels: an estimate of indel variation in a human genome and analysis of factors that impede detection
topic Computational Biology