Comparison of insertion/deletion calling algorithms on human next-generation sequencing data

Bibliographic Details
Main Authors: Ghoneim, Dalia H, Myers, Jason R, Tuttle, Emily, Paciorkowski, Alex R
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4265454/
https://www.ncbi.nlm.nih.gov/pubmed/25435282
http://dx.doi.org/10.1186/1756-0500-7-864
author Ghoneim, Dalia H
Myers, Jason R
Tuttle, Emily
Paciorkowski, Alex R
collection PubMed
description BACKGROUND: Insertions/deletions (indels) are the second most common type of genomic variant and the most common type of structural variant. Identifying indels in next-generation sequencing data is a challenge, and algorithms commonly used for indel detection have not been compared on a research cohort of human subject genomic data. Guidelines for the optimal detection of biologically significant indels are limited. We analyzed three sets of human next-generation sequencing data (48 samples of 200-gene targeted exon sequencing, 45 samples of whole exome sequencing, and 2 samples of whole genome sequencing) using three algorithms for indel detection: Pindel and the Genome Analysis Toolkit's (GATK) UnifiedGenotyper and HaplotypeCaller.
RESULTS: We observed variation in indel calls across the three algorithms. Indels called by all three tools comprised only 5.70% of targeted exon, 19.52% of whole exome, and 14.25% of whole genome indel calls. The majority of the discordant indels had lower read depth and were likely to be false positives. When software parameters were kept consistent across the three targets, HaplotypeCaller produced the most reliable results. Pindel results did not validate well without adjustments to parameters to account for varied read depth and number of samples per run. Adjustments to Pindel's M (minimum support for event) parameter improved both concordance and validation rates. Pindel was able to identify large deletions that surpassed the length capabilities of the GATK algorithms.
CONCLUSIONS: Despite the observed variability in indel identification, we discerned strengths among the individual algorithms on specific data sets. This allowed us to suggest best practices for indel calling. Pindel's low validation rate for indel calls made in targeted exon sequencing suggests that HaplotypeCaller is better suited for short indels and multi-sample runs in targets with very high read depth. Pindel allows for optimization of minimum support for events and is best used for detection of larger indels at lower read depths.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1756-0500-7-864) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4265454
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4265454 2014-12-15 BMC Res Notes Research Article BioMed Central 2014-12-01 /pmc/articles/PMC4265454/ /pubmed/25435282 http://dx.doi.org/10.1186/1756-0500-7-864 Text en © Ghoneim et al.; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Comparison of insertion/deletion calling algorithms on human next-generation sequencing data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4265454/
https://www.ncbi.nlm.nih.gov/pubmed/25435282
http://dx.doi.org/10.1186/1756-0500-7-864