
Evaluation of SNP calling using single and multiple-sample calling algorithms by validation against array base genotyping and Mendelian inheritance

BACKGROUND: With diminishing costs of next generation sequencing (NGS), whole genome analysis becomes a standard tool for identifying genetic causes of inherited diseases. Commercial NGS service providers in general not only provide raw genomic reads, but further deliver SNP calls to their clients....


Bibliographic Details
Main Authors: Kumar, Pankaj, Al-Shafai, Mashael, Al Muftah, Wadha Ahmed, Chalhoub, Nader, Elsaid, Mahmoud F, Aleem, Alice Abdel, Suhre, Karsten
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4216909/
https://www.ncbi.nlm.nih.gov/pubmed/25339461
http://dx.doi.org/10.1186/1756-0500-7-747
author Kumar, Pankaj
Al-Shafai, Mashael
Al Muftah, Wadha Ahmed
Chalhoub, Nader
Elsaid, Mahmoud F
Aleem, Alice Abdel
Suhre, Karsten
collection PubMed
description BACKGROUND: With diminishing costs of next generation sequencing (NGS), whole genome analysis becomes a standard tool for identifying genetic causes of inherited diseases. Commercial NGS service providers in general not only provide raw genomic reads, but further deliver SNP calls to their clients. However, the question for the user arises whether to use the SNP data as is, or process the raw sequencing data further through more sophisticated SNP calling pipelines with more advanced algorithms. RESULTS: Here we report a detailed comparison of SNPs called using the popular GATK multiple-sample calling protocol to SNPs delivered as part of a 40x whole genome sequencing project by Illumina Inc of 171 human genomes of Arab descent (108 unrelated Qatari genomes, 19 trios, and 2 families with rare diseases) and compare them to variants provided by the Illumina CASAVA pipeline. GATK multi-sample calling identifies more variants than the CASAVA pipeline. The additional variants from GATK are robust for Mendelian consistencies but weak in terms of statistical parameters such as TsTv ratio. However, these additional variants do not make a difference in detecting the causative variants in the studied phenotype. CONCLUSION: Both pipelines, GATK multi-sample calling and Illumina CASAVA single sample calling, have highly similar performance in SNP calling at the level of putatively causative variants. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1756-0500-7-747) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4216909
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4216909 2014-11-04 BMC Res Notes Research Article
BioMed Central 2014-10-22 /pmc/articles/PMC4216909/ /pubmed/25339461 http://dx.doi.org/10.1186/1756-0500-7-747 Text en © Kumar et al.; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Evaluation of SNP calling using single and multiple-sample calling algorithms by validation against array base genotyping and Mendelian inheritance
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4216909/
https://www.ncbi.nlm.nih.gov/pubmed/25339461
http://dx.doi.org/10.1186/1756-0500-7-747
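
The abstract in this record judges the additional GATK calls by two checks: Mendelian consistency within trios and the Ts/Tv (transition/transversion) ratio. The sketch below illustrates what those two checks compute. It is not the authors' pipeline; the function names and the toy genotypes are hypothetical, and it assumes biallelic single-base substitutions with genotypes given as pairs of alleles.

```python
from itertools import product

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def ts_tv_ratio(snps):
    """Ts/Tv ratio for an iterable of (ref, alt) single-base substitutions."""
    ts = tv = 0
    for ref, alt in snps:
        if ref.upper() == alt.upper():
            continue  # not a substitution; ignored (assumption of this sketch)
        if is_transition(ref, alt):
            ts += 1
        else:
            tv += 1
    return ts / tv if tv else float("inf")


def is_mendelian_consistent(child, father, mother):
    """True if the child's genotype can be formed from one paternal and one
    maternal allele. Genotypes are 2-tuples of alleles, e.g. ("A", "G")."""
    target = sorted(child)
    return any(sorted((f, m)) == target for f, m in product(father, mother))


# Toy data, invented for illustration only.
toy_snps = [("A", "G"), ("C", "T"), ("A", "C")]
toy_ratio = ts_tv_ratio(toy_snps)  # two transitions, one transversion
consistent = is_mendelian_consistent(("A", "G"), ("A", "A"), ("G", "G"))
violation = is_mendelian_consistent(("G", "G"), ("A", "A"), ("A", "G"))
```

In a trio, a call that fails the consistency check is either a genotyping error or a de novo event, which is why the fraction of consistent calls can serve as a quality proxy. A Ts/Tv ratio well below the value typically seen for genome-wide human SNPs suggests an excess of false positives.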