Evaluation of SNP calling using single and multiple-sample calling algorithms by validation against array base genotyping and Mendelian inheritance
BACKGROUND: With diminishing costs of next generation sequencing (NGS), whole genome analysis becomes a standard tool for identifying genetic causes of inherited diseases. Commercial NGS service providers in general not only provide raw genomic reads, but further deliver SNP calls to their clients....
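Main authors: | Kumar, Pankaj; Al-Shafai, Mashael; Al Muftah, Wadha Ahmed; Chalhoub, Nader; Elsaid, Mahmoud F; Aleem, Alice Abdel; Suhre, Karsten |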
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4216909/ https://www.ncbi.nlm.nih.gov/pubmed/25339461 http://dx.doi.org/10.1186/1756-0500-7-747 |
_version_ | 1782342326060843008 |
---|---|
author | Kumar, Pankaj; Al-Shafai, Mashael; Al Muftah, Wadha Ahmed; Chalhoub, Nader; Elsaid, Mahmoud F; Aleem, Alice Abdel; Suhre, Karsten |
author_facet | Kumar, Pankaj; Al-Shafai, Mashael; Al Muftah, Wadha Ahmed; Chalhoub, Nader; Elsaid, Mahmoud F; Aleem, Alice Abdel; Suhre, Karsten |
author_sort | Kumar, Pankaj |
collection | PubMed |
description | BACKGROUND: With diminishing costs of next generation sequencing (NGS), whole genome analysis becomes a standard tool for identifying genetic causes of inherited diseases. Commercial NGS service providers in general not only provide raw genomic reads, but further deliver SNP calls to their clients. However, the question for the user arises whether to use the SNP data as is, or process the raw sequencing data further through more sophisticated SNP calling pipelines with more advanced algorithms. RESULTS: Here we report a detailed comparison of SNPs called using the popular GATK multiple-sample calling protocol to SNPs delivered as part of a 40x whole genome sequencing project by Illumina Inc of 171 human genomes of Arab descent (108 unrelated Qatari genomes, 19 trios, and 2 families with rare diseases) and compare them to variants provided by the Illumina CASAVA pipeline. GATK multi-sample calling identifies more variants than the CASAVA pipeline. The additional variants from GATK are robust for Mendelian consistencies but weak in terms of statistical parameters such as TsTv ratio. However, these additional variants do not make a difference in detecting the causative variants in the studied phenotype. CONCLUSION: Both pipelines, GATK multi-sample calling and Illumina CASAVA single sample calling, have highly similar performance in SNP calling at the level of putatively causative variants. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1756-0500-7-747) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4216909 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4216909 2014-11-04 Evaluation of SNP calling using single and multiple-sample calling algorithms by validation against array base genotyping and Mendelian inheritance Kumar, Pankaj; Al-Shafai, Mashael; Al Muftah, Wadha Ahmed; Chalhoub, Nader; Elsaid, Mahmoud F; Aleem, Alice Abdel; Suhre, Karsten BMC Res Notes Research Article BACKGROUND: With diminishing costs of next generation sequencing (NGS), whole genome analysis becomes a standard tool for identifying genetic causes of inherited diseases. Commercial NGS service providers in general not only provide raw genomic reads, but further deliver SNP calls to their clients. However, the question for the user arises whether to use the SNP data as is, or process the raw sequencing data further through more sophisticated SNP calling pipelines with more advanced algorithms. RESULTS: Here we report a detailed comparison of SNPs called using the popular GATK multiple-sample calling protocol to SNPs delivered as part of a 40x whole genome sequencing project by Illumina Inc of 171 human genomes of Arab descent (108 unrelated Qatari genomes, 19 trios, and 2 families with rare diseases) and compare them to variants provided by the Illumina CASAVA pipeline. GATK multi-sample calling identifies more variants than the CASAVA pipeline. The additional variants from GATK are robust for Mendelian consistencies but weak in terms of statistical parameters such as TsTv ratio. However, these additional variants do not make a difference in detecting the causative variants in the studied phenotype. CONCLUSION: Both pipelines, GATK multi-sample calling and Illumina CASAVA single sample calling, have highly similar performance in SNP calling at the level of putatively causative variants. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1756-0500-7-747) contains supplementary material, which is available to authorized users. BioMed Central 2014-10-22 /pmc/articles/PMC4216909/ /pubmed/25339461 http://dx.doi.org/10.1186/1756-0500-7-747 Text en © Kumar et al.; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Kumar, Pankaj; Al-Shafai, Mashael; Al Muftah, Wadha Ahmed; Chalhoub, Nader; Elsaid, Mahmoud F; Aleem, Alice Abdel; Suhre, Karsten Evaluation of SNP calling using single and multiple-sample calling algorithms by validation against array base genotyping and Mendelian inheritance |
title | Evaluation of SNP calling using single and multiple-sample calling algorithms by validation against array base genotyping and Mendelian inheritance |
title_full | Evaluation of SNP calling using single and multiple-sample calling algorithms by validation against array base genotyping and Mendelian inheritance |
title_fullStr | Evaluation of SNP calling using single and multiple-sample calling algorithms by validation against array base genotyping and Mendelian inheritance |
title_full_unstemmed | Evaluation of SNP calling using single and multiple-sample calling algorithms by validation against array base genotyping and Mendelian inheritance |
title_short | Evaluation of SNP calling using single and multiple-sample calling algorithms by validation against array base genotyping and Mendelian inheritance |
title_sort | evaluation of snp calling using single and multiple-sample calling algorithms by validation against array base genotyping and mendelian inheritance |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4216909/ https://www.ncbi.nlm.nih.gov/pubmed/25339461 http://dx.doi.org/10.1186/1756-0500-7-747 |
work_keys_str_mv | AT kumarpankaj evaluationofsnpcallingusingsingleandmultiplesamplecallingalgorithmsbyvalidationagainstarraybasegenotypingandmendelianinheritance AT alshafaimashael evaluationofsnpcallingusingsingleandmultiplesamplecallingalgorithmsbyvalidationagainstarraybasegenotypingandmendelianinheritance AT almuftahwadhaahmed evaluationofsnpcallingusingsingleandmultiplesamplecallingalgorithmsbyvalidationagainstarraybasegenotypingandmendelianinheritance AT chalhoubnader evaluationofsnpcallingusingsingleandmultiplesamplecallingalgorithmsbyvalidationagainstarraybasegenotypingandmendelianinheritance AT elsaidmahmoudf evaluationofsnpcallingusingsingleandmultiplesamplecallingalgorithmsbyvalidationagainstarraybasegenotypingandmendelianinheritance AT aleemaliceabdel evaluationofsnpcallingusingsingleandmultiplesamplecallingalgorithmsbyvalidationagainstarraybasegenotypingandmendelianinheritance AT suhrekarsten evaluationofsnpcallingusingsingleandmultiplesamplecallingalgorithmsbyvalidationagainstarraybasegenotypingandmendelianinheritance |
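
The abstract in this record names two quality checks for SNP calls: the TsTv (transition/transversion) ratio and Mendelian consistency in trios. The Python sketch below illustrates what each check computes for biallelic SNPs. It is not the authors' pipeline and uses no GATK or CASAVA interface; the function names `tstv_ratio` and `is_mendelian_consistent` and the tuple-based input format are assumptions made for this illustration.

```python
# Illustrative sketch only: names and input formats are hypothetical,
# not taken from the article, GATK, or CASAVA.

BASES = {"A", "C", "G", "T"}

# Transitions swap purine with purine (A<->G) or pyrimidine with pyrimidine (C<->T).
# Every other single-base substitution is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def tstv_ratio(snps):
    """Return transitions / transversions for an iterable of (ref, alt) bases."""
    ts = tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            raise ValueError(f"not a single-base substitution: {ref}>{alt}")
        if (ref, alt) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return ts / tv


def is_mendelian_consistent(child, father, mother):
    """Return True if the child's genotype can be formed from one allele of each parent.

    Each genotype is a 2-tuple of alleles, for example ("A", "G").
    """
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


if __name__ == "__main__":
    calls = [("A", "G"), ("C", "T"), ("A", "C")]  # two transitions, one transversion
    print(tstv_ratio(calls))
    print(is_mendelian_consistent(("A", "G"), ("A", "A"), ("G", "G")))
    print(is_mendelian_consistent(("G", "G"), ("A", "A"), ("G", "G")))
```

In a trio, a call set with many sites for which `is_mendelian_consistent` returns False would suggest genotyping errors, since true de novo mutations are rare; the sketch ignores sex chromosomes, missing genotypes, and multiallelic sites.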