Comparison of genotype clustering tools with rare variants

BACKGROUND: Along with the improvement of high-throughput sequencing technologies, the genetics community is showing marked interest in the rare variants/common diseases hypothesis. While sequencing can still be prohibitive for large studies, commercially available genotyping arrays targeting rare...

Bibliographic Details
Main authors: Perreault, Louis-Philippe Lemieux, Legault, Marc-André, Barhdadi, Amina, Provost, Sylvie, Normand, Valérie, Tardif, Jean-Claude, Dubé, Marie-Pierre
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3941951/
https://www.ncbi.nlm.nih.gov/pubmed/24559245
http://dx.doi.org/10.1186/1471-2105-15-52
_version_ 1782306004469284864
author Perreault, Louis-Philippe Lemieux
Legault, Marc-André
Barhdadi, Amina
Provost, Sylvie
Normand, Valérie
Tardif, Jean-Claude
Dubé, Marie-Pierre
author_facet Perreault, Louis-Philippe Lemieux
Legault, Marc-André
Barhdadi, Amina
Provost, Sylvie
Normand, Valérie
Tardif, Jean-Claude
Dubé, Marie-Pierre
author_sort Perreault, Louis-Philippe Lemieux
collection PubMed
description BACKGROUND: Along with the improvement of high-throughput sequencing technologies, the genetics community is showing marked interest in the rare variants/common diseases hypothesis. While sequencing can still be prohibitive for large studies, commercially available genotyping arrays targeting rare variants prove to be a reasonable alternative. A technical challenge of array-based methods is the task of deriving genotype classes (homozygous or heterozygous) by clustering intensity data points. The performance of clustering tools for common polymorphisms is well established, while their performance with a large proportion of rare variants (where data points are sparse for genotypes containing the rare allele) is less well known. We have compared the performance of four clustering tools (GenCall, GenoSNP, optiCall and zCall) for the genotyping of over 10,000 samples using Illumina’s HumanExome BeadChip, which includes 247,870 variants, 90% of which have a minor allele frequency below 5% in a population of European ancestry. Different reference parameters for GenCall and different initial parameters for GenoSNP were tested. Genotyping accuracy was assessed using data from the 1000 Genomes Project as a gold standard, and agreement between tools was measured. RESULTS: Concordance of GenoSNP’s calls with the gold standard was below expectations and was increased by changing the tool’s initial parameters. While the four tools provided concordance with the gold standard above 99% for common alleles, some of them performed poorly for rare alleles. The reproducibility of genotype calls for each tool was assessed using experimental duplicates, which provided concordance rates above 99%. The inter-tool agreement of genotype calls was high for approximately 95% of variants. Most tools yielded similar error rates (approximately 0.02), except for zCall, which performed better with a 0.00164 mean error rate.
CONCLUSIONS: The GenoSNP clustering tool could not be run straight “out of the box” with the HumanExome BeadChip, as modification of hard-coded parameters was necessary to achieve optimal performance. Overall, GenCall marginally outperformed the other tools for the HumanExome BeadChip. The use of experimental replicates provided a valuable quality control tool for genotyping projects with rare variants.
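The abstract reports concordance rates between genotype calls and a gold standard, and between experimental duplicates. As a rough illustration only, concordance can be sketched as the fraction of matching genotypes among positions called in both sets. The function name, the two-letter genotype strings, the use of None for a missing call, and the choice to skip missing calls are all assumptions made for this sketch; the record does not describe how the authors computed their rates.

```python
from typing import Optional, Sequence


def concordance(calls: Sequence[Optional[str]],
                reference: Sequence[Optional[str]]) -> float:
    """Fraction of matching genotypes among positions called in both inputs.

    Genotypes are hypothetical two-letter strings such as "AG"; None marks
    a missing call and is skipped (an assumption of this sketch).
    """
    if len(calls) != len(reference):
        raise ValueError("calls and reference must have the same length")

    compared = 0
    matches = 0
    for called, truth in zip(calls, reference):
        if called is None or truth is None:
            continue  # no comparison possible without both genotypes
        compared += 1
        # Allele order is irrelevant for unphased calls: "AG" equals "GA".
        if sorted(called) == sorted(truth):
            matches += 1

    if compared == 0:
        raise ValueError("no positions were called in both inputs")
    return matches / compared
```

The same function could be applied to a pair of duplicate samples instead of a sample and a reference. Restricting the inputs to variants below a chosen minor allele frequency would give a rare-variant concordance comparable in spirit to the one discussed in the abstract.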
format Online
Article
Text
id pubmed-3941951
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3941951 2014-03-14 Comparison of genotype clustering tools with rare variants Perreault, Louis-Philippe Lemieux Legault, Marc-André Barhdadi, Amina Provost, Sylvie Normand, Valérie Tardif, Jean-Claude Dubé, Marie-Pierre BMC Bioinformatics Methodology Article BioMed Central 2014-02-21 /pmc/articles/PMC3941951/ /pubmed/24559245 http://dx.doi.org/10.1186/1471-2105-15-52 Text en Copyright © 2014 Lemieux Perreault et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology Article
Perreault, Louis-Philippe Lemieux
Legault, Marc-André
Barhdadi, Amina
Provost, Sylvie
Normand, Valérie
Tardif, Jean-Claude
Dubé, Marie-Pierre
Comparison of genotype clustering tools with rare variants
title Comparison of genotype clustering tools with rare variants
title_full Comparison of genotype clustering tools with rare variants
title_fullStr Comparison of genotype clustering tools with rare variants
title_full_unstemmed Comparison of genotype clustering tools with rare variants
title_short Comparison of genotype clustering tools with rare variants
title_sort comparison of genotype clustering tools with rare variants
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3941951/
https://www.ncbi.nlm.nih.gov/pubmed/24559245
http://dx.doi.org/10.1186/1471-2105-15-52
work_keys_str_mv AT perreaultlouisphilippelemieux comparisonofgenotypeclusteringtoolswithrarevariants
AT legaultmarcandre comparisonofgenotypeclusteringtoolswithrarevariants
AT barhdadiamina comparisonofgenotypeclusteringtoolswithrarevariants
AT provostsylvie comparisonofgenotypeclusteringtoolswithrarevariants
AT normandvalerie comparisonofgenotypeclusteringtoolswithrarevariants
AT tardifjeanclaude comparisonofgenotypeclusteringtoolswithrarevariants
AT dubemariepierre comparisonofgenotypeclusteringtoolswithrarevariants