Genotype calling in tetraploid species from bi-allelic marker data using mixture models

BACKGROUND: Automated genotype calling in tetraploid species was until recently not possible, which hampered genetic analysis. Modern genotyping assays often produce two signals, one for each allele of a bi-allelic marker. While ample software is available to obtain genotypes (homozygous for either...

Bibliographic Details
Main Authors: Voorrips, Roeland E, Gort, Gerrit, Vosman, Ben
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3121645/
https://www.ncbi.nlm.nih.gov/pubmed/21595880
http://dx.doi.org/10.1186/1471-2105-12-172
_version_ 1782206842784448512
author Voorrips, Roeland E
Gort, Gerrit
Vosman, Ben
author_facet Voorrips, Roeland E
Gort, Gerrit
Vosman, Ben
author_sort Voorrips, Roeland E
collection PubMed
description BACKGROUND: Automated genotype calling in tetraploid species was until recently not possible, which hampered genetic analysis. Modern genotyping assays often produce two signals, one for each allele of a bi-allelic marker. While ample software is available to obtain genotypes (homozygous for either allele, or heterozygous) for diploid species from these signals, such software is not available for tetraploid species which may be scored as five alternative genotypes (aaaa, baaa, bbaa, bbba and bbbb; nulliplex to quadruplex). RESULTS: We present a novel algorithm, implemented in the R package fitTetra, to assign genotypes for bi-allelic markers to tetraploid samples from genotyping assays that produce intensity signals for both alleles. The algorithm is based on the fitting of several mixture models with five components, one for each of the five possible genotypes. The models have different numbers of parameters specifying the relation between the five component means, and some of them impose a constraint on the mixing proportions to conform to Hardy-Weinberg equilibrium (HWE) ratios. The software rejects markers that do not allow a reliable genotyping for the majority of the samples, and it assigns a missing score to samples that cannot be scored into one of the five possible genotypes with sufficient confidence. CONCLUSIONS: We have validated the software with data of a collection of 224 potato varieties assayed with an Illumina GoldenGate™ 384 SNP array and shown that all SNPs with informative ratio distributions are fitted. Almost all fitted models appear to be correct based on visual inspection and comparison with diploid samples. When the collection of potato varieties is analyzed as if it were a population, almost all markers seem to be in Hardy-Weinberg equilibrium. The R package fitTetra is freely available under the GNU Public License from http://www.plantbreeding.wur.nl/UK/software_fitTetra.html and as Additional files with this article.
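The description above mentions five-component mixture models, some with mixing proportions constrained to Hardy-Weinberg ratios, and a missing score for samples that cannot be assigned with sufficient confidence. The following Python sketch illustrates only those two ideas; it is not the fitTetra implementation (which is an R package). The function names, the evenly spaced component means, the common standard deviation, and the 0.95 posterior threshold are all assumptions made for illustration, and the model fitting step itself is not shown.

```python
import math

def hwe_proportions(p_b, ploidy=4):
    """Expected frequencies of dosages 0..ploidy of allele b under HWE.

    Under random mating the dosage follows a binomial(ploidy, p_b) distribution.
    """
    return [math.comb(ploidy, k) * p_b**k * (1 - p_b)**(ploidy - k)
            for k in range(ploidy + 1)]

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def call_genotype(ratio, means, sd, mix, min_posterior=0.95):
    """Return (dosage or None, posteriors) for one sample's signal ratio.

    None plays the role of a missing score: no component reaches the
    (hypothetical) posterior probability threshold.
    """
    weighted = [w * normal_pdf(ratio, m, sd) for w, m in zip(mix, means)]
    total = sum(weighted)
    if total == 0.0:
        return None, []
    post = [w / total for w in weighted]
    best = max(range(len(post)), key=post.__getitem__)
    return (best if post[best] >= min_posterior else None), post

# Illustrative (assumed) parameters, not values from the article.
mix = hwe_proportions(0.5)            # aaaa, baaa, bbaa, bbba, bbbb
means = [0.0, 0.25, 0.5, 0.75, 1.0]   # assumed evenly spaced component means
sd = 0.04                             # assumed common standard deviation

clear_call, _ = call_genotype(0.26, means, sd, mix)
ambiguous_call, _ = call_genotype(0.375, means, sd, mix)
print(mix, clear_call, ambiguous_call)
```

With these assumed parameters, a ratio close to one component mean is assigned that dosage, while a ratio halfway between two means stays below the threshold and is left unscored.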
format Online
Article
Text
id pubmed-3121645
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3121645 2011-06-24 Genotype calling in tetraploid species from bi-allelic marker data using mixture models Voorrips, Roeland E Gort, Gerrit Vosman, Ben BMC Bioinformatics Software BACKGROUND: Automated genotype calling in tetraploid species was until recently not possible, which hampered genetic analysis. Modern genotyping assays often produce two signals, one for each allele of a bi-allelic marker. While ample software is available to obtain genotypes (homozygous for either allele, or heterozygous) for diploid species from these signals, such software is not available for tetraploid species which may be scored as five alternative genotypes (aaaa, baaa, bbaa, bbba and bbbb; nulliplex to quadruplex). RESULTS: We present a novel algorithm, implemented in the R package fitTetra, to assign genotypes for bi-allelic markers to tetraploid samples from genotyping assays that produce intensity signals for both alleles. The algorithm is based on the fitting of several mixture models with five components, one for each of the five possible genotypes. The models have different numbers of parameters specifying the relation between the five component means, and some of them impose a constraint on the mixing proportions to conform to Hardy-Weinberg equilibrium (HWE) ratios. The software rejects markers that do not allow a reliable genotyping for the majority of the samples, and it assigns a missing score to samples that cannot be scored into one of the five possible genotypes with sufficient confidence. CONCLUSIONS: We have validated the software with data of a collection of 224 potato varieties assayed with an Illumina GoldenGate™ 384 SNP array and shown that all SNPs with informative ratio distributions are fitted. Almost all fitted models appear to be correct based on visual inspection and comparison with diploid samples. When the collection of potato varieties is analyzed as if it were a population, almost all markers seem to be in Hardy-Weinberg equilibrium.
The R package fitTetra is freely available under the GNU Public License from http://www.plantbreeding.wur.nl/UK/software_fitTetra.html and as Additional files with this article. BioMed Central 2011-05-19 /pmc/articles/PMC3121645/ /pubmed/21595880 http://dx.doi.org/10.1186/1471-2105-12-172 Text en Copyright © 2011 Voorrips et al; licensee BioMed Central Ltd. https://creativecommons.org/licenses/by/2.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Software
Voorrips, Roeland E
Gort, Gerrit
Vosman, Ben
Genotype calling in tetraploid species from bi-allelic marker data using mixture models
title Genotype calling in tetraploid species from bi-allelic marker data using mixture models
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3121645/
https://www.ncbi.nlm.nih.gov/pubmed/21595880
http://dx.doi.org/10.1186/1471-2105-12-172
work_keys_str_mv AT voorripsroelande genotypecallingintetraploidspeciesfrombiallelicmarkerdatausingmixturemodels
AT gortgerrit genotypecallingintetraploidspeciesfrombiallelicmarkerdatausingmixturemodels
AT vosmanben genotypecallingintetraploidspeciesfrombiallelicmarkerdatausingmixturemodels