
Impact of imputation methods on the amount of genetic variation captured by a single-nucleotide polymorphism panel in soybeans

BACKGROUND: Success in genome-wide association studies and marker-assisted selection depends on good phenotypic and genotypic data. The more complete these data are, the more powerful the results of analysis will be. Nevertheless, there are next-generation technologies that seek to provide genotypic i...


Bibliographic Details
Main Authors:	Xavier, A., Muir, William M., Rainey, Katy M.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2016
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4736474/ https://www.ncbi.nlm.nih.gov/pubmed/26830693 http://dx.doi.org/10.1186/s12859-016-0899-7

_version_	1782413291927109632
author	Xavier, A. Muir, William M. Rainey, Katy M.
author_facet	Xavier, A. Muir, William M. Rainey, Katy M.
author_sort	Xavier, A.
collection	PubMed
description	BACKGROUND: Success in genome-wide association studies and marker-assisted selection depends on good phenotypic and genotypic data. The more complete these data are, the more powerful the results of analysis will be. Nevertheless, there are next-generation technologies that seek to provide genotypic information in spite of great proportions of missing data. The procedures these technologies use to impute genetic data, therefore, greatly affect downstream analyses. This study aims to (1) compare the genetic variance in a single-nucleotide polymorphism panel of soybean with missing data imputed using various methods, (2) evaluate the imputation accuracy and post-imputation quality associated with these methods, and (3) evaluate the impact of imputation method on heritability and the accuracy of genome-wide prediction of soybean traits. The imputation methods we evaluated were as follows: multivariate mixed model, hidden Markov model, logical algorithm, k-nearest neighbor, singular value decomposition, and random forest. We used raw genotypes from the SoyNAM project and the following phenotypes: plant height, days to maturity, grain yield, and seed protein composition. RESULTS: We propose an imputation method based on multivariate mixed models using pedigree information. Our comparison of methods indicates that heritability of traits can be affected by the imputation method. Genotypes with missing values imputed with methods that make use of genealogical information can favor genetic analysis of highly polygenic traits, but not genome-wide prediction accuracy. The genotypic matrix captured the highest amount of genetic variance when missing loci were imputed by the method proposed in this paper. CONCLUSIONS: We concluded that hidden Markov models and random forest imputation are more suitable for studies that analyze highly heritable traits, while pedigree-based methods can be used to best analyze traits with low heritability. Despite the notable contribution to heritability, advantages in genomic prediction were not observed by changing the imputation method. We identified significant differences across imputation methods in a dataset missing 20 % of the genotypic values. This means that genotypic data from genotyping technologies that provide a high proportion of missing values, such as GBS, should be handled carefully because the imputation method will impact downstream analysis.
format	Online Article Text
id	pubmed-4736474
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4736474 2016-02-03 Impact of imputation methods on the amount of genetic variation captured by a single-nucleotide polymorphism panel in soybeans Xavier, A. Muir, William M. Rainey, Katy M. BMC Bioinformatics Methodology Article BACKGROUND: Success in genome-wide association studies and marker-assisted selection depends on good phenotypic and genotypic data. The more complete these data are, the more powerful the results of analysis will be. Nevertheless, there are next-generation technologies that seek to provide genotypic information in spite of great proportions of missing data. The procedures these technologies use to impute genetic data, therefore, greatly affect downstream analyses. This study aims to (1) compare the genetic variance in a single-nucleotide polymorphism panel of soybean with missing data imputed using various methods, (2) evaluate the imputation accuracy and post-imputation quality associated with these methods, and (3) evaluate the impact of imputation method on heritability and the accuracy of genome-wide prediction of soybean traits. The imputation methods we evaluated were as follows: multivariate mixed model, hidden Markov model, logical algorithm, k-nearest neighbor, singular value decomposition, and random forest. We used raw genotypes from the SoyNAM project and the following phenotypes: plant height, days to maturity, grain yield, and seed protein composition. RESULTS: We propose an imputation method based on multivariate mixed models using pedigree information. Our comparison of methods indicates that heritability of traits can be affected by the imputation method. Genotypes with missing values imputed with methods that make use of genealogical information can favor genetic analysis of highly polygenic traits, but not genome-wide prediction accuracy. The genotypic matrix captured the highest amount of genetic variance when missing loci were imputed by the method proposed in this paper. CONCLUSIONS: We concluded that hidden Markov models and random forest imputation are more suitable for studies that analyze highly heritable traits, while pedigree-based methods can be used to best analyze traits with low heritability. Despite the notable contribution to heritability, advantages in genomic prediction were not observed by changing the imputation method. We identified significant differences across imputation methods in a dataset missing 20 % of the genotypic values. This means that genotypic data from genotyping technologies that provide a high proportion of missing values, such as GBS, should be handled carefully because the imputation method will impact downstream analysis. BioMed Central 2016-02-02 /pmc/articles/PMC4736474/ /pubmed/26830693 http://dx.doi.org/10.1186/s12859-016-0899-7 Text en © Xavier et al. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Methodology Article Xavier, A. Muir, William M. Rainey, Katy M. Impact of imputation methods on the amount of genetic variation captured by a single-nucleotide polymorphism panel in soybeans
title	Impact of imputation methods on the amount of genetic variation captured by a single-nucleotide polymorphism panel in soybeans
title_full	Impact of imputation methods on the amount of genetic variation captured by a single-nucleotide polymorphism panel in soybeans
title_fullStr	Impact of imputation methods on the amount of genetic variation captured by a single-nucleotide polymorphism panel in soybeans
title_full_unstemmed	Impact of imputation methods on the amount of genetic variation captured by a single-nucleotide polymorphism panel in soybeans
title_short	Impact of imputation methods on the amount of genetic variation captured by a single-nucleotide polymorphism panel in soybeans
title_sort	impact of imputation methods on the amount of genetic variation captured by a single-nucleotide polymorphism panel in soybeans
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4736474/ https://www.ncbi.nlm.nih.gov/pubmed/26830693 http://dx.doi.org/10.1186/s12859-016-0899-7
work_keys_str_mv	AT xaviera impactofimputationmethodsontheamountofgeneticvariationcapturedbyasinglenucleotidepolymorphismpanelinsoybeans AT muirwilliamm impactofimputationmethodsontheamountofgeneticvariationcapturedbyasinglenucleotidepolymorphismpanelinsoybeans AT raineykatym impactofimputationmethodsontheamountofgeneticvariationcapturedbyasinglenucleotidepolymorphismpanelinsoybeans
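
The abstract in this record names k-nearest neighbor among the imputation methods compared. As a rough illustration of that general idea only, the sketch below fills missing genotype calls from the most similar individuals. It is not the authors' implementation, and the record gives no details of theirs: the 0/1/2 dosage coding, the distance measure, the fallback value, and the names `distance` and `knn_impute` are all assumptions made for this example.

```python
from typing import List, Optional

# Assumed coding: 0, 1, 2 = allele dosage; None = missing call.
Genotype = Optional[int]


def distance(a: List[Genotype], b: List[Genotype]) -> float:
    """Mean absolute dosage difference over loci observed in both individuals."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("inf")  # no shared loci: treat as maximally distant
    return sum(abs(x - y) for x, y in shared) / len(shared)


def knn_impute(matrix: List[List[Genotype]], k: int = 2) -> List[List[int]]:
    """Fill each missing call with the rounded mean dosage of the k nearest
    individuals that have an observed call at that locus.

    Distances are always computed on the original (unimputed) rows, and the
    input matrix is not modified.
    """
    imputed = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, call in enumerate(row):
            if call is not None:
                continue
            # Candidate donors: other individuals observed at locus j.
            donors = [
                (distance(row, other), other[j])
                for m, other in enumerate(matrix)
                if m != i and other[j] is not None
            ]
            if not donors:
                # Hypothetical fallback for this sketch: no donor available.
                imputed[i][j] = 0
                continue
            donors.sort(key=lambda d: d[0])
            nearest = [dosage for _, dosage in donors[:k]]
            # Python's round() resolves exact .5 ties to the even integer.
            imputed[i][j] = round(sum(nearest) / len(nearest))
    return imputed


# Toy panel: rows are individuals, columns are SNP loci (made-up values).
genotypes = [
    [0, 0, 2, 2],
    [0, 0, 2, None],
    [2, 2, 0, 0],
    [2, None, 0, 0],
]
filled = knn_impute(genotypes, k=1)
print(filled)  # [[0, 0, 2, 2], [0, 0, 2, 2], [2, 2, 0, 0], [2, 2, 0, 0]]
```

In this toy panel each incomplete individual has one identical neighbor, so `k=1` copies that neighbor's call. With real data, imputation accuracy could be estimated by masking known calls and comparing them with the imputed values, which is the kind of evaluation the abstract lists as aim (2).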
