
Imputation of Unordered Markers and the Impact on Genomic Selection Accuracy

Genomic selection, a breeding method that promises to accelerate rates of genetic gain, requires dense, genome-wide marker data. Genotyping-by-sequencing can generate a large number of de novo markers. However, without a reference genome, these markers are unordered and typically have a large propor...


Bibliographic Details
Main authors: Rutkoski, Jessica E., Poland, Jesse, Jannink, Jean-Luc, Sorrells, Mark E.
Format: Online Article Text
Language: English
Published: Genetics Society of America 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3583451/
https://www.ncbi.nlm.nih.gov/pubmed/23449944
http://dx.doi.org/10.1534/g3.112.005363
author Rutkoski, Jessica E.
Poland, Jesse
Jannink, Jean-Luc
Sorrells, Mark E.
author_sort Rutkoski, Jessica E.
collection PubMed
description Genomic selection, a breeding method that promises to accelerate rates of genetic gain, requires dense, genome-wide marker data. Genotyping-by-sequencing can generate a large number of de novo markers. However, without a reference genome, these markers are unordered and typically have a large proportion of missing data. Because marker imputation algorithms were developed for species with a reference genome, algorithms suited for unordered markers have not been rigorously evaluated. Using four empirical datasets, we evaluate and characterize four such imputation methods, referred to as k-nearest neighbors, singular value decomposition, random forest regression, and expectation maximization imputation, in terms of their imputation accuracies and the factors affecting accuracy. The effect of imputation method on the genomic selection accuracy is assessed in comparison with mean imputation. The effect of excluding markers with a large proportion of missing data on the genomic selection accuracy is also examined. Our results show that imputation of unordered markers can be accurate, especially when linkage disequilibrium between markers is high and genotyped individuals are related. Of the methods evaluated, random forest regression imputation produced superior accuracy. In comparison with mean imputation, all four imputation methods we evaluated led to greater genomic selection accuracies when the level of missing data was high. Including rather than excluding markers with a large proportion of missing data nearly always led to greater genomic selection accuracies. We conclude that high levels of missing data in dense marker sets are not a major obstacle for genomic selection, even when marker order is not known.
format Online
Article
Text
id pubmed-3583451
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Genetics Society of America
record_format MEDLINE/PubMed
spelling pubmed-3583451 2013-03-01 Imputation of Unordered Markers and the Impact on Genomic Selection Accuracy Rutkoski, Jessica E. Poland, Jesse Jannink, Jean-Luc Sorrells, Mark E. G3 (Bethesda) Genomic Selection
Genetics Society of America 2013-03-01 /pmc/articles/PMC3583451/ /pubmed/23449944 http://dx.doi.org/10.1534/g3.112.005363 Text en Copyright © 2013 Rutkoski et al. http://creativecommons.org/licenses/by/3.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution Unported License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Imputation of Unordered Markers and the Impact on Genomic Selection Accuracy
topic Genomic Selection