A comparison of statistical methods for genomic selection in a mice population


Bibliographic Details
Main authors: Neves, Haroldo HR; Carvalheiro, Roberto; Queiroz, Sandra A
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3563460/
https://www.ncbi.nlm.nih.gov/pubmed/23134637
http://dx.doi.org/10.1186/1471-2156-13-100
author Neves, Haroldo HR
Carvalheiro, Roberto
Queiroz, Sandra A
collection PubMed
description BACKGROUND: The availability of high-density panels of SNP markers has opened new perspectives for marker-assisted selection strategies, such that genotypes for these markers are used to predict the genetic merit of selection candidates. Because the number of markers is often much larger than the number of phenotypes, marker effect estimation is not a trivial task. The objective of this research was to compare the predictive performance of ten different statistical methods employed in genomic selection, by analyzing data from a heterogeneous stock mouse population. RESULTS: For the five traits analyzed (W6W: weight at six weeks, WGS: growth slope, BL: body length, %CD8+: percentage of CD8+ cells, CD4+/CD8+: ratio between CD4+ and CD8+ cells), within-family predictions were more accurate than across-family predictions, although this superiority in accuracy varied markedly across traits. For within-family prediction, two kernel methods, Reproducing Kernel Hilbert Spaces Regression (RKHS) and Support Vector Regression (SVR), were the most accurate for W6W, while a polygenic model had comparable performance. A form of ridge regression assuming that all markers contribute to the additive variance (RR_GBLUP) was among the most accurate for WGS and BL, while two variable selection methods (LASSO and Random Forest, RF) had the greatest predictive abilities for %CD8+ and CD4+/CD8+. RF, RKHS, SVR and RR_GBLUP outperformed the remaining methods in terms of bias and inflation of predictions. CONCLUSIONS: Methods with large conceptual differences reached very similar predictive abilities, and a clear re-ranking of methods was observed depending on the trait analyzed. Variable selection methods were more accurate than the other methods for %CD8+ and CD4+/CD8+, and these traits are likely to be influenced by a smaller number of QTL than the other traits. Judged by their overall performance across traits and computational requirements, RR_GBLUP, RKHS and SVR are particularly appealing for application in genomic selection.
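The abstract notes that marker effect estimation is not trivial when markers far outnumber phenotypes, and that one of the compared methods is a ridge regression in which all markers contribute to the additive variance. The sketch below illustrates that general idea only; it is not the authors' implementation. The data are simulated, the function names (`ridge_primal`, `ridge_dual`), the sample sizes, and the shrinkage value `lam` are all assumptions chosen for illustration. It shows that ridge marker effects can be obtained from an n x n system instead of a p x p one, and computes a hold-out correlation as one simple measure of predictive ability.

```python
import numpy as np

def ridge_primal(X, y, lam):
    """Ridge marker effects from the p x p system (X'X + lam*I) b = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def ridge_dual(X, y, lam):
    """Same estimates from the n x n system: b = X'(XX' + lam*I)^-1 y."""
    n = X.shape[0]
    alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
    return X.T @ alpha

# Simulated data (hypothetical, not from the study): genotypes coded 0/1/2.
rng = np.random.default_rng(0)
n_train, n_valid, n_markers = 60, 20, 400
n_total = n_train + n_valid
genotypes = rng.integers(0, 3, size=(n_total, n_markers)).astype(float)
true_effects = rng.normal(0.0, 0.1, size=n_markers)
phenotypes = genotypes @ true_effects + rng.normal(0.0, 1.0, size=n_total)

X_train, X_valid = genotypes[:n_train], genotypes[n_train:]
y_train, y_valid = phenotypes[:n_train], phenotypes[n_train:]

# Center markers and phenotypes using training-set means only.
marker_means = X_train.mean(axis=0)
y_mean = y_train.mean()
Xc = X_train - marker_means

lam = 50.0  # arbitrary shrinkage value, assumed for illustration
beta_dual = ridge_dual(Xc, y_train - y_mean, lam)
beta_primal = ridge_primal(Xc, y_train - y_mean, lam)

# Predict the held-out individuals and correlate with their phenotypes.
predicted = y_mean + (X_valid - marker_means) @ beta_dual
predictive_ability = np.corrcoef(predicted, y_valid)[0, 1]
print("max |dual - primal|:", np.abs(beta_dual - beta_primal).max())
print("hold-out correlation:", predictive_ability)
```

With many more markers than individuals, the dual form solves a much smaller linear system, which is one reason ridge-type methods are computationally light in this setting. In practice the shrinkage parameter would be chosen from variance components or by cross-validation rather than fixed by hand.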
format Online
Article
Text
id pubmed-3563460
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3563460 2013-02-08 A comparison of statistical methods for genomic selection in a mice population Neves, Haroldo HR Carvalheiro, Roberto Queiroz, Sandra A BMC Genet Research Article BioMed Central 2012-11-08 /pmc/articles/PMC3563460/ /pubmed/23134637 http://dx.doi.org/10.1186/1471-2156-13-100 Text en Copyright ©2012 Neves et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A comparison of statistical methods for genomic selection in a mice population
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3563460/
https://www.ncbi.nlm.nih.gov/pubmed/23134637
http://dx.doi.org/10.1186/1471-2156-13-100