Tournaments between markers as a strategy to enhance genomic predictions

Analysis of a large number of markers is crucial in both genome-wide association studies (GWAS) and genome-wide selection (GWS). However, two methodological issues restrict statistical analysis: high dimensionality (p≫n) and multicollinearity. Although there are methodologies that can...

Bibliographic Details
Main authors: Filho, Diógenes Ferreira; Filho, Júlio Sílvio de Sousa Bueno; Regitano, Luciana Correia de Almeida; de Alencar, Maurício Mello; Alves, Rosiana Rodrigues; Meirelles, Sarah Laguna Conceição
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6590785/
https://www.ncbi.nlm.nih.gov/pubmed/31233512
http://dx.doi.org/10.1371/journal.pone.0217283
author Filho, Diógenes Ferreira
Filho, Júlio Sílvio de Sousa Bueno
Regitano, Luciana Correia de Almeida
de Alencar, Maurício Mello
Alves, Rosiana Rodrigues
Meirelles, Sarah Laguna Conceição
collection PubMed
description Analysis of a large number of markers is crucial in both genome-wide association studies (GWAS) and genome-wide selection (GWS). However, two methodological issues restrict statistical analysis: high dimensionality (p≫n) and multicollinearity. Although there are methodologies that can be used to fit models to high-dimensional data (e.g., the Bayesian Lasso), a major problem that can occur in these cases is that the model may predict well for the individuals used to fit it but poorly for other individuals, which restricts its applicability. This problem can be circumvented by applying a selection methodology that reduces the number of markers (while keeping those associated with the phenotypic trait) before fitting a model to predict Genomic Breeding Values (GBVs). We revisit a tournament-based strategy between marker samples, where each sample has good statistical properties for estimation: n>p and low collinearity. The tournaments use multiple linear regression to eliminate markers. The method is adapted from previous work in the literature. We used simulated data as well as real data from a study of SNPs in beef cattle. Tournament strategies not only circumvent the p≫n issue but also minimize spurious associations. For the real data, when we selected slightly more than 20 markers, we obtained correlations greater than 0.70 between predicted GBVs and phenotypes in the validation groups of a cross-validation scheme; when we selected a larger number of markers (more than 100), the correlations exceeded 0.90, showing the efficiency of the method in identifying relevant SNPs (or segregations) for both GWAS and GWS. The simulation study gave similar results.
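The tournament idea in the description can be illustrated with a minimal Python sketch. This is not the authors' procedure: the group size, the number of markers kept per group, the ranking by absolute t-statistic, the simulated genotypes, and all function names (`simulate`, `tournament_round`, `run_tournament`) are assumptions made only for illustration. Markers are shuffled into groups with fewer markers than individuals (n>p), a multiple linear regression is fitted in each group, and the weakest markers are eliminated until few remain.

```python
import numpy as np

def simulate(n=100, p=500, n_causal=5, effect=3.0, seed=1):
    """Hypothetical data: 0/1/2 genotypes and a phenotype driven by a few markers."""
    rng = np.random.default_rng(seed)
    X = rng.binomial(2, 0.5, size=(n, p)).astype(float)
    causal = rng.choice(p, size=n_causal, replace=False)
    y = X[:, causal].sum(axis=1) * effect + rng.normal(size=n)
    return X, y, causal

def tournament_round(X, y, marker_idx, group_size, keep_per_group, rng):
    """Shuffle markers into groups, fit OLS per group, keep the largest |t| values."""
    n = X.shape[0]
    if group_size >= n - 1:
        raise ValueError("each group needs fewer markers than individuals (n > p)")
    shuffled = rng.permutation(marker_idx)
    survivors = []
    for start in range(0, len(shuffled), group_size):
        group = shuffled[start:start + group_size]
        A = np.column_stack([np.ones(n), X[:, group]])   # intercept + markers
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        sigma2 = resid @ resid / (n - A.shape[1])
        # pinv tolerates occasional collinearity inside a group
        cov = sigma2 * np.linalg.pinv(A.T @ A)
        se = np.sqrt(np.clip(np.diag(cov), 1e-12, None))
        t_abs = np.abs(beta[1:] / se[1:])
        best = np.argsort(t_abs)[::-1][:keep_per_group]
        survivors.extend(group[best])
    return np.array(survivors, dtype=int)

def run_tournament(X, y, target, group_size=20, keep_per_group=5, seed=0):
    """Repeat rounds until at most `target` markers remain (or no progress is made)."""
    rng = np.random.default_rng(seed)
    idx = np.arange(X.shape[1])
    while len(idx) > target:
        nxt = tournament_round(X, y, idx, group_size, keep_per_group, rng)
        if len(nxt) >= len(idx):
            break
        idx = nxt
    return np.sort(idx)

X, y, causal = simulate()
selected = run_tournament(X, y, target=10)
print("selected markers:", selected)
print("causal markers recovered:", sorted(set(selected) & set(causal)))
```

The surviving markers could then be used to fit a prediction model and checked by cross-validation, as the description outlines; the settings above are arbitrary and would need tuning on real data.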
format Online
Article
Text
id pubmed-6590785
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6590785 2019-07-05 Tournaments between markers as a strategy to enhance genomic predictions Filho, Diógenes Ferreira Filho, Júlio Sílvio de Sousa Bueno Regitano, Luciana Correia de Almeida de Alencar, Maurício Mello Alves, Rosiana Rodrigues Meirelles, Sarah Laguna Conceição PLoS One Research Article Public Library of Science 2019-06-24 /pmc/articles/PMC6590785/ /pubmed/31233512 http://dx.doi.org/10.1371/journal.pone.0217283 Text en © 2019 Filho et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Tournaments between markers as a strategy to enhance genomic predictions
topic Research Article