Using Genetic Distance to Infer the Accuracy of Genomic Prediction

The prediction of phenotypic traits using high-density genomic data has many applications, such as the selection of plants and animals of commercial interest, and it is expected to play an increasing role in medical diagnostics. Statistical models used for this task are usually tested using cross-val...

Bibliographic Details
Main authors: Scutari, Marco; Mackay, Ian; Balding, David
Format: Online Article Text
Language: English
Published: Public Library of Science 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5010218/
https://www.ncbi.nlm.nih.gov/pubmed/27589268
http://dx.doi.org/10.1371/journal.pgen.1006288
_version_ 1782451650709946368
author Scutari, Marco
Mackay, Ian
Balding, David
author_facet Scutari, Marco
Mackay, Ian
Balding, David
author_sort Scutari, Marco
collection PubMed
description The prediction of phenotypic traits using high-density genomic data has many applications, such as the selection of plants and animals of commercial interest, and it is expected to play an increasing role in medical diagnostics. Statistical models used for this task are usually tested using cross-validation, which implicitly assumes that new individuals (whose phenotypes we would like to predict) originate from the same population the genomic prediction model is trained on. In this paper we propose an approach based on clustering and resampling to investigate the effect of increasing genetic distance between training and target populations when predicting quantitative traits. This is important for plant and animal genetics, where genomic selection programs rely on the precision of predictions in future rounds of breeding. Estimating how quickly predictive accuracy decays therefore matters when deciding which training population to use and how often the model has to be recalibrated. We find that the correlation between true and predicted values decays approximately linearly with respect to either F_ST or mean kinship between the training and the target populations. We illustrate this relationship using simulations and a collection of data sets from mice, wheat and human genetics.
format Online
Article
Text
id pubmed-5010218
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5010218 2016-09-27 Using Genetic Distance to Infer the Accuracy of Genomic Prediction Scutari, Marco Mackay, Ian Balding, David PLoS Genet Research Article Public Library of Science 2016-09-02 /pmc/articles/PMC5010218/ /pubmed/27589268 http://dx.doi.org/10.1371/journal.pgen.1006288 Text en © 2016 Scutari et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Using Genetic Distance to Infer the Accuracy of Genomic Prediction
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5010218/
https://www.ncbi.nlm.nih.gov/pubmed/27589268
http://dx.doi.org/10.1371/journal.pgen.1006288