Training set optimization of genomic prediction by means of EthAcc

Genomic prediction is a useful tool for plant and animal breeding programs and is starting to be used to predict human diseases as well. A shortcoming that slows down the genomic selection deployment is that the accuracy of the prediction is not known a priori. We propose EthAcc (Estimated THeoretic...

Bibliographic Details
Main authors: Mangin, Brigitte; Rincent, Renaud; Rabier, Charles-Elie; Moreau, Laurence; Goudemand-Dugue, Ellen
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6380617/
https://www.ncbi.nlm.nih.gov/pubmed/30779753
http://dx.doi.org/10.1371/journal.pone.0205629
_version_ 1783396330886922240
author Mangin, Brigitte
Rincent, Renaud
Rabier, Charles-Elie
Moreau, Laurence
Goudemand-Dugue, Ellen
author_facet Mangin, Brigitte
Rincent, Renaud
Rabier, Charles-Elie
Moreau, Laurence
Goudemand-Dugue, Ellen
author_sort Mangin, Brigitte
collection PubMed
description Genomic prediction is a useful tool for plant and animal breeding programs and is starting to be used to predict human diseases as well. A shortcoming that slows down the genomic selection deployment is that the accuracy of the prediction is not known a priori. We propose EthAcc (Estimated THeoretical ACCuracy) as a method for estimating the accuracy given a training set that is genotyped and phenotyped. EthAcc is based on a causal quantitative trait loci model estimated by a genome-wide association study. This estimated causal model is crucial; therefore, we compared different methods to find the one yielding the best EthAcc. The multilocus mixed model was found to perform the best. We compared EthAcc to accuracy estimators that can be derived via a mixed marker model. We showed that EthAcc is the only approach to correctly estimate the accuracy. Moreover, in case of a structured population, in accordance with the achieved accuracy, EthAcc showed that the biggest training set is not always better than a smaller and closer training set. We then performed training set optimization with EthAcc and compared it to CDmean. EthAcc outperformed CDmean on real datasets from sugar beet, maize, and wheat. Nonetheless, its performance was mainly due to the use of an optimal but inaccessible set as a start of the optimization algorithm. EthAcc’s precision and algorithm issues prevent it from reaching a good training set with a random start. Despite this drawback, we demonstrated that a substantial gain in accuracy can be obtained by performing training set optimization.
format Online
Article
Text
id pubmed-6380617
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-63806172019-03-01 Training set optimization of genomic prediction by means of EthAcc Mangin, Brigitte Rincent, Renaud Rabier, Charles-Elie Moreau, Laurence Goudemand-Dugue, Ellen PLoS One Research Article Genomic prediction is a useful tool for plant and animal breeding programs and is starting to be used to predict human diseases as well. A shortcoming that slows down the genomic selection deployment is that the accuracy of the prediction is not known a priori. We propose EthAcc (Estimated THeoretical ACCuracy) as a method for estimating the accuracy given a training set that is genotyped and phenotyped. EthAcc is based on a causal quantitative trait loci model estimated by a genome-wide association study. This estimated causal model is crucial; therefore, we compared different methods to find the one yielding the best EthAcc. The multilocus mixed model was found to perform the best. We compared EthAcc to accuracy estimators that can be derived via a mixed marker model. We showed that EthAcc is the only approach to correctly estimate the accuracy. Moreover, in case of a structured population, in accordance with the achieved accuracy, EthAcc showed that the biggest training set is not always better than a smaller and closer training set. We then performed training set optimization with EthAcc and compared it to CDmean. EthAcc outperformed CDmean on real datasets from sugar beet, maize, and wheat. Nonetheless, its performance was mainly due to the use of an optimal but inaccessible set as a start of the optimization algorithm. EthAcc’s precision and algorithm issues prevent it from reaching a good training set with a random start. Despite this drawback, we demonstrated that a substantial gain in accuracy can be obtained by performing training set optimization. 
Public Library of Science 2019-02-19 /pmc/articles/PMC6380617/ /pubmed/30779753 http://dx.doi.org/10.1371/journal.pone.0205629 Text en © 2019 Mangin et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Mangin, Brigitte
Rincent, Renaud
Rabier, Charles-Elie
Moreau, Laurence
Goudemand-Dugue, Ellen
Training set optimization of genomic prediction by means of EthAcc
title Training set optimization of genomic prediction by means of EthAcc
title_full Training set optimization of genomic prediction by means of EthAcc
title_fullStr Training set optimization of genomic prediction by means of EthAcc
title_full_unstemmed Training set optimization of genomic prediction by means of EthAcc
title_short Training set optimization of genomic prediction by means of EthAcc
title_sort training set optimization of genomic prediction by means of ethacc
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6380617/
https://www.ncbi.nlm.nih.gov/pubmed/30779753
http://dx.doi.org/10.1371/journal.pone.0205629
work_keys_str_mv AT manginbrigitte trainingsetoptimizationofgenomicpredictionbymeansofethacc
AT rincentrenaud trainingsetoptimizationofgenomicpredictionbymeansofethacc
AT rabiercharleselie trainingsetoptimizationofgenomicpredictionbymeansofethacc
AT moreaulaurence trainingsetoptimizationofgenomicpredictionbymeansofethacc
AT goudemanddugueellen trainingsetoptimizationofgenomicpredictionbymeansofethacc