Semi-supervised learning for genomic prediction of novel traits with small reference populations: an application to residual feed intake in dairy cattle

BACKGROUND: Genomic prediction for novel traits, which can be costly and labor-intensive to measure, is often hampered by low accuracy due to the limited size of the reference population. As an option to improve prediction accuracy, we introduced a semi-supervised learning strategy known as the self...

Bibliographic Details
Main authors: Yao, Chen; Zhu, Xiaojin; Weigel, Kent A.
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5098288/
https://www.ncbi.nlm.nih.gov/pubmed/27821057
http://dx.doi.org/10.1186/s12711-016-0262-5
author Yao, Chen
Zhu, Xiaojin
Weigel, Kent A.
collection PubMed
description BACKGROUND: Genomic prediction for novel traits, which can be costly and labor-intensive to measure, is often hampered by low accuracy due to the limited size of the reference population. As an option to improve prediction accuracy, we introduced a semi-supervised learning strategy known as the self-training model, and applied this method to genomic prediction of residual feed intake (RFI) in dairy cattle.
METHODS: We describe a self-training model that is wrapped around a support vector machine (SVM) algorithm, which enables it to use data from animals with and without measured phenotypes. Initially, an SVM model was trained using data from 792 animals with measured RFI phenotypes. Then, the resulting SVM was used to generate self-trained phenotypes for 3000 animals for which RFI measurements were not available. Finally, the SVM model was re-trained using data from up to 3792 animals, including those with measured and self-trained RFI phenotypes.
RESULTS: Incorporation of additional animals with self-trained phenotypes enhanced the accuracy of genomic predictions compared to that of predictions that were derived from the subset of animals with measured phenotypes. The optimal ratio of animals with self-trained phenotypes to animals with measured phenotypes (2.5, 2.0, and 1.8) and the maximum increase achieved in prediction accuracy measured as the correlation between predicted and actual RFI phenotypes (5.9, 4.1, and 2.4%) decreased as the size of the initial training set (300, 400, and 500 animals with measured phenotypes) increased. The optimal number of animals with self-trained phenotypes may be smaller when prediction accuracy is measured as the mean squared error rather than the correlation between predicted and actual RFI phenotypes.
CONCLUSIONS: Our results demonstrate that semi-supervised learning models that incorporate self-trained phenotypes can achieve genomic prediction accuracies that are comparable to those obtained with models using larger training sets that include only animals with measured phenotypes. Semi-supervised learning can be helpful for genomic prediction of novel traits, such as RFI, for which the size of the reference population is limited, particularly when the animals to be predicted and the animals in the reference population originate from the same herd-environment.
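The three-step workflow in METHODS (train on measured animals, generate self-trained phenotypes for unmeasured animals, re-train on the combined set) can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not state their SVM kernel, software, or hyperparameters. The sketch assumes a small linear epsilon-insensitive regressor fitted by subgradient descent as a stand-in for the SVM, and uses simulated genotypes and phenotypes rather than the RFI data. All function names, sample sizes, and parameter values are hypothetical. Because the pseudo-phenotypes of a linear model lie exactly on its own fitted plane, this sketch shows only the mechanics and the two accuracy measures named in RESULTS (correlation and mean squared error); it should not be expected to reproduce the reported accuracy gains.

```python
import math
import random


def fit_linear_svr(X, y, epochs=100, lr=0.01, eps=0.1, lam=0.001, seed=0):
    """Linear epsilon-insensitive regression fitted by subgradient descent.

    Stand-in for the SVM in the abstract (an assumption, not the authors' model).
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            err = y[i] - (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            # Subgradient of the loss: zero inside the epsilon tube, +/-1 outside.
            g = 0.0 if abs(err) <= eps else math.copysign(1.0, err)
            w = [wj + lr * (g * xj - lam * wj) for wj, xj in zip(w, X[i])]
            b += lr * g
    return w, b


def predict(model, X):
    w, b = model
    return [sum(wj * xj for wj, xj in zip(w, row)) + b for row in X]


def self_train(X_lab, y_lab, X_unlab, n_self):
    """Self-training wrapper: fit, pseudo-label n_self animals, then re-fit."""
    base = fit_linear_svr(X_lab, y_lab)
    pseudo = predict(base, X_unlab[:n_self])  # self-trained phenotypes
    final = fit_linear_svr(X_lab + X_unlab[:n_self], y_lab + pseudo)
    return base, final


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Simulated data (hypothetical): 20 markers coded as centered allele counts.
rng = random.Random(42)
n_markers = 20
effects = [rng.gauss(0, 1) for _ in range(n_markers)]


def simulate(n):
    X = [[rng.choice([0, 1, 2]) - 1 for _ in range(n_markers)] for _ in range(n)]
    y = [sum(e * x for e, x in zip(effects, row)) + rng.gauss(0, 1) for row in X]
    return X, y


X_lab, y_lab = simulate(60)    # animals with measured phenotypes
X_unlab, _ = simulate(150)     # animals without phenotypes (ratio 2.5)
X_test, y_test = simulate(100)

base, final = self_train(X_lab, y_lab, X_unlab, n_self=150)
r_base, r_self = pearson(predict(base, X_test), y_test), pearson(predict(final, X_test), y_test)
mse_base, mse_self = mse(predict(base, X_test), y_test), mse(predict(final, X_test), y_test)
print("correlation:", r_base, r_self)
print("MSE:", mse_base, mse_self)
```

Varying `n_self` and comparing both metrics on the held-out animals mirrors, in spirit, how the abstract describes searching for an optimal ratio of self-trained to measured animals.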
format Online
Article
Text
id pubmed-5098288
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5098288 2016-11-08 Semi-supervised learning for genomic prediction of novel traits with small reference populations: an application to residual feed intake in dairy cattle. Yao, Chen; Zhu, Xiaojin; Weigel, Kent A. Genet Sel Evol. Research Article. BioMed Central 2016-11-07 /pmc/articles/PMC5098288/ /pubmed/27821057 http://dx.doi.org/10.1186/s12711-016-0262-5 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Semi-supervised learning for genomic prediction of novel traits with small reference populations: an application to residual feed intake in dairy cattle
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5098288/
https://www.ncbi.nlm.nih.gov/pubmed/27821057
http://dx.doi.org/10.1186/s12711-016-0262-5