Semi-supervised learning for genomic prediction of novel traits with small reference populations: an application to residual feed intake in dairy cattle
BACKGROUND: Genomic prediction for novel traits, which can be costly and labor-intensive to measure, is often hampered by low accuracy due to the limited size of the reference population. As an option to improve prediction accuracy, we introduced a semi-supervised learning strategy known as the self...
Main authors: | Yao, Chen; Zhu, Xiaojin; Weigel, Kent A. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5098288/ https://www.ncbi.nlm.nih.gov/pubmed/27821057 http://dx.doi.org/10.1186/s12711-016-0262-5 |
_version_ | 1782465756801269760 |
---|---|
author | Yao, Chen Zhu, Xiaojin Weigel, Kent A. |
author_facet | Yao, Chen Zhu, Xiaojin Weigel, Kent A. |
author_sort | Yao, Chen |
collection | PubMed |
description | BACKGROUND: Genomic prediction for novel traits, which can be costly and labor-intensive to measure, is often hampered by low accuracy due to the limited size of the reference population. As an option to improve prediction accuracy, we introduced a semi-supervised learning strategy known as the self-training model, and applied this method to genomic prediction of residual feed intake (RFI) in dairy cattle. METHODS: We describe a self-training model that is wrapped around a support vector machine (SVM) algorithm, which enables it to use data from animals with and without measured phenotypes. Initially, a SVM model was trained using data from 792 animals with measured RFI phenotypes. Then, the resulting SVM was used to generate self-trained phenotypes for 3000 animals for which RFI measurements were not available. Finally, the SVM model was re-trained using data from up to 3792 animals, including those with measured and self-trained RFI phenotypes. RESULTS: Incorporation of additional animals with self-trained phenotypes enhanced the accuracy of genomic predictions compared to that of predictions that were derived from the subset of animals with measured phenotypes. The optimal ratio of animals with self-trained phenotypes to animals with measured phenotypes (2.5, 2.0, and 1.8) and the maximum increase achieved in prediction accuracy measured as the correlation between predicted and actual RFI phenotypes (5.9, 4.1, and 2.4%) decreased as the size of the initial training set (300, 400, and 500 animals with measured phenotypes) increased. The optimal number of animals with self-trained phenotypes may be smaller when prediction accuracy is measured as the mean squared error rather than the correlation between predicted and actual RFI phenotypes. CONCLUSIONS: Our results demonstrate that semi-supervised learning models that incorporate self-trained phenotypes can achieve genomic prediction accuracies that are comparable to those obtained with models using larger training sets that include only animals with measured phenotypes. Semi-supervised learning can be helpful for genomic prediction of novel traits, such as RFI, for which the size of reference population is limited, in particular, when the animals to be predicted and the animals in the reference population originate from the same herd-environment. |
format | Online Article Text |
id | pubmed-5098288 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5098288 2016-11-08 Semi-supervised learning for genomic prediction of novel traits with small reference populations: an application to residual feed intake in dairy cattle Yao, Chen Zhu, Xiaojin Weigel, Kent A. Genet Sel Evol Research Article BACKGROUND: Genomic prediction for novel traits, which can be costly and labor-intensive to measure, is often hampered by low accuracy due to the limited size of the reference population. As an option to improve prediction accuracy, we introduced a semi-supervised learning strategy known as the self-training model, and applied this method to genomic prediction of residual feed intake (RFI) in dairy cattle. METHODS: We describe a self-training model that is wrapped around a support vector machine (SVM) algorithm, which enables it to use data from animals with and without measured phenotypes. Initially, a SVM model was trained using data from 792 animals with measured RFI phenotypes. Then, the resulting SVM was used to generate self-trained phenotypes for 3000 animals for which RFI measurements were not available. Finally, the SVM model was re-trained using data from up to 3792 animals, including those with measured and self-trained RFI phenotypes. RESULTS: Incorporation of additional animals with self-trained phenotypes enhanced the accuracy of genomic predictions compared to that of predictions that were derived from the subset of animals with measured phenotypes. The optimal ratio of animals with self-trained phenotypes to animals with measured phenotypes (2.5, 2.0, and 1.8) and the maximum increase achieved in prediction accuracy measured as the correlation between predicted and actual RFI phenotypes (5.9, 4.1, and 2.4%) decreased as the size of the initial training set (300, 400, and 500 animals with measured phenotypes) increased. The optimal number of animals with self-trained phenotypes may be smaller when prediction accuracy is measured as the mean squared error rather than the correlation between predicted and actual RFI phenotypes. CONCLUSIONS: Our results demonstrate that semi-supervised learning models that incorporate self-trained phenotypes can achieve genomic prediction accuracies that are comparable to those obtained with models using larger training sets that include only animals with measured phenotypes. Semi-supervised learning can be helpful for genomic prediction of novel traits, such as RFI, for which the size of reference population is limited, in particular, when the animals to be predicted and the animals in the reference population originate from the same herd-environment. BioMed Central 2016-11-07 /pmc/articles/PMC5098288/ /pubmed/27821057 http://dx.doi.org/10.1186/s12711-016-0262-5 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Yao, Chen Zhu, Xiaojin Weigel, Kent A. Semi-supervised learning for genomic prediction of novel traits with small reference populations: an application to residual feed intake in dairy cattle |
title | Semi-supervised learning for genomic prediction of novel traits with small reference populations: an application to residual feed intake in dairy cattle |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5098288/ https://www.ncbi.nlm.nih.gov/pubmed/27821057 http://dx.doi.org/10.1186/s12711-016-0262-5 |
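
The three-step procedure in the abstract (train on animals with measured phenotypes, generate self-trained phenotypes for unmeasured animals, then re-train on the combined set) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' code: the data are simulated, the sample sizes (60 measured, 150 unmeasured, ratio 2.5) are toy values, and the base learner is a ridge-penalised linear model fitted by gradient descent, swapped in for the SVM so that the sketch runs with the Python standard library only. The function names `fit_linear`, `predict`, `self_train`, `pearson`, `mse`, and `simulate` are hypothetical.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the toy run is repeatable

def fit_linear(X, y, lr=0.05, epochs=200, l2=0.1):
    """Ridge-penalised linear model fitted by batch gradient descent.
    Assumption: this is a stand-in for the SVM named in the abstract."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = b + sum(wj * xj for wj, xj in zip(w, xi)) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
    return w, b

def predict(model, X):
    w, b = model
    return [b + sum(wj * xj for wj, xj in zip(w, xi)) for xi in X]

def self_train(X_lab, y_lab, X_unlab, n_add):
    """Step 1: fit on measured animals. Step 2: predict phenotypes for
    n_add unmeasured animals. Step 3: re-fit on the combined set."""
    base = fit_linear(X_lab, y_lab)
    pseudo = predict(base, X_unlab[:n_add])  # self-trained phenotypes
    return fit_linear(X_lab + X_unlab[:n_add], y_lab + pseudo)

def pearson(a, b):
    """Correlation between predicted and observed values."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def mse(a, b):
    """Mean squared error between predicted and observed values."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def simulate(n, effects, noise_sd=2.0):
    """Toy genotypes coded -1/0/1 and phenotypes = marker effects + noise."""
    X = [[rng.choice((-1.0, 0.0, 1.0)) for _ in effects] for _ in range(n)]
    y = [sum(e * x for e, x in zip(effects, xi)) + rng.gauss(0.0, noise_sd)
         for xi in X]
    return X, y

effects = [rng.gauss(0.0, 1.0) for _ in range(20)]  # 20 hypothetical markers
X_lab, y_lab = simulate(60, effects)     # animals with measured phenotypes
X_unlab, _ = simulate(150, effects)      # phenotypes discarded: "unmeasured"
X_test, y_test = simulate(40, effects)   # held-out animals for evaluation

base_model = fit_linear(X_lab, y_lab)
self_model = self_train(X_lab, y_lab, X_unlab, n_add=150)

for name, model in (("measured only", base_model), ("self-trained", self_model)):
    pred = predict(model, X_test)
    print(f"{name}: r = {pearson(pred, y_test):.3f}, MSE = {mse(pred, y_test):.3f}")
```

The script reports both accuracy measures mentioned in the abstract, the correlation and the mean squared error, for the model trained on measured animals only and for the self-trained model. It makes no claim to reproduce the reported gains, which were obtained with an SVM on real RFI data; varying `n_add` shows how the ratio of self-trained to measured animals could be explored.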