Improved genomic prediction using machine learning with Variational Bayesian sparsity
BACKGROUND: Genomic prediction has become a powerful modelling tool for assessing line performance in plant and livestock breeding programmes. Among the genomic prediction modelling approaches, linear based models have proven to provide accurate predictions even when the number of genetic markers ex...
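Main authors: | Yan, Qingsen; Fruzangohar, Mario; Taylor, Julian; Gong, Dong; Walter, James; Norman, Adam; Shi, Javen Qinfeng; Coram, Tristan |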
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2023 |
Subjects: | Methodology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10474716/ https://www.ncbi.nlm.nih.gov/pubmed/37660084 http://dx.doi.org/10.1186/s13007-023-01073-3 |
_version_ | 1785100561632198656 |
---|---|
author | Yan, Qingsen Fruzangohar, Mario Taylor, Julian Gong, Dong Walter, James Norman, Adam Shi, Javen Qinfeng Coram, Tristan |
author_facet | Yan, Qingsen Fruzangohar, Mario Taylor, Julian Gong, Dong Walter, James Norman, Adam Shi, Javen Qinfeng Coram, Tristan |
author_sort | Yan, Qingsen |
collection | PubMed |
description | BACKGROUND: Genomic prediction has become a powerful modelling tool for assessing line performance in plant and livestock breeding programmes. Among the genomic prediction modelling approaches, linear based models have proven to provide accurate predictions even when the number of genetic markers exceeds the number of data samples. However, breeding programmes are now compiling data from large numbers of lines and test environments for analyses, rendering these approaches computationally prohibitive. Machine learning (ML) now offers a solution to this problem through the construction of fully connected deep learning architectures and high parallelisation of the predictive task. However, the fully connected nature of these architectures immediately generates an over-parameterisation of the network that needs addressing for efficient and accurate predictions. RESULTS: In this research we explore the use of an ML architecture governed by variational Bayesian sparsity in its initial layers that we have called VBS-ML. The use of VBS-ML provides a mechanism for feature selection of important markers linked to the trait, immediately reducing the network over-parameterisation. Selected markers then propagate to the remaining fully connected feed-forward components of the ML network to form the final genomic prediction. We illustrated the approach with four large Australian wheat breeding data sets that range from 2665 lines to 10375 lines genotyped across a large set of markers. For all data sets, the use of the VBS-ML architecture improved genomic prediction accuracy over legacy linear based modelling approaches. CONCLUSIONS: An ML architecture governed under a variational Bayesian paradigm was shown to improve genomic prediction accuracy over legacy modelling approaches. This VBS-ML approach can be used to dramatically decrease the parameter burden on the network and provide a computationally feasible approach for improving genomic prediction conducted with large breeding population numbers and genetic markers. |
format | Online Article Text |
id | pubmed-10474716 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-10474716 2023-09-03 Improved genomic prediction using machine learning with Variational Bayesian sparsity Yan, Qingsen Fruzangohar, Mario Taylor, Julian Gong, Dong Walter, James Norman, Adam Shi, Javen Qinfeng Coram, Tristan Plant Methods Methodology BACKGROUND: Genomic prediction has become a powerful modelling tool for assessing line performance in plant and livestock breeding programmes. Among the genomic prediction modelling approaches, linear based models have proven to provide accurate predictions even when the number of genetic markers exceeds the number of data samples. However, breeding programmes are now compiling data from large numbers of lines and test environments for analyses, rendering these approaches computationally prohibitive. Machine learning (ML) now offers a solution to this problem through the construction of fully connected deep learning architectures and high parallelisation of the predictive task. However, the fully connected nature of these architectures immediately generates an over-parameterisation of the network that needs addressing for efficient and accurate predictions. RESULTS: In this research we explore the use of an ML architecture governed by variational Bayesian sparsity in its initial layers that we have called VBS-ML. The use of VBS-ML provides a mechanism for feature selection of important markers linked to the trait, immediately reducing the network over-parameterisation. Selected markers then propagate to the remaining fully connected feed-forward components of the ML network to form the final genomic prediction. We illustrated the approach with four large Australian wheat breeding data sets that range from 2665 lines to 10375 lines genotyped across a large set of markers. For all data sets, the use of the VBS-ML architecture improved genomic prediction accuracy over legacy linear based modelling approaches. CONCLUSIONS: An ML architecture governed under a variational Bayesian paradigm was shown to improve genomic prediction accuracy over legacy modelling approaches. This VBS-ML approach can be used to dramatically decrease the parameter burden on the network and provide a computationally feasible approach for improving genomic prediction conducted with large breeding population numbers and genetic markers. BioMed Central 2023-09-02 /pmc/articles/PMC10474716/ /pubmed/37660084 http://dx.doi.org/10.1186/s13007-023-01073-3 Text en © Crown 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Methodology Yan, Qingsen Fruzangohar, Mario Taylor, Julian Gong, Dong Walter, James Norman, Adam Shi, Javen Qinfeng Coram, Tristan Improved genomic prediction using machine learning with Variational Bayesian sparsity |
title | Improved genomic prediction using machine learning with Variational Bayesian sparsity |
title_full | Improved genomic prediction using machine learning with Variational Bayesian sparsity |
title_fullStr | Improved genomic prediction using machine learning with Variational Bayesian sparsity |
title_full_unstemmed | Improved genomic prediction using machine learning with Variational Bayesian sparsity |
title_short | Improved genomic prediction using machine learning with Variational Bayesian sparsity |
title_sort | improved genomic prediction using machine learning with variational bayesian sparsity |
topic | Methodology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10474716/ https://www.ncbi.nlm.nih.gov/pubmed/37660084 http://dx.doi.org/10.1186/s13007-023-01073-3 |
work_keys_str_mv | AT yanqingsen improvedgenomicpredictionusingmachinelearningwithvariationalbayesiansparsity AT fruzangoharmario improvedgenomicpredictionusingmachinelearningwithvariationalbayesiansparsity AT taylorjulian improvedgenomicpredictionusingmachinelearningwithvariationalbayesiansparsity AT gongdong improvedgenomicpredictionusingmachinelearningwithvariationalbayesiansparsity AT walterjames improvedgenomicpredictionusingmachinelearningwithvariationalbayesiansparsity AT normanadam improvedgenomicpredictionusingmachinelearningwithvariationalbayesiansparsity AT shijavenqinfeng improvedgenomicpredictionusingmachinelearningwithvariationalbayesiansparsity AT coramtristan improvedgenomicpredictionusingmachinelearningwithvariationalbayesiansparsity |