Improved genomic prediction using machine learning with Variational Bayesian sparsity

Bibliographic Details
Main Authors: Yan, Qingsen; Fruzangohar, Mario; Taylor, Julian; Gong, Dong; Walter, James; Norman, Adam; Shi, Javen Qinfeng; Coram, Tristan
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10474716/
https://www.ncbi.nlm.nih.gov/pubmed/37660084
http://dx.doi.org/10.1186/s13007-023-01073-3
author Yan, Qingsen
Fruzangohar, Mario
Taylor, Julian
Gong, Dong
Walter, James
Norman, Adam
Shi, Javen Qinfeng
Coram, Tristan
collection PubMed
description BACKGROUND: Genomic prediction has become a powerful modelling tool for assessing line performance in plant and livestock breeding programmes. Among the genomic prediction modelling approaches, linear-based models have proven to provide accurate predictions even when the number of genetic markers exceeds the number of data samples. However, breeding programmes are now compiling data from large numbers of lines and test environments for analyses, rendering these approaches computationally prohibitive. Machine learning (ML) now offers a solution to this problem through the construction of fully connected deep learning architectures and high parallelisation of the predictive task. However, the fully connected nature of these architectures immediately generates an over-parameterisation of the network that needs addressing for efficient and accurate predictions. RESULTS: In this research we explore the use of an ML architecture, which we have called VBS-ML, that is governed by variational Bayesian sparsity in its initial layers. The use of VBS-ML provides a mechanism for feature selection of important markers linked to the trait, immediately reducing the network over-parameterisation. Selected markers then propagate to the remaining fully connected feed-forward components of the ML network to form the final genomic prediction. We illustrated the approach with four large Australian wheat breeding data sets that range from 2665 to 10375 lines genotyped across a large set of markers. For all data sets, the use of the VBS-ML architecture improved genomic prediction accuracy over legacy linear-based modelling approaches. CONCLUSIONS: An ML architecture governed by a variational Bayesian paradigm was shown to improve genomic prediction accuracy over legacy modelling approaches. This VBS-ML approach can be used to dramatically decrease the parameter burden on the network and provide a computationally feasible approach for improving genomic prediction conducted with large breeding population numbers and genetic markers.
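The abstract describes VBS-ML only at a high level: a variational Bayesian sparsity stage that selects markers, followed by fully connected feed-forward layers. The Python sketch below illustrates that two-stage pipeline under stated assumptions, and is not the authors' implementation. It assumes sparse variational dropout (Molchanov et al., 2017) as the sparsity mechanism, with one log α per marker so that pruning a marker removes its whole row of first-layer weights; the abstract does not specify the prior, parameterisation, or pruning threshold VBS-ML actually uses. All function names (`neg_kl`, `kl_penalty`, `select_markers`, `first_layer_params`, `dense`, `predict`) and every numeric value in the usage block are hypothetical. Training is omitted: the log α values are supplied as if already learned.

```python
import math
from typing import List, Sequence

# Assumption: sparse variational dropout (Molchanov et al., 2017) stands in for
# the VBS-ML sparsity layer. K1..K3 are the constants of that paper's KL approximation.
K1, K2, K3 = 0.63576, 1.87320, 1.48695
# log(alpha) > 3 corresponds to a dropout rate above ~0.95, so the marker is treated as noise.
LOG_ALPHA_THRESHOLD = 3.0


def neg_kl(log_alpha: float) -> float:
    """Approximate -KL(q || p) for one marker gate under the log-uniform prior."""
    la = max(-10.0, min(10.0, log_alpha))  # clamp keeps math.exp finite (a sketch choice)
    sig = 1.0 / (1.0 + math.exp(-(K2 + K3 * la)))
    return K1 * sig - 0.5 * math.log1p(math.exp(-la)) - K1


def kl_penalty(log_alphas: Sequence[float]) -> float:
    """KL regulariser that training would add to the data loss (training not shown)."""
    return -sum(neg_kl(la) for la in log_alphas)


def select_markers(log_alphas: Sequence[float]) -> List[int]:
    """Indices of markers whose learned log(alpha) is below the pruning threshold."""
    return [j for j, la in enumerate(log_alphas) if la < LOG_ALPHA_THRESHOLD]


def first_layer_params(n_inputs: int, n_hidden: int) -> int:
    """Weights plus biases of a dense input-to-hidden layer."""
    return n_inputs * n_hidden + n_hidden


def dense(x: Sequence[float], w: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Compute W x + b, with W stored one row per output unit."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def predict(genotype, keep, w1, b1, w2, b2) -> float:
    """One-hidden-layer ReLU network applied to the selected markers only."""
    x = [float(genotype[j]) for j in keep]
    h = [max(0.0, v) for v in dense(x, w1, b1)]
    return dense(h, w2, b2)[0]


if __name__ == "__main__":
    log_alphas = [-2.0, 4.5, 0.5, 6.0, 1.0]  # hypothetical learned values, one per marker
    keep = select_markers(log_alphas)
    print(keep)  # [0, 2, 4]
    print(first_layer_params(5, 4), first_layer_params(len(keep), 4))  # 24 16
    genotype = [0, 2, 1, 1, 2]  # hypothetical 0/1/2 allele counts for one line
    w1 = [[1.0, 0.5, 0.25], [0.0, -1.0, 1.0]]
    print(predict(genotype, keep, w1, [0.0, 0.0], [[2.0, -1.0]], [0.5]))  # 1.5
```

For scale, with a hypothetical 10,000 markers and 256 hidden units, `first_layer_params(10000, 256)` is 2,560,256; if the gates kept 500 markers it would fall to 128,256. That is the kind of reduction in over-parameterisation the abstract attributes to placing the sparsity stage before the fully connected layers.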
format Online
Article
Text
id pubmed-10474716
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10474716 2023-09-03 Plant Methods Methodology BioMed Central 2023-09-02 /pmc/articles/PMC10474716/ /pubmed/37660084 http://dx.doi.org/10.1186/s13007-023-01073-3 Text en © Crown 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Improved genomic prediction using machine learning with Variational Bayesian sparsity
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10474716/
https://www.ncbi.nlm.nih.gov/pubmed/37660084
http://dx.doi.org/10.1186/s13007-023-01073-3