
Exploring Deep Learning for Complex Trait Genomic Prediction in Polyploid Outcrossing Species

Bibliographic Details
Main authors: Zingaretti, Laura M., Gezan, Salvador Alejandro, Ferrão, Luis Felipe V., Osorio, Luis F., Monfort, Amparo, Muñoz, Patricio R., Whitaker, Vance M., Pérez-Enciso, Miguel
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7015897/
https://www.ncbi.nlm.nih.gov/pubmed/32117371
http://dx.doi.org/10.3389/fpls.2020.00025
author Zingaretti, Laura M.
Gezan, Salvador Alejandro
Ferrão, Luis Felipe V.
Osorio, Luis F.
Monfort, Amparo
Muñoz, Patricio R.
Whitaker, Vance M.
Pérez-Enciso, Miguel
collection PubMed
description Genomic prediction (GP) is the procedure whereby the genetic merits of untested candidates are predicted using genome-wide marker information. Although numerous examples of GP exist in plants and animals, applications to polyploid organisms are still scarce, partly due to limited genome resources and the complexity of these systems. Deep learning (DL) techniques comprise a heterogeneous collection of machine learning algorithms that have excelled at many prediction tasks. A potential advantage of DL for GP over standard linear model methods is that DL can, in principle, account for all genetic interactions, including dominance and epistasis, which are expected to be of special relevance in most polyploids. In this study, we evaluated the predictive accuracy of linear and DL techniques in two important small fruits, or berries: strawberry and blueberry. The two datasets contained a total of 1,358 allopolyploid strawberry (2n = 8x = 112) and 1,802 autopolyploid blueberry (2n = 4x = 48) individuals, genotyped for 9,908 and 73,045 single nucleotide polymorphism (SNP) markers, respectively, and phenotyped for five agronomic traits each. DL depends on numerous parameters that influence performance, and optimizing hyperparameter values can be a critical step. Here we show that interactions between hyperparameters should be expected and that the number of convolutional filters and the regularization in the first layers can have an important effect on model performance. In terms of genomic prediction, we did not find an advantage of DL over linear model methods, except when the epistatic component was important. Linear Bayesian models were better than convolutional neural networks for the fully additive architecture, whereas the opposite was observed under strong epistasis. However, by using a parameterization capable of accounting for these non-linear effects, Bayesian linear models can match or exceed the predictive accuracy of DL.
A semiautomatic implementation of the DL pipeline is available at https://github.com/lauzingaretti/deepGP/.
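The abstract contrasts linear genomic prediction with DL and notes that a linear model with a parameterization for non-linear effects can compete. As a minimal sketch, assuming simulated tetraploid allele dosages at unlinked SNPs, a purely additive trait, and a GBLUP-style kernel ridge predictor (not the Bayesian linear models or convolutional networks used in the study, and not code from the deepGP repository), the following NumPy example shows how predictive ability is computed as the correlation between observed and predicted phenotypes in a validation set. It also adds an additive-by-additive kernel (the Hadamard square of G), which is one common way to let a linear model capture epistasis; whether the authors used this parameterization is not stated here. All function names, sizes, and parameter values are hypothetical.

```python
import numpy as np


def simulate_dosages(n, p, ploidy, rng):
    """Hypothetical data: allele dosages (0..ploidy) at p unlinked SNPs."""
    freqs = rng.uniform(0.1, 0.9, size=p)
    return rng.binomial(ploidy, freqs, size=(n, p)).astype(float)


def additive_kernel(X):
    """Genomic relationship matrix from column-centred dosages,
    scaled so that its average diagonal element is 1."""
    Z = X - X.mean(axis=0)
    return Z @ Z.T / np.sum(Z.var(axis=0))


def epistatic_kernel(X):
    """Additive-by-additive kernel: element-wise (Hadamard) square of G."""
    G = additive_kernel(X)
    return G * G


def kernel_predict(K, y, train, test, lam):
    """Kernel ridge (GBLUP-style) prediction of the test individuals:
    y_hat = mu + K[test, train] (K[train, train] + lam * I)^-1 (y[train] - mu)."""
    mu = y[train].mean()
    K_tt = K[np.ix_(train, train)]
    alpha = np.linalg.solve(K_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + K[np.ix_(test, train)] @ alpha


def predictive_ability(y_obs, y_hat):
    """Pearson correlation between observed phenotypes and predictions."""
    return float(np.corrcoef(y_obs, y_hat)[0, 1])


rng = np.random.default_rng(2020)
n, p, ploidy, h2 = 500, 300, 4, 0.8
X = simulate_dosages(n, p, ploidy, rng)

# Purely additive trait: 30 random QTL with normal effects plus residual noise.
qtl = rng.choice(p, size=30, replace=False)
g = X[:, qtl] @ rng.normal(size=30)
y = g + rng.normal(scale=np.sqrt(g.var() * (1 - h2) / h2), size=n)

# Random 80/20 split into training and validation individuals.
idx = rng.permutation(n)
train, test = idx[:400], idx[400:]
lam = (1 - h2) / h2  # residual-to-genetic variance ratio, assumed known here

G = additive_kernel(X)
y_hat_add = kernel_predict(G, y, train, test, lam)
# Same lam reused for the combined kernel purely for simplicity.
y_hat_epi = kernel_predict(G + epistatic_kernel(X), y, train, test, lam)
acc_add = predictive_ability(y[test], y_hat_add)
acc_epi = predictive_ability(y[test], y_hat_epi)
print(f"additive kernel: {acc_add:.2f}  additive + epistatic kernel: {acc_epi:.2f}")
```

On this purely additive simulation the epistatic term is not expected to help much; the comparison becomes informative only for traits with a substantial epistatic component. The shrinkage parameter `lam` is fixed at the simulated variance ratio for simplicity; in practice it would be estimated (for example by REML) or tuned by cross-validation.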
format Online
Article
Text
id pubmed-7015897
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
journal Front Plant Sci (Plant Science), published online 2020-02-06
rights Copyright © 2020 Zingaretti, Gezan, Ferrão, Osorio, Monfort, Muñoz, Whitaker and Pérez-Enciso. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), http://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Exploring Deep Learning for Complex Trait Genomic Prediction in Polyploid Outcrossing Species
topic Plant Science