
Deep learning versus parametric and ensemble methods for genomic prediction of complex phenotypes



Bibliographic Details
Main authors: Abdollahi-Arpanahi, Rostam; Gianola, Daniel; Peñagaricano, Francisco
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7038529/
https://www.ncbi.nlm.nih.gov/pubmed/32093611
http://dx.doi.org/10.1186/s12711-020-00531-z
author Abdollahi-Arpanahi, Rostam
Gianola, Daniel
Peñagaricano, Francisco
collection PubMed
description BACKGROUND: Transforming large amounts of genomic data into valuable knowledge for predicting complex traits has been an important challenge for animal and plant breeders. Prediction of complex traits has not escaped the current excitement about machine learning, including interest in deep learning algorithms such as multilayer perceptrons (MLP) and convolutional neural networks (CNN). The aim of this study was to compare the predictive performance of two deep learning methods (MLP and CNN), two ensemble learning methods [random forests (RF) and gradient boosting (GB)], and two parametric methods [genomic best linear unbiased prediction (GBLUP) and Bayes B] using real and simulated datasets.
METHODS: The real dataset consisted of 11,790 Holstein bulls with sire conception rate (SCR) records, genotyped for 58k single nucleotide polymorphisms (SNPs). To support the evaluation of deep learning methods, various simulation studies were conducted using the observed genotype data as a template, assuming a heritability of 0.30 with either additive or non-additive gene effects, and two different numbers of quantitative trait nucleotides (100 and 1000).
RESULTS: In the bull dataset, the best predictive correlation was obtained with GB (0.36), followed by Bayes B (0.34), GBLUP (0.33), RF (0.32), CNN (0.29) and MLP (0.26). The same trend was observed when using mean squared error of prediction. The simulation indicated that when gene action was purely additive, parametric methods outperformed the other methods. When gene action was a combination of additive, dominance, and two-locus epistatic effects, the best predictive ability was obtained with gradient boosting, and the superiority of deep learning over the parametric methods depended on the number of loci controlling the trait and on sample size. In fact, with a large dataset including 80k individuals, the predictive performance of deep learning methods was similar to or slightly better than that of parametric methods for traits with non-additive gene action.
CONCLUSIONS: For prediction of traits with non-additive gene action, gradient boosting was a robust method. Deep learning approaches were not better for genomic prediction unless non-additive variance was sizable.
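As a rough illustration of the evaluation described in the abstract, the following Python sketch simulates a purely additive trait with a heritability of 0.30 controlled by 100 quantitative trait nucleotides, then scores a linear predictor by predictive correlation and mean squared error of prediction. It is not the authors' pipeline: the genotypes are random binomial draws rather than observed Holstein genotypes, the ridge regression is only a simple stand-in for a linear marker model, and the function names, sample sizes, allele frequency, and penalty `lam` are assumptions made for the example.

```python
import numpy as np


def simulate_additive_phenotype(genotypes, n_qtn, h2, rng):
    """Simulate a phenotype with purely additive QTN effects and heritability h2."""
    n, m = genotypes.shape
    qtn = rng.choice(m, size=n_qtn, replace=False)  # hypothetical causal loci
    effects = rng.normal(size=n_qtn)                # assumed normally distributed effects
    g = genotypes[:, qtn] @ effects                 # true genetic values
    # Residual variance chosen so that var(g) / (var(g) + var(e)) equals h2.
    var_e = g.var() * (1.0 - h2) / h2
    e = rng.normal(scale=np.sqrt(var_e), size=n)
    return g + e


def predictive_correlation(y_true, y_pred):
    """Pearson correlation between observed and predicted phenotypes."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def mean_squared_error_of_prediction(y_true, y_pred):
    """Average squared difference between observed and predicted phenotypes."""
    return float(np.mean((y_true - y_pred) ** 2))


def ridge_fit_predict(x_train, y_train, x_test, lam):
    """Ridge regression on centered markers (illustrative linear model only)."""
    mu_x = x_train.mean(axis=0)
    mu_y = y_train.mean()
    xc = x_train - mu_x
    # Dual form: solve an n x n system instead of an m x m one (n << m here).
    k = xc @ xc.T
    alpha = np.linalg.solve(k + lam * np.eye(k.shape[0]), y_train - mu_y)
    beta = xc.T @ alpha  # marker effects
    return mu_y + (x_test - mu_x) @ beta


rng = np.random.default_rng(0)
# Assumed toy genotypes coded 0/1/2; the study used real genotypes as a template.
geno = rng.binomial(2, 0.3, size=(600, 2000)).astype(float)
y = simulate_additive_phenotype(geno, n_qtn=100, h2=0.30, rng=rng)

train, test = np.arange(500), np.arange(500, 600)
y_hat = ridge_fit_predict(geno[train], y[train], geno[test], lam=1000.0)  # lam is arbitrary

r = predictive_correlation(y[test], y_hat)
msep = mean_squared_error_of_prediction(y[test], y_hat)
print(f"predictive correlation = {r:.3f}, MSEP = {msep:.3f}")
```

The two metrics are computed only on individuals held out from training, which is what makes them measures of predictive rather than fitting ability. Any other predictor (for example a gradient boosting or neural network model) could be scored with the same two functions.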
format Online
Article
Text
id pubmed-7038529
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7038529 2020-03-02 Deep learning versus parametric and ensemble methods for genomic prediction of complex phenotypes. Abdollahi-Arpanahi, Rostam; Gianola, Daniel; Peñagaricano, Francisco. Genet Sel Evol. Research Article. BioMed Central 2020-02-24 /pmc/articles/PMC7038529/ /pubmed/32093611 http://dx.doi.org/10.1186/s12711-020-00531-z Text en
© The Author(s) 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Deep learning versus parametric and ensemble methods for genomic prediction of complex phenotypes
topic Research Article