Deep learning versus parametric and ensemble methods for genomic prediction of complex phenotypes

BACKGROUND: Transforming large amounts of genomic data into valuable knowledge for predicting complex traits has been an important challenge for animal and plant breeders. Prediction of complex traits has not escaped the current excitement on machine-learning, including interest in deep learning alg...

Bibliographic Details
Main authors:	Abdollahi-Arpanahi, Rostam; Gianola, Daniel; Peñagaricano, Francisco
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2020
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7038529/ https://www.ncbi.nlm.nih.gov/pubmed/32093611 http://dx.doi.org/10.1186/s12711-020-00531-z

author	Abdollahi-Arpanahi, Rostam; Gianola, Daniel; Peñagaricano, Francisco
collection	PubMed
description	BACKGROUND: Transforming large amounts of genomic data into valuable knowledge for predicting complex traits has been an important challenge for animal and plant breeders. Prediction of complex traits has not escaped the current excitement on machine-learning, including interest in deep learning algorithms such as multilayer perceptrons (MLP) and convolutional neural networks (CNN). The aim of this study was to compare the predictive performance of two deep learning methods (MLP and CNN), two ensemble learning methods [random forests (RF) and gradient boosting (GB)], and two parametric methods [genomic best linear unbiased prediction (GBLUP) and Bayes B] using real and simulated datasets. METHODS: The real dataset consisted of 11,790 Holstein bulls with sire conception rate (SCR) records and genotyped for 58k single nucleotide polymorphisms (SNPs). To support the evaluation of deep learning methods, various simulation studies were conducted using the observed genotype data as template, assuming a heritability of 0.30 with either additive or non-additive gene effects, and two different numbers of quantitative trait nucleotides (100 and 1000). RESULTS: In the bull dataset, the best predictive correlation was obtained with GB (0.36), followed by Bayes B (0.34), GBLUP (0.33), RF (0.32), CNN (0.29) and MLP (0.26). The same trend was observed when using mean squared error of prediction. The simulation indicated that when gene action was purely additive, parametric methods outperformed other methods. When the gene action was a combination of additive, dominance and of two-locus epistasis, the best predictive ability was obtained with gradient boosting, and the superiority of deep learning over the parametric methods depended on the number of loci controlling the trait and on sample size. In fact, with a large dataset including 80k individuals, the predictive performance of deep learning methods was similar or slightly better than that of parametric methods for traits with non-additive gene action. CONCLUSIONS: For prediction of traits with non-additive gene action, gradient boosting was a robust method. Deep learning approaches were not better for genomic prediction unless non-additive variance was sizable.
format	Online Article Text
id	pubmed-7038529
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-7038529 2020-03-02 Deep learning versus parametric and ensemble methods for genomic prediction of complex phenotypes Abdollahi-Arpanahi, Rostam; Gianola, Daniel; Peñagaricano, Francisco Genet Sel Evol Research Article BioMed Central 2020-02-24 /pmc/articles/PMC7038529/ /pubmed/32093611 http://dx.doi.org/10.1186/s12711-020-00531-z Text en © The Author(s) 2020 Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	Deep learning versus parametric and ensemble methods for genomic prediction of complex phenotypes
topic	Research Article
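
The simulation design and the two evaluation metrics named in the description (a heritability of 0.30, 100 quantitative trait nucleotides, predictive correlation and mean squared error of prediction) can be illustrated with a minimal sketch. This is not the authors' pipeline: the genotypes are random draws rather than the observed bull genotypes, only the purely additive scenario is covered, and SNP ridge regression stands in for a linear parametric method. All function names, the sample sizes, the effect distribution and the penalty `lam` are assumptions made for illustration.

```python
import numpy as np


def simulate_additive_phenotype(genotypes, n_qtn, h2, rng):
    """Simulate a phenotype with purely additive QTN effects at heritability h2.

    Assumptions (not from the source): QTN positions are drawn at random and
    their effects are standard normal.
    """
    n, m = genotypes.shape
    qtn = rng.choice(m, size=n_qtn, replace=False)
    effects = rng.normal(size=n_qtn)
    g = genotypes[:, qtn] @ effects
    g = g - g.mean()
    # Choose the residual variance so that var(g) / (var(g) + var(e)) = h2.
    var_e = g.var() * (1.0 - h2) / h2
    y = g + rng.normal(scale=np.sqrt(var_e), size=n)
    return y, g


def ridge_predict(x_train, y_train, x_test, lam):
    """Predict with SNP ridge regression, solved in its n-by-n dual form."""
    mu = x_train.mean(axis=0)
    xc = x_train - mu
    y_bar = y_train.mean()
    k = xc @ xc.T
    alpha = np.linalg.solve(k + lam * np.eye(k.shape[0]), y_train - y_bar)
    beta = xc.T @ alpha
    return y_bar + (x_test - mu) @ beta


def predictive_correlation(y, y_hat):
    """Pearson correlation between observed and predicted phenotypes."""
    return float(np.corrcoef(y, y_hat)[0, 1])


def mean_squared_error(y, y_hat):
    """Mean squared error of prediction."""
    return float(np.mean((y - y_hat) ** 2))


rng = np.random.default_rng(2020)
n_animals, n_snps = 600, 300                      # toy sizes, far below 11,790 x 58k
freq = rng.uniform(0.1, 0.9, size=n_snps)         # hypothetical allele frequencies
genotypes = rng.binomial(2, freq, size=(n_animals, n_snps)).astype(float)

y, g = simulate_additive_phenotype(genotypes, n_qtn=100, h2=0.30, rng=rng)

# Hold out the last 20% of animals as a testing set.
n_train = int(0.8 * n_animals)
y_hat = ridge_predict(genotypes[:n_train], y[:n_train], genotypes[n_train:], lam=100.0)

r = predictive_correlation(y[n_train:], y_hat)
err = mean_squared_error(y[n_train:], y_hat)
print(f"realized heritability: {g.var() / y.var():.2f}")
print(f"predictive correlation: {r:.2f}, MSE: {err:.2f}")
```

Scaling the residual variance to the realized genetic variance is what pins the simulated heritability near the target value; a non-additive scenario would add dominance and epistatic terms to `g` before that scaling step.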