Deep learning versus parametric and ensemble methods for genomic prediction of complex phenotypes
Main authors: | Abdollahi-Arpanahi, Rostam; Gianola, Daniel; Peñagaricano, Francisco |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2020 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7038529/ https://www.ncbi.nlm.nih.gov/pubmed/32093611 http://dx.doi.org/10.1186/s12711-020-00531-z |
_version_ | 1783500661439070208 |
---|---|
author | Abdollahi-Arpanahi, Rostam; Gianola, Daniel; Peñagaricano, Francisco |
collection | PubMed |
description | BACKGROUND: Transforming large amounts of genomic data into valuable knowledge for predicting complex traits has been an important challenge for animal and plant breeders. Prediction of complex traits has not escaped the current excitement about machine learning, including interest in deep learning algorithms such as multilayer perceptrons (MLP) and convolutional neural networks (CNN). The aim of this study was to compare the predictive performance of two deep learning methods (MLP and CNN), two ensemble learning methods [random forests (RF) and gradient boosting (GB)], and two parametric methods [genomic best linear unbiased prediction (GBLUP) and Bayes B] using real and simulated datasets. METHODS: The real dataset consisted of 11,790 Holstein bulls with sire conception rate (SCR) records, genotyped for 58k single nucleotide polymorphisms (SNPs). To support the evaluation of deep learning methods, various simulation studies were conducted using the observed genotype data as a template, assuming a heritability of 0.30 with either additive or non-additive gene effects and two different numbers of quantitative trait nucleotides (100 and 1000). RESULTS: In the bull dataset, the best predictive correlation was obtained with GB (0.36), followed by Bayes B (0.34), GBLUP (0.33), RF (0.32), CNN (0.29) and MLP (0.26). The same trend was observed when using the mean squared error of prediction. The simulation indicated that when gene action was purely additive, parametric methods outperformed the other methods. When gene action was a combination of additive, dominance and two-locus epistatic effects, the best predictive ability was obtained with gradient boosting, and the superiority of deep learning over the parametric methods depended on the number of loci controlling the trait and on sample size. In fact, with a large dataset including 80k individuals, the predictive performance of deep learning methods was similar to or slightly better than that of parametric methods for traits with non-additive gene action. CONCLUSIONS: For prediction of traits with non-additive gene action, gradient boosting was a robust method. Deep learning approaches were not better for genomic prediction unless non-additive variance was sizable. |
format | Online Article Text |
id | pubmed-7038529 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-7038529 2020-03-02; Genet Sel Evol; Research Article; BioMed Central 2020-02-24; /pmc/articles/PMC7038529/ /pubmed/32093611 http://dx.doi.org/10.1186/s12711-020-00531-z; Text en; © The Author(s) 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Deep learning versus parametric and ensemble methods for genomic prediction of complex phenotypes |
topic | Research Article |
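
The description mentions simulated traits with a heritability of 0.30, either additive or non-additive gene action, and 100 or 1000 quantitative trait nucleotides (QTN). The NumPy sketch below shows how such a design might be built from a genotype matrix. It is a minimal illustration under stated assumptions, not the study's exact simulation:

- The function name `simulate_phenotype` is hypothetical.
- All effects are drawn from a standard normal distribution.
- Dominance is coded as an extra effect on heterozygotes.
- Epistasis is modelled as products of allele counts for disjoint random QTN pairs.
- Heritability is taken as total genetic variance over phenotypic variance.

```python
import numpy as np


def simulate_phenotype(X, n_qtn, h2=0.30, non_additive=False, seed=0):
    """Return (g, y): genetic values and phenotypes simulated from genotypes X.

    X is an (n x m) matrix of allele counts coded 0/1/2. Hypothetical sketch:
    QTN are random SNP columns, all effects are standard normal, and the
    residual variance is chosen so that var(g) / var(y) is about h2.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    qtn = rng.choice(m, size=n_qtn, replace=False)  # QTN positions among the SNPs
    Q = X[:, qtn].astype(float)

    # Additive part: allele counts times additive allele effects
    g = Q @ rng.standard_normal(n_qtn)

    if non_additive:
        # Dominance part (assumed coding): heterozygotes get an extra effect
        het = (Q == 1).astype(float)
        g += het @ rng.standard_normal(n_qtn)
        # Two-locus epistasis (assumed coding): product of allele counts
        # for disjoint random pairs of QTN
        pairs = rng.choice(n_qtn, size=(n_qtn // 2, 2), replace=False)
        epi = Q[:, pairs[:, 0]] * Q[:, pairs[:, 1]]
        g += epi @ rng.standard_normal(pairs.shape[0])

    # Residual variance chosen so that h2 = var(g) / (var(g) + var(e))
    var_e = g.var() * (1.0 - h2) / h2
    y = g + rng.normal(0.0, np.sqrt(var_e), size=n)
    return g, y


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    p = rng.uniform(0.05, 0.5, size=500)       # toy allele frequencies
    X = rng.binomial(2, p, size=(2000, 500))   # toy 0/1/2 genotypes
    g, y = simulate_phenotype(X, n_qtn=100, non_additive=True)
    print(f"realized heritability: {g.var() / y.var():.3f}")
```

Scaling the residual variance to the simulated genetic variance fixes heritability regardless of how large the dominance and epistatic components turn out. The relative share of non-additive variance is not calibrated here, whereas the study's scenarios would have controlled it.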
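
The description ranks methods by predictive correlation and by mean squared error of prediction. One of the parametric baselines it names is GBLUP. The sketch below shows a bare-bones GBLUP and both metrics in NumPy. It makes these simplifying assumptions:

- The genomic relationship matrix is VanRaden's method 1.
- The heritability is known, so the shrinkage ratio is (1 - h2) / h2.
- The training mean serves as the intercept instead of a full generalized least squares estimate.

The names `vanraden_G`, `gblup_predict` and `evaluate` are hypothetical. This is not the authors' software or pipeline.

```python
import numpy as np


def vanraden_G(X):
    """Genomic relationship matrix (VanRaden method 1) from 0/1/2 genotypes."""
    p = X.mean(axis=0) / 2.0                 # allele frequencies estimated from X
    Z = X - 2.0 * p                          # center each SNP column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


def gblup_predict(G, y, train, test, h2=0.30):
    """Predict test phenotypes from training phenotypes with GBLUP.

    Assumes known h2, so lambda = sigma_e^2 / sigma_g^2 = (1 - h2) / h2, and
    uses the training mean as the intercept (a simplification).
    """
    lam = (1.0 - h2) / h2
    mu = y[train].mean()
    G_tt = G[np.ix_(train, train)]
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + G[np.ix_(test, train)] @ alpha


def evaluate(y_obs, y_pred):
    """Return (predictive correlation, mean squared error of prediction)."""
    r = np.corrcoef(y_obs, y_pred)[0, 1]
    mse = np.mean((y_obs - y_pred) ** 2)
    return r, mse


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    n, m = 1000, 800
    p = rng.uniform(0.05, 0.5, size=m)
    X = rng.binomial(2, p, size=(n, m)).astype(float)
    # Toy purely additive trait: 100 random QTN, h2 = 0.30
    qtn = rng.choice(m, size=100, replace=False)
    g = X[:, qtn] @ rng.standard_normal(100)
    y = g + rng.normal(0.0, np.sqrt(g.var() * 0.7 / 0.3), size=n)
    idx = rng.permutation(n)
    train, test = idx[:800], idx[800:]
    r, mse = evaluate(y[test], gblup_predict(vanraden_G(X), y, train, test))
    print(f"predictive correlation = {r:.2f}, MSE = {mse:.2f}")
```

Solving the n_train x n_train system directly is fine at toy scale. For datasets the size of the bull data, or the 80k-individual simulation, dedicated mixed-model software would normally be used instead.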