
A directed learning strategy integrating multiple omic data improves genomic prediction

Genomic prediction (GP) aims to construct a statistical model for predicting phenotypes using genome‐wide markers and is a promising strategy for accelerating molecular plant breeding. However, current progress of phenotype prediction using genomic data alone has reached a bottleneck, and previous s...


Bibliographic Details
Main authors: Hu, Xuehai; Xie, Weibo; Wu, Chengchao; Xu, Shizhong
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2019
Subjects: Research Articles
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6737184/
https://www.ncbi.nlm.nih.gov/pubmed/30950198
http://dx.doi.org/10.1111/pbi.13117
author Hu, Xuehai
Xie, Weibo
Wu, Chengchao
Xu, Shizhong
collection PubMed
description Genomic prediction (GP) aims to construct a statistical model for predicting phenotypes using genome‐wide markers and is a promising strategy for accelerating molecular plant breeding. However, current progress of phenotype prediction using genomic data alone has reached a bottleneck, and previous studies on transcriptomic and metabolomic predictions ignored genomic information. Here, we designed a novel strategy of GP called multilayered least absolute shrinkage and selection operator (MLLASSO) by integrating multiple omic data into a single model that iteratively learns three layers of genetic features (GFs) supervised by observed transcriptome and metabolome. Significantly, MLLASSO learns higher order information of gene interactions, which enables us to achieve a significant improvement of predictability of yield in rice from 0.1588 (GP alone) to 0.2451 (MLLASSO). In the prediction of the first two layers, some genes were found to be genetically predictable genes (GPGs) as their expressions were accurately predicted with genetic markers. Interestingly, we made three dramatic discoveries for the GPGs: (i) GPGs are good predictors for highly complex traits like yield; (ii) GPGs are mostly eQTL genes (cis or trans); and (iii) trait‐related transcriptional factor families are enriched in GPGs. These findings support the notion that learned GFs not only are good predictors for traits but also have specific biological implications regarding regulation of gene expressions. To differentiate the new method from conventional GP models, we called MLLASSO a directed learning strategy supervised by intermediate omic data. This new prediction model appears to be more reliable and more robust than conventional GP models.
format Online
Article
Text
id pubmed-6737184
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-6737184 2019-09-16 A directed learning strategy integrating multiple omic data improves genomic prediction Hu, Xuehai Xie, Weibo Wu, Chengchao Xu, Shizhong Plant Biotechnol J Research Articles John Wiley and Sons Inc.
2019-04-14 2019-10 /pmc/articles/PMC6737184/ /pubmed/30950198 http://dx.doi.org/10.1111/pbi.13117 Text en © 2019 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd. This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
title A directed learning strategy integrating multiple omic data improves genomic prediction
topic Research Articles
work_keys_str_mv AT huxuehai adirectedlearningstrategyintegratingmultipleomicdataimprovesgenomicprediction
AT xieweibo adirectedlearningstrategyintegratingmultipleomicdataimprovesgenomicprediction
AT wuchengchao adirectedlearningstrategyintegratingmultipleomicdataimprovesgenomicprediction
AT xushizhong adirectedlearningstrategyintegratingmultipleomicdataimprovesgenomicprediction
AT huxuehai directedlearningstrategyintegratingmultipleomicdataimprovesgenomicprediction
AT xieweibo directedlearningstrategyintegratingmultipleomicdataimprovesgenomicprediction
AT wuchengchao directedlearningstrategyintegratingmultipleomicdataimprovesgenomicprediction
AT xushizhong directedlearningstrategyintegratingmultipleomicdataimprovesgenomicprediction
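
Illustrative note: the description field says that MLLASSO learns layers of genetic features from markers under the supervision of observed transcriptome and metabolome data. The sketch below shows one way such a layered LASSO could be wired together. It is not the authors' implementation. The layer order (markers to transcripts, predicted transcripts to metabolites, predicted metabolites to trait), the single shared penalty, the omission of intercepts, and every function name are assumptions made for illustration only.

```python
# Hypothetical sketch of a layered ("directed") LASSO; not the published MLLASSO code.

def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty (the LASSO proximal step)."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0

def lasso_fit(X, y, penalty, n_iter=200):
    """Coordinate descent for (1/2n)*||y - X b||^2 + penalty*||b||_1, no intercept."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # residual of the current fit; beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero feature carries no information
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, penalty) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta

def columns(matrix):
    """Transpose a row-major matrix into a list of columns."""
    return [list(col) for col in zip(*matrix)]

def fit_layer(X, targets, penalty):
    """Fit one LASSO model per target column; returns one weight vector per target."""
    return [lasso_fit(X, target, penalty) for target in targets]

def predict_layer(X, weights):
    """Predicted value of every target for every row of X."""
    return [[sum(x * b for x, b in zip(row, beta)) for beta in weights] for row in X]

def fit_directed(markers, transcripts, metabolites, trait, penalty):
    """Assumed layering: markers -> transcripts -> metabolites -> trait."""
    w1 = fit_layer(markers, columns(transcripts), penalty)
    gf1 = predict_layer(markers, w1)   # layer-1 genetic features
    w2 = fit_layer(gf1, columns(metabolites), penalty)
    gf2 = predict_layer(gf1, w2)       # layer-2 genetic features
    w3 = lasso_fit(gf2, trait, penalty)
    return w1, w2, w3

def predict_directed(markers, model):
    """Predict the trait from markers alone by chaining the learned layers."""
    w1, w2, w3 = model
    gf2 = predict_layer(predict_layer(markers, w1), w2)
    return [sum(x * b for x, b in zip(row, w3)) for row in gf2]

if __name__ == "__main__":
    # Tiny synthetic data set, purely to show the call sequence.
    markers = [[float((i * (j + 1)) % 3 - 1) for j in range(4)] for i in range(12)]
    transcripts = [[m[0] + m[1], m[2] - m[3]] for m in markers]
    metabolites = [[t[0] - t[1]] for t in transcripts]
    trait = [2.0 * m[0] for m in metabolites]
    model = fit_directed(markers, transcripts, metabolites, trait, penalty=0.01)
    print(predict_directed(markers, model)[:3])
```

In this sketch the observed transcriptome and metabolome are used only during fitting; predicting a new line requires its markers alone. That property is a design assumption of the sketch, chosen to match the description's statement that the intermediate omic data supervise the learning.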