A directed learning strategy integrating multiple omic data improves genomic prediction
Genomic prediction (GP) aims to construct a statistical model for predicting phenotypes using genome‐wide markers and is a promising strategy for accelerating molecular plant breeding. However, current progress of phenotype prediction using genomic data alone has reached a bottleneck, and previous s...
Main authors: | Hu, Xuehai, Xie, Weibo, Wu, Chengchao, Xu, Shizhong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | John Wiley and Sons Inc. 2019 |
Subjects: | Research Articles |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6737184/ https://www.ncbi.nlm.nih.gov/pubmed/30950198 http://dx.doi.org/10.1111/pbi.13117 |
_version_ | 1783450623424856064 |
---|---|
author | Hu, Xuehai Xie, Weibo Wu, Chengchao Xu, Shizhong |
author_facet | Hu, Xuehai Xie, Weibo Wu, Chengchao Xu, Shizhong |
author_sort | Hu, Xuehai |
collection | PubMed |
description | Genomic prediction (GP) aims to construct a statistical model for predicting phenotypes using genome‐wide markers and is a promising strategy for accelerating molecular plant breeding. However, current progress of phenotype prediction using genomic data alone has reached a bottleneck, and previous studies on transcriptomic and metabolomic predictions ignored genomic information. Here, we designed a novel strategy of GP called multilayered least absolute shrinkage and selection operator (MLLASSO) by integrating multiple omic data into a single model that iteratively learns three layers of genetic features (GFs) supervised by observed transcriptome and metabolome. Significantly, MLLASSO learns higher order information of gene interactions, which enables us to achieve a significant improvement of predictability of yield in rice from 0.1588 (GP alone) to 0.2451 (MLLASSO). In the prediction of the first two layers, some genes were found to be genetically predictable genes (GPGs) as their expressions were accurately predicted with genetic markers. Interestingly, we made three dramatic discoveries for the GPGs: (i) GPGs are good predictors for highly complex traits like yield; (ii) GPGs are mostly eQTL genes (cis or trans); and (iii) trait‐related transcriptional factor families are enriched in GPGs. These findings support the notion that learned GFs not only are good predictors for traits but also have specific biological implications regarding regulation of gene expressions. To differentiate the new method from conventional GP models, we called MLLASSO a directed learning strategy supervised by intermediate omic data. This new prediction model appears to be more reliable and more robust than conventional GP models. |
format | Online Article Text |
id | pubmed-6737184 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | John Wiley and Sons Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-6737184 2019-09-16 A directed learning strategy integrating multiple omic data improves genomic prediction Hu, Xuehai Xie, Weibo Wu, Chengchao Xu, Shizhong Plant Biotechnol J Research Articles John Wiley and Sons Inc. 2019-04-14 2019-10 /pmc/articles/PMC6737184/ /pubmed/30950198 http://dx.doi.org/10.1111/pbi.13117 Text en © 2019 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd. This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Articles Hu, Xuehai Xie, Weibo Wu, Chengchao Xu, Shizhong A directed learning strategy integrating multiple omic data improves genomic prediction |
title | A directed learning strategy integrating multiple omic data improves genomic prediction |
title_sort | directed learning strategy integrating multiple omic data improves genomic prediction |
topic | Research Articles |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6737184/ https://www.ncbi.nlm.nih.gov/pubmed/30950198 http://dx.doi.org/10.1111/pbi.13117 |
work_keys_str_mv | AT huxuehai adirectedlearningstrategyintegratingmultipleomicdataimprovesgenomicprediction AT xieweibo adirectedlearningstrategyintegratingmultipleomicdataimprovesgenomicprediction AT wuchengchao adirectedlearningstrategyintegratingmultipleomicdataimprovesgenomicprediction AT xushizhong adirectedlearningstrategyintegratingmultipleomicdataimprovesgenomicprediction AT huxuehai directedlearningstrategyintegratingmultipleomicdataimprovesgenomicprediction AT xieweibo directedlearningstrategyintegratingmultipleomicdataimprovesgenomicprediction AT wuchengchao directedlearningstrategyintegratingmultipleomicdataimprovesgenomicprediction AT xushizhong directedlearningstrategyintegratingmultipleomicdataimprovesgenomicprediction |
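
The description field above outlines MLLASSO as a chain of LASSO layers in which genetic features are learned from markers under the supervision of the observed transcriptome and metabolome. The following is a minimal sketch of that general idea, not the authors' implementation. The layer order (markers to transcripts, then to metabolites, then to the trait), the function names `lasso_cd`, `fit_layer`, `fit_multilayer` and `predict_trait`, the shared penalty `alpha`, and the plain coordinate-descent solver are all assumptions made for illustration. Inputs are assumed to be column-centered, since no intercept is fitted.

```python
import numpy as np


def soft_threshold(x, t):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def lasso_cd(X, y, alpha, n_iter=100):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + alpha * ||b||_1.

    Hypothetical helper: no intercept, so X and y should be centered.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()  # equals y - X @ beta throughout
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero feature keeps a zero coefficient
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, alpha) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta


def fit_layer(X, Y, alpha):
    """Fit one LASSO per column of Y; returns a (features x targets) matrix."""
    return np.column_stack(
        [lasso_cd(X, Y[:, k], alpha) for k in range(Y.shape[1])]
    )


def fit_multilayer(markers, transcripts, metabolites, trait, alpha=0.05):
    """Assumed three-layer chain: markers -> transcripts -> metabolites -> trait."""
    B1 = fit_layer(markers, transcripts, alpha)
    gf1 = markers @ B1                # layer-1 features: predicted expression
    B2 = fit_layer(gf1, metabolites, alpha)
    gf2 = gf1 @ B2                    # layer-2 features: predicted metabolites
    b3 = lasso_cd(gf2, trait, alpha)  # layer 3: trait from layer-2 features
    return B1, B2, b3


def predict_trait(markers, B1, B2, b3):
    """Predict the trait for new individuals from markers alone."""
    return markers @ B1 @ B2 @ b3


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = rng.standard_normal((40, 15))      # synthetic markers
    T = M[:, :6] + 0.1 * rng.standard_normal((40, 6))   # synthetic transcripts
    Me = T[:, :4] + 0.1 * rng.standard_normal((40, 4))  # synthetic metabolites
    y = Me.sum(axis=1) + 0.1 * rng.standard_normal(40)  # synthetic trait
    B1, B2, b3 = fit_multilayer(M, T, Me, y)
    print(predict_trait(M, B1, B2, b3).shape)
```

Once fitted, the chain needs only marker data to predict a new individual, which is the practical point of supervising the intermediate layers with omic data during training. Because every layer here is linear, this sketch does not reproduce the higher-order gene-interaction information that the description attributes to MLLASSO.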