Within- and cross-species predictions of plant specialized metabolism genes using transfer learning
Plant specialized metabolites mediate interactions between plants and the environment and have significant agronomical/pharmaceutical value. Most genes involved in specialized metabolism (SM) are unknown because of the large number of metabolites and the challenge in differentiating SM genes from ge...
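Main Authors: | Moore, Bethany M, Wang, Peipei, Fan, Pengxiang, Lee, Aaron, Leong, Bryan, Lou, Yann-Ru, Schenck, Craig A, Sugimoto, Koichi, Last, Robert, Lehti-Shiu, Melissa D, Barry, Cornelius S, Shiu, Shin-Han |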
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2020 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7731531/ https://www.ncbi.nlm.nih.gov/pubmed/33344884 http://dx.doi.org/10.1093/insilicoplants/diaa005 |
_version_ | 1783621917768417280 |
---|---|
author | Moore, Bethany M Wang, Peipei Fan, Pengxiang Lee, Aaron Leong, Bryan Lou, Yann-Ru Schenck, Craig A Sugimoto, Koichi Last, Robert Lehti-Shiu, Melissa D Barry, Cornelius S Shiu, Shin-Han |
author_facet | Moore, Bethany M Wang, Peipei Fan, Pengxiang Lee, Aaron Leong, Bryan Lou, Yann-Ru Schenck, Craig A Sugimoto, Koichi Last, Robert Lehti-Shiu, Melissa D Barry, Cornelius S Shiu, Shin-Han |
author_sort | Moore, Bethany M |
collection | PubMed |
description | Plant specialized metabolites mediate interactions between plants and the environment and have significant agronomical/pharmaceutical value. Most genes involved in specialized metabolism (SM) are unknown because of the large number of metabolites and the challenge in differentiating SM genes from general metabolism (GM) genes. Plant models like Arabidopsis thaliana have extensive, experimentally derived annotations, whereas many non-model species do not. Here we employed a machine learning strategy, transfer learning, where knowledge from A. thaliana is transferred to predict gene functions in cultivated tomato with fewer experimentally annotated genes. The first tomato SM/GM prediction model using only tomato data performs well (F-measure = 0.74, compared with 0.5 for random and 1.0 for perfect predictions), but from manually curating 88 SM/GM genes, we found many mis-predicted entries were likely mis-annotated. When the SM/GM prediction models built with A. thaliana data were used to filter out genes where the A. thaliana-based model predictions disagreed with tomato annotations, the new tomato model trained with filtered data improved significantly (F-measure = 0.92). Our study demonstrates that SM/GM genes can be better predicted by leveraging cross-species information. Additionally, our findings provide an example for transfer learning in genomics where knowledge can be transferred from an information-rich species to an information-poor one. |
format | Online Article Text |
id | pubmed-7731531 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-7731531 2020-12-16 Within- and cross-species predictions of plant specialized metabolism genes using transfer learning Moore, Bethany M Wang, Peipei Fan, Pengxiang Lee, Aaron Leong, Bryan Lou, Yann-Ru Schenck, Craig A Sugimoto, Koichi Last, Robert Lehti-Shiu, Melissa D Barry, Cornelius S Shiu, Shin-Han In Silico Plants Original Research Plant specialized metabolites mediate interactions between plants and the environment and have significant agronomical/pharmaceutical value. Most genes involved in specialized metabolism (SM) are unknown because of the large number of metabolites and the challenge in differentiating SM genes from general metabolism (GM) genes. Plant models like Arabidopsis thaliana have extensive, experimentally derived annotations, whereas many non-model species do not. Here we employed a machine learning strategy, transfer learning, where knowledge from A. thaliana is transferred to predict gene functions in cultivated tomato with fewer experimentally annotated genes. The first tomato SM/GM prediction model using only tomato data performs well (F-measure = 0.74, compared with 0.5 for random and 1.0 for perfect predictions), but from manually curating 88 SM/GM genes, we found many mis-predicted entries were likely mis-annotated. When the SM/GM prediction models built with A. thaliana data were used to filter out genes where the A. thaliana-based model predictions disagreed with tomato annotations, the new tomato model trained with filtered data improved significantly (F-measure = 0.92). Our study demonstrates that SM/GM genes can be better predicted by leveraging cross-species information. Additionally, our findings provide an example for transfer learning in genomics where knowledge can be transferred from an information-rich species to an information-poor one. Oxford University Press 2020-07-30 /pmc/articles/PMC7731531/ /pubmed/33344884 http://dx.doi.org/10.1093/insilicoplants/diaa005 Text en © The Author(s) 2020. Published by Oxford University Press on behalf of the Annals of Botany Company. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Research Moore, Bethany M Wang, Peipei Fan, Pengxiang Lee, Aaron Leong, Bryan Lou, Yann-Ru Schenck, Craig A Sugimoto, Koichi Last, Robert Lehti-Shiu, Melissa D Barry, Cornelius S Shiu, Shin-Han Within- and cross-species predictions of plant specialized metabolism genes using transfer learning |
title | Within- and cross-species predictions of plant specialized metabolism genes using transfer learning |
title_full | Within- and cross-species predictions of plant specialized metabolism genes using transfer learning |
title_fullStr | Within- and cross-species predictions of plant specialized metabolism genes using transfer learning |
title_full_unstemmed | Within- and cross-species predictions of plant specialized metabolism genes using transfer learning |
title_short | Within- and cross-species predictions of plant specialized metabolism genes using transfer learning |
title_sort | within- and cross-species predictions of plant specialized metabolism genes using transfer learning |
topic | Original Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7731531/ https://www.ncbi.nlm.nih.gov/pubmed/33344884 http://dx.doi.org/10.1093/insilicoplants/diaa005 |
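
The description field above summarizes a filtering step: tomato genes are dropped from the training set when an A. thaliana-based model disagrees with their SM/GM annotation, and model quality is reported as an F-measure. The following is only a minimal sketch of that idea under stated assumptions, not the authors' pipeline. The gene identifiers, labels, and the function names `f_measure` and `filter_by_agreement` are hypothetical, and the F-measure is computed here as the standard F1 score for the SM class, which may differ from the exact definition used in the article.

```python
from typing import Dict, List


def f_measure(truth: List[str], pred: List[str], positive: str = "SM") -> float:
    """Standard F1 score for one class (assumed here; the article may define it differently)."""
    tp = sum(t == positive and p == positive for t, p in zip(truth, pred))
    fp = sum(t != positive and p == positive for t, p in zip(truth, pred))
    fn = sum(t == positive and p != positive for t, p in zip(truth, pred))
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def filter_by_agreement(
    annotations: Dict[str, str], source_preds: Dict[str, str]
) -> Dict[str, str]:
    """Keep only genes whose annotation matches the source-species model prediction.

    Genes without a prediction are dropped as well, because `.get` returns None.
    """
    return {
        gene: label
        for gene, label in annotations.items()
        if source_preds.get(gene) == label
    }


# Hypothetical toy data: tomato annotations and predictions from a model
# trained on A. thaliana data. None of these values come from the article.
tomato_annotations = {"gene_a": "SM", "gene_b": "GM", "gene_c": "SM", "gene_d": "GM"}
arabidopsis_model_preds = {"gene_a": "SM", "gene_b": "GM", "gene_c": "GM", "gene_d": "GM"}

# gene_c is removed because the two sources disagree on its label.
filtered_training_set = filter_by_agreement(tomato_annotations, arabidopsis_model_preds)

# Agreement between the two label sources before filtering, scored as F1.
genes = sorted(tomato_annotations)
score = f_measure(
    [tomato_annotations[g] for g in genes],
    [arabidopsis_model_preds[g] for g in genes],
)
```

In an actual workflow the filtered set would then be used to retrain the tomato classifier; that training step is omitted here because the record does not specify the model or features.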