Within- and cross-species predictions of plant specialized metabolism genes using transfer learning

Plant specialized metabolites mediate interactions between plants and the environment and have significant agronomical/pharmaceutical value. Most genes involved in specialized metabolism (SM) are unknown because of the large number of metabolites and the challenge in differentiating SM genes from ge...

Bibliographic Details
Main authors: Moore, Bethany M, Wang, Peipei, Fan, Pengxiang, Lee, Aaron, Leong, Bryan, Lou, Yann-Ru, Schenck, Craig A, Sugimoto, Koichi, Last, Robert, Lehti-Shiu, Melissa D, Barry, Cornelius S, Shiu, Shin-Han
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7731531/
https://www.ncbi.nlm.nih.gov/pubmed/33344884
http://dx.doi.org/10.1093/insilicoplants/diaa005
author Moore, Bethany M
Wang, Peipei
Fan, Pengxiang
Lee, Aaron
Leong, Bryan
Lou, Yann-Ru
Schenck, Craig A
Sugimoto, Koichi
Last, Robert
Lehti-Shiu, Melissa D
Barry, Cornelius S
Shiu, Shin-Han
author_sort Moore, Bethany M
collection PubMed
description Plant specialized metabolites mediate interactions between plants and the environment and have significant agronomical/pharmaceutical value. Most genes involved in specialized metabolism (SM) are unknown because of the large number of metabolites and the challenge in differentiating SM genes from general metabolism (GM) genes. Plant models like Arabidopsis thaliana have extensive, experimentally derived annotations, whereas many non-model species do not. Here we employed a machine learning strategy, transfer learning, where knowledge from A. thaliana is transferred to predict gene functions in cultivated tomato with fewer experimentally annotated genes. The first tomato SM/GM prediction model using only tomato data performs well (F-measure = 0.74, compared with 0.5 for random and 1.0 for perfect predictions), but from manually curating 88 SM/GM genes, we found many mis-predicted entries were likely mis-annotated. When the SM/GM prediction models built with A. thaliana data were used to filter out genes where the A. thaliana-based model predictions disagreed with tomato annotations, the new tomato model trained with filtered data improved significantly (F-measure = 0.92). Our study demonstrates that SM/GM genes can be better predicted by leveraging cross-species information. Additionally, our findings provide an example for transfer learning in genomics where knowledge can be transferred from an information-rich species to an information-poor one.
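The filtering step described in the abstract can be illustrated with a minimal sketch. It assumes that the F-measure is the standard F1 score (the record does not give the paper's exact definition) and that a source-species model has already produced SM/GM calls for the target-species genes. The gene names, labels, and function names below are hypothetical toy values, not data or code from the study.

```python
from typing import Dict, List


def f_measure(y_true: List[str], y_pred: List[str], positive: str = "SM") -> float:
    """Standard F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def filter_by_agreement(
    annotations: Dict[str, str], source_predictions: Dict[str, str]
) -> Dict[str, str]:
    """Keep only genes whose annotation agrees with the source-model call."""
    return {
        gene: label
        for gene, label in annotations.items()
        if source_predictions.get(gene) == label
    }


# Hypothetical toy inputs: target-species annotations and the calls that a
# source-species (e.g. A. thaliana-trained) model made for the same genes.
tomato_annotations = {"gene_a": "SM", "gene_b": "GM", "gene_c": "SM", "gene_d": "GM"}
arabidopsis_model_calls = {"gene_a": "SM", "gene_b": "GM", "gene_c": "GM", "gene_d": "GM"}

# Genes where the two disagree (here gene_c) are dropped before retraining.
filtered = filter_by_agreement(tomato_annotations, arabidopsis_model_calls)

genes = sorted(tomato_annotations)
score = f_measure(
    [tomato_annotations[g] for g in genes],
    [arabidopsis_model_calls[g] for g in genes],
)
```

In this sketch the filtered dictionary would serve as the training set for a new target-species model; the classifier itself is left out because the record does not specify which one the authors used.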
format Online
Article
Text
id pubmed-7731531
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7731531 2020-12-16 In Silico Plants Original Research Oxford University Press 2020-07-30 /pmc/articles/PMC7731531/ /pubmed/33344884 http://dx.doi.org/10.1093/insilicoplants/diaa005 Text en © The Author(s) 2020. Published by Oxford University Press on behalf of the Annals of Botany Company. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Within- and cross-species predictions of plant specialized metabolism genes using transfer learning
topic Original Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7731531/
https://www.ncbi.nlm.nih.gov/pubmed/33344884
http://dx.doi.org/10.1093/insilicoplants/diaa005