
PPA-GCN: A Efficient GCN Framework for Prokaryotic Pathways Assignment

With the rapid development of sequencing technology, completed genomes of microbes have explosively emerged. For a newly sequenced prokaryotic genome, gene functional annotation and metabolism pathway assignment are important foundations for all subsequent research work. However, the assignment rate...


Bibliographic Details
Main Authors: Lu, Yuntao; Li, Qi; Li, Tao
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9013948/
https://www.ncbi.nlm.nih.gov/pubmed/35444686
http://dx.doi.org/10.3389/fgene.2022.839453
_version_ 1784688109746651136
author Lu, Yuntao
Li, Qi
Li, Tao
author_facet Lu, Yuntao
Li, Qi
Li, Tao
author_sort Lu, Yuntao
collection PubMed
description With the rapid development of sequencing technology, the number of completed microbial genomes has grown explosively. For a newly sequenced prokaryotic genome, gene functional annotation and metabolic pathway assignment are important foundations for all subsequent research. However, the overall assignment rate for gene metabolic pathways is lower than 48%. It is even lower for newly sequenced prokaryotic genomes, which has become a bottleneck for subsequent research. Thus, a high-precision metabolic pathway assignment framework is urgently needed. Here, we developed PPA-GCN, a prokaryotic pathway assignment framework based on a graph convolutional network, to assist functional pathway assignment using KEGG information and genomic characteristics. In the framework, gene synteny information was used to construct a network, and ideas from self-supervised learning were adopted to enhance the framework’s learning ability. Our framework is applicable to microbial genera with sufficient whole-genome sequences. To evaluate the assignment rate, genomes from three genera were used: Flavobacterium (65 genomes), Pseudomonas (100 genomes), and Staphylococcus (500 genomes). The initial functional pathway assignment rates of the three test genera were 27.7% (Flavobacterium), 49.5% (Pseudomonas) and 30.1% (Staphylococcus). PPA-GCN achieved excellent performance, with assignment rates of 84.8% (Flavobacterium), 77.0% (Pseudomonas) and 71.0% (Staphylococcus). PPA-GCN was also shown to have strong fault tolerance. The framework provides novel insights into metabolic pathway assignment, is likely to inform future deep learning applications for interpreting functional annotations, and extends to all prokaryotic genera with sufficient genomes.
format Online
Article
Text
id pubmed-9013948
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9013948 2022-04-19 PPA-GCN: A Efficient GCN Framework for Prokaryotic Pathways Assignment. Lu, Yuntao; Li, Qi; Li, Tao. Front Genet. Genetics. Frontiers Media S.A. 2022-04-04 /pmc/articles/PMC9013948/ /pubmed/35444686 http://dx.doi.org/10.3389/fgene.2022.839453 Text en
Copyright © 2022 Lu, Li and Li. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Genetics
Lu, Yuntao
Li, Qi
Li, Tao
PPA-GCN: A Efficient GCN Framework for Prokaryotic Pathways Assignment
title PPA-GCN: A Efficient GCN Framework for Prokaryotic Pathways Assignment
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9013948/
https://www.ncbi.nlm.nih.gov/pubmed/35444686
http://dx.doi.org/10.3389/fgene.2022.839453