
Integrating unsupervised language model with triplet neural networks for protein gene ontology prediction

Accurate identification of protein function is critical to elucidate life mechanisms and design new drugs. We proposed a novel deep-learning method, ATGO, to predict Gene Ontology (GO) attributes of proteins through a triplet neural-network architecture embedded with pre-trained language models from...


Bibliographic Details
Main Authors: Zhu, Yi-Heng; Zhang, Chengxin; Yu, Dong-Jun; Zhang, Yang
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9822105/
https://www.ncbi.nlm.nih.gov/pubmed/36548439
http://dx.doi.org/10.1371/journal.pcbi.1010793
author Zhu, Yi-Heng
Zhang, Chengxin
Yu, Dong-Jun
Zhang, Yang
author_sort Zhu, Yi-Heng
collection PubMed
description Accurate identification of protein function is critical to elucidate life mechanisms and design new drugs. We proposed a novel deep-learning method, ATGO, to predict Gene Ontology (GO) attributes of proteins through a triplet neural-network architecture embedded with pre-trained language models from protein sequences. The method was systematically tested on 1068 non-redundant benchmarking proteins and 3328 targets from the third Critical Assessment of Protein Function Annotation (CAFA) challenge. Experimental results showed that ATGO achieved a significant increase in GO prediction accuracy compared to state-of-the-art approaches in all three aspects: molecular function, biological process, and cellular component. Detailed data analyses showed that the major advantage of ATGO lies in its use of pre-trained transformer language models, which can extract discriminative functional patterns from the feature embeddings. Meanwhile, the proposed triplet network helps strengthen the association between functional similarity and feature similarity in the sequence embedding space. In addition, it was found that combining the network scores with complementary homology-based inferences could further improve the accuracy of the predicted models. These results demonstrated a new avenue for high-accuracy deep-learning function prediction that is applicable to large-scale protein function annotation from sequence alone.
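The description above says the triplet network ties functional similarity to feature similarity in the embedding space. As a rough illustration only, the sketch below shows the generic triplet margin loss commonly used for that purpose; this record does not state ATGO's exact loss, so the formula, the `margin` value, the function names, and the toy vectors are all assumptions made for the example.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    This is the textbook form, assumed here for illustration; it is not
    taken from the ATGO paper.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Toy 2-D "embeddings" (made-up numbers, not real protein features).
anchor = [0.0, 0.0]
positive = [0.0, 1.0]  # hypothetical protein sharing GO terms with the anchor
negative = [3.0, 4.0]  # hypothetical protein sharing no GO terms with the anchor

# Functionally similar protein is already closer than the dissimilar one.
well_separated = triplet_margin_loss(anchor, positive, negative)
# Roles swapped: the "similar" protein is farther away, so the loss is positive.
violating = triplet_margin_loss(anchor, negative, positive)
print(well_separated, violating)
```

The loss is zero once the functionally similar protein sits closer to the anchor than the dissimilar one by at least the margin, and positive otherwise, which is what pushes embedding distance to track functional similarity during training.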
format Online
Article
Text
id pubmed-9822105
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9822105 2023-01-07 Integrating unsupervised language model with triplet neural networks for protein gene ontology prediction Zhu, Yi-Heng Zhang, Chengxin Yu, Dong-Jun Zhang, Yang PLoS Comput Biol Research Article
Public Library of Science 2022-12-22 /pmc/articles/PMC9822105/ /pubmed/36548439 http://dx.doi.org/10.1371/journal.pcbi.1010793 Text en © 2022 Zhu et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Integrating unsupervised language model with triplet neural networks for protein gene ontology prediction
topic Research Article