
GOProFormer: A Multi-Modal Transformer Method for Gene Ontology Protein Function Prediction

Protein Language Models (PLMs) are shown to be capable of learning sequence representations useful for various prediction tasks, from subcellular localization, evolutionary relationships, family membership, and more. They have yet to be demonstrated useful for protein function prediction. In particular, the problem of automatic annotation of proteins under the Gene Ontology (GO) framework remains open. This paper makes two key contributions. It debuts a novel method that leverages the transformer architecture in two ways. A sequence transformer encodes protein sequences in a task-agnostic feature space. A graph transformer learns a representation of GO terms while respecting their hierarchical relationships. The learned sequence and GO terms representations are combined and utilized for multi-label classification, with the labels corresponding to GO terms. The method is shown superior over recent representative GO prediction methods. The second major contribution in this paper is a deep investigation of different ways of constructing training and testing datasets. The paper shows that existing approaches under- or over-estimate the generalization power of a model. A novel approach is proposed to address these issues, resulting in a new benchmark dataset to rigorously evaluate and compare methods and advance the state-of-the-art.
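The fusion step the abstract describes — comparing a learned sequence representation against learned GO-term representations to produce one independent probability per term — can be sketched as follows. This is an illustrative toy, not the authors' implementation: the embeddings here are random stand-ins for the outputs of the sequence transformer and the graph transformer, and the dot-product-plus-sigmoid scoring is one simple way to realize multi-label classification.

```python
import numpy as np

def predict_go_terms(seq_embedding, go_term_embeddings, threshold=0.5):
    """Score each GO term against a protein's sequence embedding.

    A minimal sketch of the fusion described in the abstract: a sequence
    representation (stand-in for a PLM encoding) is compared with learned
    GO-term representations (stand-in for the graph-transformer output),
    and a per-term sigmoid yields independent multi-label probabilities.
    """
    scores = go_term_embeddings @ seq_embedding      # one score per GO term
    probs = 1.0 / (1.0 + np.exp(-scores))            # sigmoid per label
    return probs, probs >= threshold                 # probabilities + calls

# Toy example: a 4-dim "sequence embedding" and three "GO-term embeddings".
rng = np.random.default_rng(0)
seq = rng.normal(size=4)
go_terms = rng.normal(size=(3, 4))
probs, labels = predict_go_terms(seq, go_terms)
print(probs.shape, labels.shape)  # (3,) (3,)
```

Because each GO term gets its own sigmoid rather than sharing a softmax, a protein can be annotated with many terms at once, which matches the multi-label framing in the abstract.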


Bibliographic Details
Main Authors: Kabir, Anowarul; Shehu, Amarda
Format: Online Article Text
Language: English
Published: MDPI, 2022
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9687818/
https://www.ncbi.nlm.nih.gov/pubmed/36421723
http://dx.doi.org/10.3390/biom12111709
Journal: Biomolecules (MDPI). Published online: 2022-11-18. © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).