Comparative Analysis of Unsupervised Protein Similarity Prediction Based on Graph Embedding

The study of protein–protein interaction and the determination of protein functions are important parts of proteomics. Computational methods are used to study the similarity between proteins based on Gene Ontology (GO) to explore their functions and possible interactions. GO is a series of standardi...

Bibliographic Details
Main Authors:	Zhang, Yuanyuan; Wang, Ziqi; Wang, Shudong; Shang, Junliang
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2021
Subjects:	Genetics
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8493040/ https://www.ncbi.nlm.nih.gov/pubmed/34630534 http://dx.doi.org/10.3389/fgene.2021.744334

_version_	1784579045691752448
author	Zhang, Yuanyuan; Wang, Ziqi; Wang, Shudong; Shang, Junliang
author_facet	Zhang, Yuanyuan; Wang, Ziqi; Wang, Shudong; Shang, Junliang
author_sort	Zhang, Yuanyuan
collection	PubMed
description	The study of protein–protein interaction and the determination of protein functions are important parts of proteomics. Computational methods are used to study the similarity between proteins based on Gene Ontology (GO) to explore their functions and possible interactions. GO is a series of standardized terms that describe gene products from molecular functions, biological processes, and cell components. Previous studies on assessing the similarity of GO terms were primarily based on Information Content (IC) between GO terms to measure the similarity of proteins. However, these methods tend to ignore the structural information between GO terms. Therefore, considering the structural information of GO terms, we systematically analyze the performance of the GO graph and GO Annotation (GOA) graph in calculating the similarity of proteins using different graph embedding methods. When applied to the actual Human and Yeast datasets, the feature vectors of GO terms and proteins are learned based on different graph embedding methods. To measure the similarity of the proteins annotated by different GO numbers, we used Dynamic Time Warping (DTW) and cosine to calculate protein similarity in GO graph and GOA graph, respectively. Link prediction experiments were then performed to evaluate the reliability of protein similarity networks constructed by different methods. It is shown that graph embedding methods have obvious advantages over the traditional IC-based methods. We found that random walk graph embedding methods, in particular, showed excellent performance in calculating the similarity of proteins. By comparing link prediction experiment results from GO(DTW) and GOA(cosine) methods, it is shown that GO(DTW) features provide highly effective information for analyzing the similarity among proteins.
format	Online Article Text
id	pubmed-8493040
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-8493040 2021-10-07 Comparative Analysis of Unsupervised Protein Similarity Prediction Based on Graph Embedding Zhang, Yuanyuan; Wang, Ziqi; Wang, Shudong; Shang, Junliang Front Genet Genetics Frontiers Media S.A. 2021-09-22 /pmc/articles/PMC8493040/ /pubmed/34630534 http://dx.doi.org/10.3389/fgene.2021.744334 Text en Copyright © 2021 Zhang, Wang, Wang and Shang. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle	Genetics Zhang, Yuanyuan Wang, Ziqi Wang, Shudong Shang, Junliang Comparative Analysis of Unsupervised Protein Similarity Prediction Based on Graph Embedding
title	Comparative Analysis of Unsupervised Protein Similarity Prediction Based on Graph Embedding
title_sort	comparative analysis of unsupervised protein similarity prediction based on graph embedding
topic	Genetics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8493040/ https://www.ncbi.nlm.nih.gov/pubmed/34630534 http://dx.doi.org/10.3389/fgene.2021.744334
work_keys_str_mv	AT zhangyuanyuan comparativeanalysisofunsupervisedproteinsimilaritypredictionbasedongraphembedding AT wangziqi comparativeanalysisofunsupervisedproteinsimilaritypredictionbasedongraphembedding AT wangshudong comparativeanalysisofunsupervisedproteinsimilaritypredictionbasedongraphembedding AT shangjunliang comparativeanalysisofunsupervisedproteinsimilaritypredictionbasedongraphembedding
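
The description in this record mentions two ways of comparing proteins from learned feature vectors: cosine similarity between protein vectors (GOA graph) and Dynamic Time Warping (DTW) between proteins annotated with different numbers of GO terms (GO graph). The record does not give the article's exact formulation, so the following is only a minimal Python sketch of the two generic techniques. It assumes each protein is represented either by a single vector or by an ordered list of GO-term vectors, and that the DTW local cost is one minus the cosine similarity. The function names and the toy vectors are hypothetical and are not taken from the article.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def dtw_distance(seq_a: Sequence[Vector], seq_b: Sequence[Vector]) -> float:
    """Classic DTW between two sequences of vectors of possibly different length.

    Assumption: the local cost between two GO-term vectors is
    1 - cosine_similarity; the article may use a different cost.
    """
    if not seq_a or not seq_b:
        raise ValueError("both sequences must contain at least one vector")
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = minimal accumulated cost aligning seq_a[:i] with seq_b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = 1.0 - cosine_similarity(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in seq_a only
                cost[i][j - 1],      # advance in seq_b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


# Toy, made-up embeddings: protein_a has three GO terms, protein_b has two.
protein_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
protein_b = [[1.0, 0.0], [1.0, 1.0]]

if __name__ == "__main__":
    print("cosine:", cosine_similarity([1.0, 2.0], [2.0, 4.0]))
    print("DTW distance:", dtw_distance(protein_a, protein_b))
```

A smaller DTW distance indicates that the two GO-term sequences align more closely, which is why DTW can compare proteins with different numbers of annotations, whereas cosine similarity requires two vectors of the same length.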
