Comparative Analysis of Unsupervised Protein Similarity Prediction Based on Graph Embedding

The study of protein–protein interaction and the determination of protein functions are important parts of proteomics. Computational methods are used to study the similarity between proteins based on Gene Ontology (GO) to explore their functions and possible interactions. GO is a series of standardi...

Bibliographic Details
Main Authors: Zhang, Yuanyuan; Wang, Ziqi; Wang, Shudong; Shang, Junliang
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8493040/
https://www.ncbi.nlm.nih.gov/pubmed/34630534
http://dx.doi.org/10.3389/fgene.2021.744334
_version_ 1784579045691752448
author Zhang, Yuanyuan
Wang, Ziqi
Wang, Shudong
Shang, Junliang
author_facet Zhang, Yuanyuan
Wang, Ziqi
Wang, Shudong
Shang, Junliang
author_sort Zhang, Yuanyuan
collection PubMed
description The study of protein–protein interaction and the determination of protein functions are important parts of proteomics. Computational methods are used to study the similarity between proteins based on Gene Ontology (GO) to explore their functions and possible interactions. GO is a series of standardized terms that describe gene products from molecular functions, biological processes, and cell components. Previous studies on assessing the similarity of GO terms were primarily based on Information Content (IC) between GO terms to measure the similarity of proteins. However, these methods tend to ignore the structural information between GO terms. Therefore, considering the structural information of GO terms, we systematically analyze the performance of the GO graph and GO Annotation (GOA) graph in calculating the similarity of proteins using different graph embedding methods. When applied to the actual Human and Yeast datasets, the feature vectors of GO terms and proteins are learned based on different graph embedding methods. To measure the similarity of the proteins annotated by different GO numbers, we used Dynamic Time Warping (DTW) and cosine to calculate protein similarity in GO graph and GOA graph, respectively. Link prediction experiments were then performed to evaluate the reliability of protein similarity networks constructed by different methods. It is shown that graph embedding methods have obvious advantages over the traditional IC-based methods. We found that random walk graph embedding methods, in particular, showed excellent performance in calculating the similarity of proteins. By comparing link prediction experiment results from GO(DTW) and GOA(cosine) methods, it is shown that GO(DTW) features provide highly effective information for analyzing the similarity among proteins.
format Online
Article
Text
id pubmed-8493040
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8493040 2021-10-07 Comparative Analysis of Unsupervised Protein Similarity Prediction Based on Graph Embedding Zhang, Yuanyuan Wang, Ziqi Wang, Shudong Shang, Junliang Front Genet Genetics Frontiers Media S.A. 2021-09-22 /pmc/articles/PMC8493040/ /pubmed/34630534 http://dx.doi.org/10.3389/fgene.2021.744334 Text en Copyright © 2021 Zhang, Wang, Wang and Shang. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Genetics
Zhang, Yuanyuan
Wang, Ziqi
Wang, Shudong
Shang, Junliang
Comparative Analysis of Unsupervised Protein Similarity Prediction Based on Graph Embedding
title Comparative Analysis of Unsupervised Protein Similarity Prediction Based on Graph Embedding
title_full Comparative Analysis of Unsupervised Protein Similarity Prediction Based on Graph Embedding
title_fullStr Comparative Analysis of Unsupervised Protein Similarity Prediction Based on Graph Embedding
title_full_unstemmed Comparative Analysis of Unsupervised Protein Similarity Prediction Based on Graph Embedding
title_short Comparative Analysis of Unsupervised Protein Similarity Prediction Based on Graph Embedding
title_sort comparative analysis of unsupervised protein similarity prediction based on graph embedding
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8493040/
https://www.ncbi.nlm.nih.gov/pubmed/34630534
http://dx.doi.org/10.3389/fgene.2021.744334
work_keys_str_mv AT zhangyuanyuan comparativeanalysisofunsupervisedproteinsimilaritypredictionbasedongraphembedding
AT wangziqi comparativeanalysisofunsupervisedproteinsimilaritypredictionbasedongraphembedding
AT wangshudong comparativeanalysisofunsupervisedproteinsimilaritypredictionbasedongraphembedding
AT shangjunliang comparativeanalysisofunsupervisedproteinsimilaritypredictionbasedongraphembedding
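
Illustrative note on the abstract: the description states that proteins annotated by different numbers of GO terms are compared with Dynamic Time Warping (DTW) on the GO graph, and with cosine similarity on the GOA graph. The sketch below is not the authors' implementation; it only shows how the two measures behave on toy data. The function names, the use of 1 minus cosine similarity as the local DTW cost, the ordering of GO-term vectors within a protein, and the example vectors are all assumptions made for illustration.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine similarity between two 1-D vectors (0.0 if either is all zeros)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of embedding vectors.

    a has shape (n, d) and b has shape (m, d); n and m may differ, which is
    what lets proteins with different numbers of GO terms be compared.
    The local cost 1 - cosine similarity is an assumption of this sketch.
    """
    a = np.atleast_2d(np.asarray(a, dtype=float))
    b = np.atleast_2d(np.asarray(b, dtype=float))
    n, m = len(a), len(b)
    # acc[i, j] = minimal accumulated cost of aligning a[:i] with b[:j]
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 1.0 - cosine_similarity(a[i - 1], b[j - 1])
            acc[i, j] = cost + min(acc[i - 1, j],      # repeat b[j-1]
                                   acc[i, j - 1],      # repeat a[i-1]
                                   acc[i - 1, j - 1])  # advance both
    return float(acc[n, m])


# Hypothetical toy data: each row stands in for one GO-term embedding.
protein_a = [[1.0, 0.0], [0.0, 1.0]]               # annotated by 2 GO terms
protein_b = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # annotated by 3 GO terms

go_dtw = dtw_distance(protein_a, protein_b)        # GO(DTW)-style comparison
goa_cos = cosine_similarity([0.2, 0.8], [0.3, 0.7])  # GOA(cosine)-style comparison
```

A lower DTW cost means the two sets of GO-term vectors align more closely, while cosine similarity applies directly when each protein already has a single embedding vector, as it would as a node in a GOA graph.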