Comparative analysis of gene ontology-based semantic similarity measurements for the application of identifying essential proteins
Identifying key proteins from protein-protein interaction (PPI) networks is one of the most fundamental and important tasks for computational biologists. However, the protein interactions obtained by high-throughput technology are characterized by a high false positive rate, which severely hinders t...
Main authors: | Xue, Xiaoli; Zhang, Wei; Fan, Anjing |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10121005/ https://www.ncbi.nlm.nih.gov/pubmed/37083829 http://dx.doi.org/10.1371/journal.pone.0284274 |
_version_ | 1785029290913431552 |
---|---|
author | Xue, Xiaoli Zhang, Wei Fan, Anjing |
author_facet | Xue, Xiaoli Zhang, Wei Fan, Anjing |
author_sort | Xue, Xiaoli |
collection | PubMed |
description | Identifying key proteins from protein-protein interaction (PPI) networks is one of the most fundamental and important tasks for computational biologists. However, the protein interactions obtained by high-throughput technology are characterized by a high false positive rate, which severely hinders the prediction accuracy of the current computational methods. In this paper, we propose a novel strategy to identify key proteins by constructing reliable PPI networks. Five Gene Ontology (GO)-based semantic similarity measurements (Jiang, Lin, Rel, Resnik, and Wang) are used to calculate the confidence scores for protein pairs under three annotation terms (Molecular function (MF), Biological process (BP), and Cellular component (CC)). The protein pairs with low similarity values are assumed to be low-confidence links, and the refined PPI networks are constructed by filtering the low-confidence links. Six topology-based centrality methods (the BC, DC, EC, NC, SC, and aveNC) are applied to test the performance of the measurements under the original network and refined network. We systematically compare the performance of the five semantic similarity metrics with the three GO annotation terms on four benchmark datasets, and the simulation results show that the performance of these centrality methods under refined PPI networks is relatively better than that under the original networks. Resnik with a BP annotation term performs best among all five metrics with the three annotation terms. These findings suggest the importance of semantic similarity metrics in measuring the reliability of the links between proteins and highlight the Resnik metric with the BP annotation term as a favourable choice. |
format | Online Article Text |
id | pubmed-10121005 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-10121005 2023-04-22 Comparative analysis of gene ontology-based semantic similarity measurements for the application of identifying essential proteins Xue, Xiaoli Zhang, Wei Fan, Anjing PLoS One Research Article Public Library of Science 2023-04-21 /pmc/articles/PMC10121005/ /pubmed/37083829 http://dx.doi.org/10.1371/journal.pone.0284274 Text en © 2023 Xue et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Xue, Xiaoli Zhang, Wei Fan, Anjing Comparative analysis of gene ontology-based semantic similarity measurements for the application of identifying essential proteins |
title | Comparative analysis of gene ontology-based semantic similarity measurements for the application of identifying essential proteins |
title_sort | comparative analysis of gene ontology-based semantic similarity measurements for the application of identifying essential proteins |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10121005/ https://www.ncbi.nlm.nih.gov/pubmed/37083829 http://dx.doi.org/10.1371/journal.pone.0284274 |
work_keys_str_mv | AT xuexiaoli comparativeanalysisofgeneontologybasedsemanticsimilaritymeasurementsfortheapplicationofidentifyingessentialproteins AT zhangwei comparativeanalysisofgeneontologybasedsemanticsimilaritymeasurementsfortheapplicationofidentifyingessentialproteins AT fananjing comparativeanalysisofgeneontologybasedsemanticsimilaritymeasurementsfortheapplicationofidentifyingessentialproteins |
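
The network-refinement strategy summarized in the description (score each protein pair, drop low-confidence links, then rank proteins by a centrality measure) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the protein names, the similarity scores, the threshold of 0.5, and the function names `refine_network` and `degree_centrality` are all hypothetical. The scores stand in for values that a GO-based measure such as Resnik would supply, and degree centrality (DC) is used only because it is the simplest of the six methods named in the record.

```python
from collections import defaultdict


def refine_network(edges, similarity, threshold):
    """Keep only the edges whose similarity score is at least `threshold`.

    `similarity` maps an unordered protein pair (a frozenset) to a score.
    Pairs with no score are treated as 0.0, i.e. low confidence (an assumption).
    """
    refined = []
    for u, v in edges:
        score = similarity.get(frozenset((u, v)), 0.0)
        if score >= threshold:
            refined.append((u, v))
    return refined


def degree_centrality(edges):
    """Count the number of retained links touching each protein."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return dict(degree)


# Hypothetical toy network and hypothetical similarity scores.
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]
similarity = {
    frozenset(("P1", "P2")): 0.9,
    frozenset(("P1", "P3")): 0.7,
    frozenset(("P2", "P3")): 0.1,
    frozenset(("P3", "P4")): 0.2,
}

# Filter low-confidence links, then rank proteins by degree (ties broken by name).
refined = refine_network(edges, similarity, threshold=0.5)
ranking = sorted(degree_centrality(refined).items(), key=lambda kv: (-kv[1], kv[0]))
```

In this toy case the two low-scoring links are removed, so `P4` drops out of the refined network and `P1` ranks first. How the threshold is chosen, and how similarity is computed from GO annotations, are left open here because the record does not specify them.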