Identifying protein function and functional links based on large-scale co-occurrence patterns

OBJECTIVE: The vast majority of known proteins have not been experimentally tested even at the level of measuring their expression, and the function of many proteins remains unknown. In order to decipher protein function and examine functional associations, we developed "Cliquely", a softw...

Bibliographic Details
Main authors: Pasternak, Zohar, Chapnik, Noam, Yosef, Roy, Kopelman, Naama M., Jurkevitch, Edouard, Segev, Elad
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8893610/
https://www.ncbi.nlm.nih.gov/pubmed/35239724
http://dx.doi.org/10.1371/journal.pone.0264765
_version_ 1784662438783746048
author Pasternak, Zohar
Chapnik, Noam
Yosef, Roy
Kopelman, Naama M.
Jurkevitch, Edouard
Segev, Elad
author_facet Pasternak, Zohar
Chapnik, Noam
Yosef, Roy
Kopelman, Naama M.
Jurkevitch, Edouard
Segev, Elad
author_sort Pasternak, Zohar
collection PubMed
description OBJECTIVE: The vast majority of known proteins have not been experimentally tested even at the level of measuring their expression, and the function of many proteins remains unknown. In order to decipher protein function and examine functional associations, we developed "Cliquely", a software tool based on the exploration of co-occurrence patterns. COMPUTATIONAL MODEL: Using a set of more than 23 million proteins divided into 404,947 orthologous clusters, we explored the co-occurrence graph of 4,742 fully sequenced genomes from the three domains of life. Edge weights in this graph represent co-occurrence probabilities. We use the Bron–Kerbosch algorithm to detect maximal cliques in this graph, fully-connected subgraphs that represent meaningful biological networks from different functional categories. MAIN RESULTS: We demonstrate that Cliquely can successfully identify known networks from various pathways, including nitrogen fixation, glycolysis, methanogenesis, mevalonate and ribosome proteins. Identifying the virulence-associated type III secretion system (T3SS) network, Cliquely also added 13 previously uncharacterized novel proteins to the T3SS network, demonstrating the strength of this approach. Cliquely is freely available and open source. Users can employ the tool to explore co-occurrence networks using a protein of interest and a customizable level of stringency, either for the entire dataset or for one of the three domains: Archaea, Bacteria, or Eukarya.
format Online
Article
Text
id pubmed-8893610
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-88936102022-03-04 Identifying protein function and functional links based on large-scale co-occurrence patterns Pasternak, Zohar Chapnik, Noam Yosef, Roy Kopelman, Naama M. Jurkevitch, Edouard Segev, Elad PLoS One Research Article OBJECTIVE: The vast majority of known proteins have not been experimentally tested even at the level of measuring their expression, and the function of many proteins remains unknown. In order to decipher protein function and examine functional associations, we developed "Cliquely", a software tool based on the exploration of co-occurrence patterns. COMPUTATIONAL MODEL: Using a set of more than 23 million proteins divided into 404,947 orthologous clusters, we explored the co-occurrence graph of 4,742 fully sequenced genomes from the three domains of life. Edge weights in this graph represent co-occurrence probabilities. We use the Bron–Kerbosch algorithm to detect maximal cliques in this graph, fully-connected subgraphs that represent meaningful biological networks from different functional categories. MAIN RESULTS: We demonstrate that Cliquely can successfully identify known networks from various pathways, including nitrogen fixation, glycolysis, methanogenesis, mevalonate and ribosome proteins. Identifying the virulence-associated type III secretion system (T3SS) network, Cliquely also added 13 previously uncharacterized novel proteins to the T3SS network, demonstrating the strength of this approach. Cliquely is freely available and open source. Users can employ the tool to explore co-occurrence networks using a protein of interest and a customizable level of stringency, either for the entire dataset or for one of the three domains: Archaea, Bacteria, or Eukarya.
Public Library of Science 2022-03-03 /pmc/articles/PMC8893610/ /pubmed/35239724 http://dx.doi.org/10.1371/journal.pone.0264765 Text en © 2022 Pasternak et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Pasternak, Zohar
Chapnik, Noam
Yosef, Roy
Kopelman, Naama M.
Jurkevitch, Edouard
Segev, Elad
Identifying protein function and functional links based on large-scale co-occurrence patterns
title Identifying protein function and functional links based on large-scale co-occurrence patterns
title_full Identifying protein function and functional links based on large-scale co-occurrence patterns
title_fullStr Identifying protein function and functional links based on large-scale co-occurrence patterns
title_full_unstemmed Identifying protein function and functional links based on large-scale co-occurrence patterns
title_short Identifying protein function and functional links based on large-scale co-occurrence patterns
title_sort identifying protein function and functional links based on large-scale co-occurrence patterns
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8893610/
https://www.ncbi.nlm.nih.gov/pubmed/35239724
http://dx.doi.org/10.1371/journal.pone.0264765
work_keys_str_mv AT pasternakzohar identifyingproteinfunctionandfunctionallinksbasedonlargescalecooccurrencepatterns
AT chapniknoam identifyingproteinfunctionandfunctionallinksbasedonlargescalecooccurrencepatterns
AT yosefroy identifyingproteinfunctionandfunctionallinksbasedonlargescalecooccurrencepatterns
AT kopelmannaamam identifyingproteinfunctionandfunctionallinksbasedonlargescalecooccurrencepatterns
AT jurkevitchedouard identifyingproteinfunctionandfunctionallinksbasedonlargescalecooccurrencepatterns
AT segevelad identifyingproteinfunctionandfunctionallinksbasedonlargescalecooccurrencepatterns
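
Illustrative sketch (not part of the catalog record, and not Cliquely's code): the description above says that maximal cliques are detected in a co-occurrence graph with the Bron–Kerbosch algorithm. The toy Python example below shows that idea on made-up data. The cluster names, the genome ids, the Jaccard-style score, and the 0.6 threshold are all assumptions chosen for illustration; the paper's edge weights are co-occurrence probabilities whose exact definition is not given in this record.

```python
from itertools import combinations


def cooccurrence_graph(presence, threshold):
    """Link two clusters when their co-occurrence score reaches the threshold.

    presence maps a cluster id to the set of genome ids that contain it.
    The score (shared genomes / genomes containing either cluster) is an
    illustrative assumption, not the probability used by Cliquely.
    """
    graph = {cluster: set() for cluster in presence}
    for a, b in combinations(presence, 2):
        union = presence[a] | presence[b]
        if not union:
            continue
        score = len(presence[a] & presence[b]) / len(union)
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def bron_kerbosch(graph, r=None, p=None, x=None):
    """Yield every maximal clique as a frozenset (Bron–Kerbosch with pivoting)."""
    if r is None:
        r, p, x = set(), set(graph), set()
    if not p and not x:
        yield frozenset(r)  # r cannot be extended, so it is maximal
        return
    # Pick the pivot with the most neighbours among the candidates in p.
    pivot = max(p | x, key=lambda v: len(graph[v] & p))
    for v in list(p - graph[pivot]):
        yield from bron_kerbosch(graph, r | {v}, p & graph[v], x & graph[v])
        p.remove(v)  # v is done: move it from candidates to excluded
        x.add(v)


# Hypothetical presence/absence data: cluster id -> genomes containing it.
presence = {
    "A": {1, 2, 3, 4},
    "B": {1, 2, 3, 4},
    "C": {1, 2, 3},
    "D": {5, 6},
    "E": {5, 6, 7},
}

graph = cooccurrence_graph(presence, threshold=0.6)
cliques = sorted(sorted(clique) for clique in bron_kerbosch(graph))
print(cliques)  # [['A', 'B', 'C'], ['D', 'E']]
```

Raising the threshold plays the role of a stricter stringency setting: with a threshold of 0.8 in this toy data, only the A–B edge survives, so the three-member clique shrinks to a pair.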