A Protein Domain Co-Occurrence Network Approach for Predicting Protein Function and Inferring Species Phylogeny

The protein domain co-occurrence network (DCN) is a biological network that has not been fully studied. We analyzed the properties of the DCNs of H. sapiens, S. cerevisiae, C. elegans, D. melanogaster, and 15 plant genomes. These DCNs have the hallmark features of scale-free networks. We investigated th...

Bibliographic Details
Main Authors: Wang, Zheng, Zhang, Xue-Cheng, Le, Mi Ha, Xu, Dong, Stacey, Gary, Cheng, Jianlin
Format: Text
Language: English
Published: Public Library of Science 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3063783/
https://www.ncbi.nlm.nih.gov/pubmed/21455299
http://dx.doi.org/10.1371/journal.pone.0017906
_version_ 1782200830693212160
author Wang, Zheng
Zhang, Xue-Cheng
Le, Mi Ha
Xu, Dong
Stacey, Gary
Cheng, Jianlin
author_sort Wang, Zheng
collection PubMed
description The protein domain co-occurrence network (DCN) is a biological network that has not been fully studied. We analyzed the properties of the DCNs of H. sapiens, S. cerevisiae, C. elegans, D. melanogaster, and 15 plant genomes. These DCNs have the hallmark features of scale-free networks. We investigated the possibility of using DCNs to predict protein and domain functions. In an experiment on 66 randomly selected proteins, the best of the top 3 predictions made by our DCN-based aggregated neighbor-counting method achieved a semantic similarity score of 0.81 to the actual Gene Ontology terms of the proteins. Moreover, the top 3 predictions using neighbor-counting, χ², and an SVM-based method achieved an accuracy of 66%, 59%, and 61%, respectively, when used to predict specific Gene Ontology terms of human target domains. These predictions had, on average, a semantic similarity score of 0.82, 0.80, and 0.79 to the actual Gene Ontology terms, respectively. We also used DCNs to predict whether a domain is an enzyme domain, and our SVM-based and neighbor-inference methods correctly classified 79% and 77% of the target domains, respectively. When using DCNs to classify a target domain into one of the six enzyme classes, we found that, as long as one EC number is available in the neighboring domains, our SVM-based and neighbor-counting methods correctly classified 92.4% and 91.9% of the target domains, respectively. Furthermore, we benchmarked the performance of using DCNs to infer species phylogenies on six different combinations of 398 single-chromosome prokaryotic genomes. The phylogenetic tree of 54 prokaryotic taxa generated by our DCN-alignment-based method achieved a 93.45% similarity score compared to Bergey's taxonomy. In summary, our studies show that genome-wide DCNs contain rich information that can be effectively used to decipher protein function and reveal the evolutionary relationships among species.
format Text
id pubmed-3063783
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3063783 2011-03-31 PLoS One Research Article Public Library of Science 2011-03-24 /pmc/articles/PMC3063783/ /pubmed/21455299 http://dx.doi.org/10.1371/journal.pone.0017906 Text en Wang et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title A Protein Domain Co-Occurrence Network Approach for Predicting Protein Function and Inferring Species Phylogeny
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3063783/
https://www.ncbi.nlm.nih.gov/pubmed/21455299
http://dx.doi.org/10.1371/journal.pone.0017906
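The abstract names a neighbor-counting method over a domain co-occurrence network but this record does not give its details. The following is only a minimal sketch of the general idea, under stated assumptions: two domains are linked when they appear in the same protein, and a target domain's candidate terms are ranked by how many of its annotated neighbors carry each term. The function names (`build_dcn`, `predict_terms`), the toy domain labels, and the toy term labels are all hypothetical, and the paper's actual scoring, aggregation, and tie-breaking may differ.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_dcn(protein_domains):
    """Return an adjacency map: domain -> set of domains sharing a protein.

    Assumption: an undirected, unweighted edge joins any two distinct
    domains that co-occur in at least one protein.
    """
    neighbors = defaultdict(set)
    for domains in protein_domains.values():
        unique = sorted(set(domains))
        for d in unique:
            neighbors[d]  # touch the key so single-domain proteins are kept
        for a, b in combinations(unique, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return dict(neighbors)


def predict_terms(target, neighbors, annotations, top_k=3):
    """Rank terms by the number of neighboring domains annotated with them."""
    counts = Counter()
    for n in neighbors.get(target, ()):
        counts.update(annotations.get(n, ()))
    # Sort by descending count; break ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


# Toy, made-up input: protein -> list of domains (not real data).
toy_proteins = {
    "protein_1": ["dom_A", "dom_B"],
    "protein_2": ["dom_A", "dom_C"],
    "protein_3": ["dom_A", "dom_D", "dom_B"],
    "protein_4": ["dom_E"],
}
# Toy, made-up annotations: domain -> set of function terms.
toy_annotations = {
    "dom_B": {"term_x", "term_y"},
    "dom_C": {"term_x"},
    "dom_D": {"term_z"},
}

dcn = build_dcn(toy_proteins)
predictions = predict_terms("dom_A", dcn, toy_annotations)
print(predictions)
```

In this toy input, `dom_A` has three neighbors, two of which carry `term_x`, so that term ranks first. A domain with no neighbors, such as `dom_E`, yields an empty prediction list.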