A Protein Domain Co-Occurrence Network Approach for Predicting Protein Function and Inferring Species Phylogeny
The protein domain co-occurrence network (DCN) is a biological network that has not been fully studied. We analyzed the properties of the DCNs of H. sapiens, S. cerevisiae, C. elegans, D. melanogaster, and 15 plant genomes. These DCNs have the hallmark features of scale-free networks. We investigated th...
Main authors: | Wang, Zheng; Zhang, Xue-Cheng; Le, Mi Ha; Xu, Dong; Stacey, Gary; Cheng, Jianlin |
---|---|
Format: | Text |
Language: | English |
Published: | Public Library of Science 2011 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3063783/ https://www.ncbi.nlm.nih.gov/pubmed/21455299 http://dx.doi.org/10.1371/journal.pone.0017906 |
_version_ | 1782200830693212160 |
---|---|
author | Wang, Zheng; Zhang, Xue-Cheng; Le, Mi Ha; Xu, Dong; Stacey, Gary; Cheng, Jianlin |
collection | PubMed |
description | The protein domain co-occurrence network (DCN) is a biological network that has not been fully studied. We analyzed the properties of the DCNs of H. sapiens, S. cerevisiae, C. elegans, D. melanogaster, and 15 plant genomes. These DCNs have the hallmark features of scale-free networks. We investigated the possibility of using DCNs to predict protein and domain functions. In an experiment on 66 randomly selected proteins, the best of the top 3 predictions made by our DCN-based aggregated neighbor-counting method achieved a semantic similarity score of 0.81 to the actual Gene Ontology terms of the proteins. Moreover, the top 3 predictions using neighbor-counting, χ², and an SVM-based method achieved an accuracy of 66%, 59%, and 61%, respectively, when used to predict specific Gene Ontology terms of human target domains. On average, these predictions had semantic similarity scores of 0.82, 0.80, and 0.79 to the actual Gene Ontology terms, respectively. We also used DCNs to predict whether a domain is an enzyme domain, and our SVM-based and neighbor-inference methods correctly classified 79% and 77% of the target domains, respectively. When using DCNs to classify a target domain into one of the six enzyme classes, we found that, as long as at least one EC number is available in the neighboring domains, our SVM-based and neighbor-counting methods correctly classified 92.4% and 91.9% of the target domains, respectively. Furthermore, we benchmarked the performance of using DCNs to infer species phylogenies on six different combinations of 398 single-chromosome prokaryotic genomes. The phylogenetic tree of 54 prokaryotic taxa generated by our DCN-alignment-based method achieved a 93.45% similarity score compared to Bergey's taxonomy. In summary, our studies show that genome-wide DCNs contain rich information that can be effectively used to decipher protein function and reveal the evolutionary relationships among species. |
format | Text |
id | pubmed-3063783 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-3063783 2011-03-31 PLoS One Research Article Public Library of Science 2011-03-24 /pmc/articles/PMC3063783/ /pubmed/21455299 http://dx.doi.org/10.1371/journal.pone.0017906 Text en Wang et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
title | A Protein Domain Co-Occurrence Network Approach for Predicting Protein Function and Inferring Species Phylogeny |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3063783/ https://www.ncbi.nlm.nih.gov/pubmed/21455299 http://dx.doi.org/10.1371/journal.pone.0017906 |
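
The neighbor-counting idea named in the description can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a DCN links two domains whenever they occur in the same protein, and it ranks Gene Ontology terms for a target domain by how many of its network neighbors carry each term. The function names `build_dcn` and `neighbor_counting`, the domain and protein identifiers, and the toy annotations are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_dcn(protein_domains):
    """Build an adjacency map: two domains are linked if they share a protein."""
    network = defaultdict(set)
    for domains in protein_domains.values():
        # Deduplicate domains within a protein, then link every pair.
        for a, b in combinations(sorted(set(domains)), 2):
            network[a].add(b)
            network[b].add(a)
    return network


def neighbor_counting(network, annotations, target, top_n=3):
    """Rank GO terms by the number of neighbors of `target` annotated with them."""
    counts = Counter()
    for neighbor in network.get(target, ()):
        counts.update(annotations.get(neighbor, ()))
    # Sort by descending count, then by term ID so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]


# Toy input (hypothetical): protein ID -> domains found in that protein.
protein_domains = {
    "P1": ["D_kinase", "D_sh2"],
    "P2": ["D_kinase", "D_sh3"],
    "P3": ["D_sh2", "D_sh3", "D_target"],
    "P4": ["D_kinase", "D_target"],
}

# Toy annotations (hypothetical assignments): domain -> set of GO term IDs.
annotations = {
    "D_kinase": {"GO:0004672"},
    "D_sh2": {"GO:0005515"},
    "D_sh3": {"GO:0005515"},
}

network = build_dcn(protein_domains)
predictions = neighbor_counting(network, annotations, "D_target")
print(predictions)
```

In this toy network the unannotated domain `D_target` has three neighbors, so the term shared by two of them is ranked first. The aggregated, χ², and SVM-based variants mentioned in the description are not covered by this sketch.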