Classification of Complete Proteomes of Different Organisms and Protein Sets Based on Their Protein Distributions in Terms of Some Key Attributes of Proteins

The existence of complete genome sequences makes it important to develop different approaches for classification of large-scale data sets and to make extraction of biological insights easier. Here, we propose an approach for classification of complete proteomes/protein sets based on protein distribu...

Bibliographic Details
Main Authors: Guo, Hao-Bo; Ma, Yue; Tuskan, Gerald A.; Yang, Xiaohan; Guo, Hong
Format: Online Article Text
Language: English
Published: Hindawi 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5857298/
https://www.ncbi.nlm.nih.gov/pubmed/29686995
http://dx.doi.org/10.1155/2018/9784161
_version_ 1783307444098695168
author Guo, Hao-Bo
Ma, Yue
Tuskan, Gerald A.
Yang, Xiaohan
Guo, Hong
author_facet Guo, Hao-Bo
Ma, Yue
Tuskan, Gerald A.
Yang, Xiaohan
Guo, Hong
author_sort Guo, Hao-Bo
collection PubMed
description The existence of complete genome sequences makes it important to develop different approaches for classification of large-scale data sets and to make extraction of biological insights easier. Here, we propose an approach for classification of complete proteomes/protein sets based on protein distributions on some basic attributes. We demonstrate the usefulness of this approach by determining protein distributions in terms of two attributes: protein length (L) and protein intrinsic disorder content (ID). The protein distributions based on L and ID are surveyed for representative proteomes of organisms and protein sets from the three domains of life. Two-dimensional maps (designated as fingerprints here) are then constructed from the protein distribution densities in the LD space defined by ln(L) and ID. The fingerprints for different organisms and protein sets are found to be distinct from one another, and they can therefore be used for comparative studies. As a test case, phylogenetic trees have been constructed based on the protein distribution densities in the fingerprints of proteomes of organisms without performing any protein sequence comparison or alignment. The phylogenetic trees generated are biologically meaningful, demonstrating that the protein distributions in the LD space may serve as unique phylogenetic signals of the organisms at the proteome level.
format Online
Article
Text
id pubmed-5857298
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-58572982018-04-23 Classification of Complete Proteomes of Different Organisms and Protein Sets Based on Their Protein Distributions in Terms of Some Key Attributes of Proteins Guo, Hao-Bo Ma, Yue Tuskan, Gerald A. Yang, Xiaohan Guo, Hong Int J Genomics Research Article The existence of complete genome sequences makes it important to develop different approaches for classification of large-scale data sets and to make extraction of biological insights easier. Here, we propose an approach for classification of complete proteomes/protein sets based on protein distributions on some basic attributes. We demonstrate the usefulness of this approach by determining protein distributions in terms of two attributes: protein lengths and protein intrinsic disorder contents (ID). The protein distributions based on L and ID are surveyed for representative proteome organisms and protein sets from the three domains of life. The two-dimensional maps (designated as fingerprints here) from the protein distribution densities in the LD space defined by ln(L) and ID are then constructed. The fingerprints for different organisms and protein sets are found to be distinct with each other, and they can therefore be used for comparative studies. As a test case, phylogenetic trees have been constructed based on the protein distribution densities in the fingerprints of proteomes of organisms without performing any protein sequence comparison and alignments. The phylogenetic trees generated are biologically meaningful, demonstrating that the protein distributions in the LD space may serve as unique phylogenetic signals of the organisms at the proteome level. Hindawi 2018-03-04 /pmc/articles/PMC5857298/ /pubmed/29686995 http://dx.doi.org/10.1155/2018/9784161 Text en Copyright © 2018 Hao-Bo Guo et al. 
http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Guo, Hao-Bo
Ma, Yue
Tuskan, Gerald A.
Yang, Xiaohan
Guo, Hong
Classification of Complete Proteomes of Different Organisms and Protein Sets Based on Their Protein Distributions in Terms of Some Key Attributes of Proteins
title Classification of Complete Proteomes of Different Organisms and Protein Sets Based on Their Protein Distributions in Terms of Some Key Attributes of Proteins
title_full Classification of Complete Proteomes of Different Organisms and Protein Sets Based on Their Protein Distributions in Terms of Some Key Attributes of Proteins
title_fullStr Classification of Complete Proteomes of Different Organisms and Protein Sets Based on Their Protein Distributions in Terms of Some Key Attributes of Proteins
title_full_unstemmed Classification of Complete Proteomes of Different Organisms and Protein Sets Based on Their Protein Distributions in Terms of Some Key Attributes of Proteins
title_short Classification of Complete Proteomes of Different Organisms and Protein Sets Based on Their Protein Distributions in Terms of Some Key Attributes of Proteins
title_sort classification of complete proteomes of different organisms and protein sets based on their protein distributions in terms of some key attributes of proteins
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5857298/
https://www.ncbi.nlm.nih.gov/pubmed/29686995
http://dx.doi.org/10.1155/2018/9784161
work_keys_str_mv AT guohaobo classificationofcompleteproteomesofdifferentorganismsandproteinsetsbasedontheirproteindistributionsintermsofsomekeyattributesofproteins
AT mayue classificationofcompleteproteomesofdifferentorganismsandproteinsetsbasedontheirproteindistributionsintermsofsomekeyattributesofproteins
AT tuskangeralda classificationofcompleteproteomesofdifferentorganismsandproteinsetsbasedontheirproteindistributionsintermsofsomekeyattributesofproteins
AT yangxiaohan classificationofcompleteproteomesofdifferentorganismsandproteinsetsbasedontheirproteindistributionsintermsofsomekeyattributesofproteins
AT guohong classificationofcompleteproteomesofdifferentorganismsandproteinsetsbasedontheirproteindistributionsintermsofsomekeyattributesofproteins
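
The abstract in this record describes fingerprints as two-dimensional maps of protein distribution density in the space defined by ln(L) and ID, and says these densities were compared without sequence alignment. The following is a minimal sketch of that idea, not the authors' implementation. The function names, the bin counts, the ln(L) range, and the use of Euclidean distance are all assumptions made for illustration; the record does not state which binning or distance measure the article uses.

```python
import math
from collections import Counter


def fingerprint(proteins, l_bins=20, id_bins=20, ln_l_range=(3.0, 9.0)):
    """Build a normalized 2D histogram over (ln(L), ID).

    proteins: iterable of (length, disorder_fraction) pairs, where
    length > 0 and 0 <= disorder_fraction <= 1.
    Bin counts and the ln(L) range are illustrative assumptions.
    Returns a dict mapping (i, j) bin indices to a density; densities sum to 1.
    """
    lo, hi = ln_l_range
    counts = Counter()
    for length, disorder in proteins:
        # Position of ln(L) within the chosen range, scaled to [0, 1].
        x = (math.log(length) - lo) / (hi - lo)
        # Clamp out-of-range values into the first or last bin.
        i = min(max(int(x * l_bins), 0), l_bins - 1)
        j = min(max(int(disorder * id_bins), 0), id_bins - 1)
        counts[(i, j)] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one protein is required")
    return {cell: n / total for cell, n in counts.items()}


def fingerprint_distance(fp_a, fp_b):
    """Euclidean distance between two fingerprints (an assumed metric)."""
    cells = set(fp_a) | set(fp_b)
    return math.sqrt(
        sum((fp_a.get(c, 0.0) - fp_b.get(c, 0.0)) ** 2 for c in cells)
    )


if __name__ == "__main__":
    # Toy (length, disorder fraction) values, invented for illustration only.
    proteome_a = [(120, 0.05), (350, 0.10), (410, 0.12), (900, 0.30)]
    proteome_b = [(130, 0.40), (600, 0.55), (1500, 0.70), (2200, 0.65)]
    fp_a = fingerprint(proteome_a)
    fp_b = fingerprint(proteome_b)
    print(fingerprint_distance(fp_a, fp_b))
```

Pairwise distances computed this way for a set of proteomes form a distance matrix, which could then be passed to a standard distance-based tree-building method. Which method the article used is not stated in this record.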