Binary classification of protein molecules into intrinsically disordered and ordered segments
BACKGROUND: Although structural domains in proteins (SDs) are important, half of the regions in the human proteome are currently left with no SD assignments. These unassigned regions consist not only of novel SDs, but also of intrinsically disordered (ID) regions since proteins, especially those in...
Main authors: | Fukuchi, Satoshi; Hosoda, Kazuo; Homma, Keiichi; Gojobori, Takashi; Nishikawa, Ken |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2011 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3199747/ https://www.ncbi.nlm.nih.gov/pubmed/21693062 http://dx.doi.org/10.1186/1472-6807-11-29 |
_version_ | 1782214587375943680 |
---|---|
author | Fukuchi, Satoshi Hosoda, Kazuo Homma, Keiichi Gojobori, Takashi Nishikawa, Ken |
author_facet | Fukuchi, Satoshi Hosoda, Kazuo Homma, Keiichi Gojobori, Takashi Nishikawa, Ken |
author_sort | Fukuchi, Satoshi |
collection | PubMed |
description | BACKGROUND: Although structural domains in proteins (SDs) are important, half of the regions in the human proteome are currently left with no SD assignments. These unassigned regions consist not only of novel SDs, but also of intrinsically disordered (ID) regions since proteins, especially those in eukaryotes, generally contain a significant fraction of ID regions. As ID regions can be inferred from amino acid sequences, a method that combines SD and ID region assignments can determine the fractions of SDs and ID regions in any proteome. RESULTS: In contrast to other available ID prediction programs that merely identify likely ID regions, the DICHOT system we previously developed classifies the entire protein sequence into SDs and ID regions. Application of DICHOT to the human proteome revealed that residue-wise ID regions constitute 35%, SDs with similarity to PDB structures comprise 52%, while SDs with no similarity to PDB structures account for the remaining 13%. The last group consists of novel structural domains, termed cryptic domains, which serve as good targets of structural genomics. The DICHOT method applied to the proteomes of other model organisms indicated that eukaryotes generally have high ID contents, while prokaryotes do not. In human proteins, ID contents differ among subcellular localizations: nuclear proteins had the highest residue-wise ID fraction (47%), while mitochondrial proteins exhibited the lowest (13%). Phosphorylation and O-linked glycosylation sites were found to be located preferentially in ID regions. As O-linked glycans are attached to residues in the extracellular regions of proteins, the modification is likely to protect the ID regions from proteolytic cleavage in the extracellular environment. Alternative splicing events tend to occur more frequently in ID regions. We interpret this as evidence that natural selection is operating at the protein level in alternative splicing. CONCLUSIONS: We classified entire regions of proteins into the two categories, SDs and ID regions and thereby obtained various kinds of complete genome-wide statistics. The results of the present study are important basic information for understanding protein structural architectures and have been made publicly available at http://spock.genes.nig.ac.jp/~genome/DICHOT. |
format | Online Article Text |
id | pubmed-3199747 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3199747 2011-10-24 Binary classification of protein molecules into intrinsically disordered and ordered segments Fukuchi, Satoshi Hosoda, Kazuo Homma, Keiichi Gojobori, Takashi Nishikawa, Ken BMC Struct Biol Research Article BACKGROUND: Although structural domains in proteins (SDs) are important, half of the regions in the human proteome are currently left with no SD assignments. These unassigned regions consist not only of novel SDs, but also of intrinsically disordered (ID) regions since proteins, especially those in eukaryotes, generally contain a significant fraction of ID regions. As ID regions can be inferred from amino acid sequences, a method that combines SD and ID region assignments can determine the fractions of SDs and ID regions in any proteome. RESULTS: In contrast to other available ID prediction programs that merely identify likely ID regions, the DICHOT system we previously developed classifies the entire protein sequence into SDs and ID regions. Application of DICHOT to the human proteome revealed that residue-wise ID regions constitute 35%, SDs with similarity to PDB structures comprise 52%, while SDs with no similarity to PDB structures account for the remaining 13%. The last group consists of novel structural domains, termed cryptic domains, which serve as good targets of structural genomics. The DICHOT method applied to the proteomes of other model organisms indicated that eukaryotes generally have high ID contents, while prokaryotes do not. In human proteins, ID contents differ among subcellular localizations: nuclear proteins had the highest residue-wise ID fraction (47%), while mitochondrial proteins exhibited the lowest (13%). Phosphorylation and O-linked glycosylation sites were found to be located preferentially in ID regions. As O-linked glycans are attached to residues in the extracellular regions of proteins, the modification is likely to protect the ID regions from proteolytic cleavage in the extracellular environment. Alternative splicing events tend to occur more frequently in ID regions. We interpret this as evidence that natural selection is operating at the protein level in alternative splicing. CONCLUSIONS: We classified entire regions of proteins into the two categories, SDs and ID regions and thereby obtained various kinds of complete genome-wide statistics. The results of the present study are important basic information for understanding protein structural architectures and have been made publicly available at http://spock.genes.nig.ac.jp/~genome/DICHOT. BioMed Central 2011-06-22 /pmc/articles/PMC3199747/ /pubmed/21693062 http://dx.doi.org/10.1186/1472-6807-11-29 Text en Copyright ©2011 Fukuchi et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Fukuchi, Satoshi Hosoda, Kazuo Homma, Keiichi Gojobori, Takashi Nishikawa, Ken Binary classification of protein molecules into intrinsically disordered and ordered segments |
title | Binary classification of protein molecules into intrinsically disordered and ordered segments |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3199747/ https://www.ncbi.nlm.nih.gov/pubmed/21693062 http://dx.doi.org/10.1186/1472-6807-11-29 |