Binary classification of protein molecules into intrinsically disordered and ordered segments

Bibliographic Details
Main authors: Fukuchi, Satoshi; Hosoda, Kazuo; Homma, Keiichi; Gojobori, Takashi; Nishikawa, Ken
Format: Online article (text)
Language: English
Published: BioMed Central 2011
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3199747/
https://www.ncbi.nlm.nih.gov/pubmed/21693062
http://dx.doi.org/10.1186/1472-6807-11-29
Collection: PubMed
Abstract:
BACKGROUND: Although structural domains in proteins (SDs) are important, half of the regions in the human proteome are currently left with no SD assignments. These unassigned regions consist not only of novel SDs, but also of intrinsically disordered (ID) regions, since proteins, especially those in eukaryotes, generally contain a significant fraction of ID regions. As ID regions can be inferred from amino acid sequences, a method that combines SD and ID region assignments can determine the fractions of SDs and ID regions in any proteome.
RESULTS: In contrast to other available ID prediction programs that merely identify likely ID regions, the DICHOT system we previously developed classifies the entire protein sequence into SDs and ID regions. Application of DICHOT to the human proteome revealed that, residue-wise, ID regions constitute 35%, SDs with similarity to PDB structures comprise 52%, and SDs with no similarity to PDB structures account for the remaining 13%. The last group consists of novel structural domains, termed cryptic domains, which serve as good targets of structural genomics. The DICHOT method applied to the proteomes of other model organisms indicated that eukaryotes generally have high ID contents, while prokaryotes do not. In human proteins, ID contents differ among subcellular localizations: nuclear proteins had the highest residue-wise ID fraction (47%), while mitochondrial proteins exhibited the lowest (13%). Phosphorylation and O-linked glycosylation sites were found to be located preferentially in ID regions. As O-linked glycans are attached to residues in the extracellular regions of proteins, the modification is likely to protect the ID regions from proteolytic cleavage in the extracellular environment. Alternative splicing events tend to occur more frequently in ID regions. We interpret this as evidence that natural selection is operating at the protein level in alternative splicing.
CONCLUSIONS: We classified entire regions of proteins into two categories, SDs and ID regions, and thereby obtained various kinds of complete genome-wide statistics. The results of the present study provide important basic information for understanding protein structural architectures and have been made publicly available at http://spock.genes.nig.ac.jp/~genome/DICHOT.
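The residue-wise percentages in the abstract are shares of all residues pooled over a proteome, and the claim about modification sites amounts to comparing the ID fraction among modified residues with the ID fraction overall. A minimal sketch of that bookkeeping follows. It assumes a hypothetical per-residue label string ('D' for ID, 'S' for an SD with PDB similarity, 'C' for a cryptic domain); this is not DICHOT's actual output format, and the function names `residue_fractions` and `id_enrichment` and the toy data are illustrative only.

```python
from collections import Counter

# Hypothetical per-residue labels (an assumption, not DICHOT's real format):
#   'D' = intrinsically disordered (ID) residue
#   'S' = structural domain (SD) with similarity to a PDB structure
#   'C' = cryptic domain (SD with no PDB similarity)
LABELS = ("D", "S", "C")


def residue_fractions(proteins: dict[str, str]) -> dict[str, float]:
    """Pool residues over all proteins and return each label's share."""
    counts = Counter()
    for labels in proteins.values():
        counts.update(labels)  # counts every character of the label string
    total = sum(counts[label] for label in LABELS)
    if total == 0:
        raise ValueError("no labelled residues")
    return {label: counts[label] / total for label in LABELS}


def id_enrichment(proteins: dict[str, str], sites: dict[str, list[int]]) -> float:
    """Ratio of the ID fraction among modified sites to the overall ID fraction.

    Values above 1 mean sites fall in ID regions more often than expected
    from the ID content alone. Site positions are 0-based residue indices.
    """
    id_sites = total_sites = 0
    for name, positions in sites.items():
        labels = proteins[name]
        for pos in positions:
            total_sites += 1
            id_sites += labels[pos] == "D"  # True counts as 1
    if total_sites == 0:
        raise ValueError("no modification sites")
    return (id_sites / total_sites) / residue_fractions(proteins)["D"]


if __name__ == "__main__":
    # Toy data only; these numbers are not from the paper.
    proteins = {
        "toy1": "DDDDSSSSSSSSCC",
        "toy2": "SSSSSSDDDDDD",
    }
    sites = {"toy1": [0, 2, 5], "toy2": [7, 8]}
    print(residue_fractions(proteins))
    print(round(id_enrichment(proteins, sites), 3))
```

Pooling residues before dividing (rather than averaging per-protein fractions) matches the residue-wise wording; a ratio above 1 is the kind of signal behind "located preferentially in ID regions", though this sketch applies no significance test.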
Record ID: pubmed-3199747
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Struct Biol (Research Article), BioMed Central, published 2011-06-22
Rights: Copyright ©2011 Fukuchi et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.