
Development of an accurate classification system of proteins into structured and unstructured regions that uncovers novel structural domains: its application to human transcription factors

BACKGROUND: In addition to structural domains, most eukaryotic proteins possess intrinsically disordered (ID) regions. Although ID regions often play important functional roles, their accurate identification is difficult. As human transcription factors (TFs) constitute a typical group of proteins wi...


Bibliographic Details
Main authors:	Fukuchi, Satoshi; Homma, Keiichi; Minezaki, Yoshiaki; Gojobori, Takashi; Nishikawa, Ken
Format:	Text
Language:	English
Published:	BioMed Central 2009
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2687452/ https://www.ncbi.nlm.nih.gov/pubmed/19402914 http://dx.doi.org/10.1186/1472-6807-9-26

author	Fukuchi, Satoshi; Homma, Keiichi; Minezaki, Yoshiaki; Gojobori, Takashi; Nishikawa, Ken
author_sort	Fukuchi, Satoshi
collection	PubMed
description	BACKGROUND: In addition to structural domains, most eukaryotic proteins possess intrinsically disordered (ID) regions. Although ID regions often play important functional roles, their accurate identification is difficult. As human transcription factors (TFs) constitute a typical group of proteins with long ID regions, we regarded them as a model of all proteins and attempted to accurately classify TFs into structural domains and ID regions. Although an extremely high fraction of ID regions besides DNA binding and/or other domains was detected in human TFs in our previous investigation, 20% of the residues were left unassigned. In this report, we exploit the generally higher sequence divergence in ID regions than in structural regions to completely divide proteins into structural domains and ID regions. RESULTS: The new dichotomic system first identifies domains of known structures, followed by assignment of structural domains and ID regions with a combination of pre-existing tools and a newly developed program based on sequence divergence, taking un-aligned regions into consideration. The system was found to be highly accurate: its application to a set of proteins with experimentally verified ID regions had an error rate as low as 2%. Application of this system to human TFs (401 proteins) showed that 38% of the residues were in structural domains, while 62% were in ID regions. The preponderance of ID regions makes a sharp contrast to TFs of Escherichia coli (229 proteins), in which only 5% fell in ID regions. The method also revealed that 4.0% and 11.8% of the total length in human and E. coli TFs, respectively, are comprised of structural domains whose structures have not been determined. CONCLUSION: The present system verifies that sequence divergence including information of unaligned regions is a good indicator of ID regions. The system for the first time estimates the complete fractioning of structured/un-structured regions in human TFs, also revealing structural domains without homology to known structures. These predicted novel structural domains are good targets of structural genomics. When applied to other proteins, the system is expected to uncover more novel structural domains.
format	Text
id	pubmed-2687452
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2687452 2009-05-28 BMC Struct Biol Research Article BioMed Central 2009-04-30 /pmc/articles/PMC2687452/ /pubmed/19402914 http://dx.doi.org/10.1186/1472-6807-9-26 Text en Copyright © 2009 Fukuchi et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Development of an accurate classification system of proteins into structured and unstructured regions that uncovers novel structural domains: its application to human transcription factors
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2687452/ https://www.ncbi.nlm.nih.gov/pubmed/19402914 http://dx.doi.org/10.1186/1472-6807-9-26

