Comprehensive Repertoire of Foldable Regions within Whole Genomes

In order to get a comprehensive repertoire of foldable domains within whole proteomes, including orphan domains, we developed a novel procedure, called SEG-HCA. From only the information of a single amino acid sequence, SEG-HCA automatically delineates segments possessing high densities in hydrophob...

Bibliographic Details
Main authors: Faure, Guilhem; Callebaut, Isabelle
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3812050/
https://www.ncbi.nlm.nih.gov/pubmed/24204229
http://dx.doi.org/10.1371/journal.pcbi.1003280
_version_ 1782288924411953152
author Faure, Guilhem
Callebaut, Isabelle
author_facet Faure, Guilhem
Callebaut, Isabelle
author_sort Faure, Guilhem
collection PubMed
description To obtain a comprehensive repertoire of foldable domains within whole proteomes, including orphan domains, we developed a novel procedure called SEG-HCA. Using only the information in a single amino acid sequence, SEG-HCA automatically delineates segments with a high density of hydrophobic clusters, as defined by Hydrophobic Cluster Analysis (HCA). These hydrophobic clusters mainly correspond to regular secondary structures, which together form structured or foldable regions. Genome-wide analyses revealed that SEG-HCA acts as the opposite of disorder predictors, each addressing a distinct structural state. Interestingly, the two predictions nonetheless overlap, and the overlap includes small segments of disordered sequence that undergo coupled folding and binding. SEG-HCA thus gives access to these specific domains, which are generally poorly represented in domain databases. Comparison of the whole set of SEG-HCA predictions with the Conserved Domain Database (CDD) also highlighted a large proportion of long predicted segments (length >50 amino acids) that are CDD orphans. These orphan sequences may either correspond to highly divergent members of already known families or belong to new families of domains. Their comprehensive description thus opens new avenues for investigating functional and/or structural features that have so far remained unexplored. Altogether, the data described here provide new insights into protein architecture and organization throughout the three kingdoms of life.
format Online
Article
Text
id pubmed-3812050
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3812050 2013-11-07 Comprehensive Repertoire of Foldable Regions within Whole Genomes Faure, Guilhem Callebaut, Isabelle PLoS Comput Biol Research Article
Public Library of Science 2013-10-24 /pmc/articles/PMC3812050/ /pubmed/24204229 http://dx.doi.org/10.1371/journal.pcbi.1003280 Text en © 2013 Faure, Callebaut http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Faure, Guilhem
Callebaut, Isabelle
Comprehensive Repertoire of Foldable Regions within Whole Genomes
title Comprehensive Repertoire of Foldable Regions within Whole Genomes
title_full Comprehensive Repertoire of Foldable Regions within Whole Genomes
title_fullStr Comprehensive Repertoire of Foldable Regions within Whole Genomes
title_full_unstemmed Comprehensive Repertoire of Foldable Regions within Whole Genomes
title_short Comprehensive Repertoire of Foldable Regions within Whole Genomes
title_sort comprehensive repertoire of foldable regions within whole genomes
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3812050/
https://www.ncbi.nlm.nih.gov/pubmed/24204229
http://dx.doi.org/10.1371/journal.pcbi.1003280
work_keys_str_mv AT faureguilhem comprehensiverepertoireoffoldableregionswithinwholegenomes
AT callebautisabelle comprehensiverepertoireoffoldableregionswithinwholegenomes