Exploration of Uncharted Regions of the Protein Universe

Bibliographic Details
Main Authors: Jaroszewski, Lukasz; Li, Zhanwen; Krishna, S. Sri; Bakolitsa, Constantina; Wooley, John; Deacon, Ashley M.; Wilson, Ian A.; Godzik, Adam
Format: Text
Language: English
Published: Public Library of Science 2009
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2744874/
https://www.ncbi.nlm.nih.gov/pubmed/19787035
http://dx.doi.org/10.1371/journal.pbio.1000205
author Jaroszewski, Lukasz
Li, Zhanwen
Krishna, S. Sri
Bakolitsa, Constantina
Wooley, John
Deacon, Ashley M.
Wilson, Ian A.
Godzik, Adam
collection PubMed
description The genome projects have unearthed an enormous diversity of genes of unknown function that are still awaiting biological and biochemical characterization. These genes, as most others, can be grouped into families based on sequence similarity. The PFAM database currently contains over 2,200 such families, referred to as domains of unknown function (DUF). In a coordinated effort, the four large-scale centers of the NIH Protein Structure Initiative have determined the first three-dimensional structures for more than 250 of these DUF families. Analysis of the first 248 reveals that about two thirds of the DUF families likely represent very divergent branches of already known and well-characterized families, which allows hypotheses to be formulated about their biological function. The remainder can be formally categorized as new folds, although about one third of these show significant substructure similarity to previously characterized folds. These results infer that, despite the enormous increase in the number and the diversity of new genes being uncovered, the fold space of the proteins they encode is gradually becoming saturated. The previously unexplored sectors of the protein universe appear to be primarily shaped by extreme diversification of known protein families, which then enables organisms to evolve new functions and adapt to particular niches and habitats. Notwithstanding, these DUF families still constitute the richest source for discovery of the remaining protein folds and topologies.
format Text
id pubmed-2744874
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2744874 2009-09-29 Exploration of Uncharted Regions of the Protein Universe Jaroszewski, Lukasz Li, Zhanwen Krishna, S. Sri Bakolitsa, Constantina Wooley, John Deacon, Ashley M. Wilson, Ian A. Godzik, Adam PLoS Biol Research Article Public Library of Science 2009-09-29 /pmc/articles/PMC2744874/ /pubmed/19787035 http://dx.doi.org/10.1371/journal.pbio.1000205 Text en Jaroszewski et al.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Exploration of Uncharted Regions of the Protein Universe
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2744874/
https://www.ncbi.nlm.nih.gov/pubmed/19787035
http://dx.doi.org/10.1371/journal.pbio.1000205