
Dynamics of domain coverage of the protein sequence universe

BACKGROUND: The currently known protein sequence space consists of millions of sequences in public databases and is rapidly expanding. Assigning sequences to families leads to a better understanding of protein function and the nature of the protein universe. However, a large portion of the current p...


Bibliographic Details
Main authors: Rekapalli, Bhanu; Wuichet, Kristin; Peterson, Gregory D; Zhulin, Igor B
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3557196/
https://www.ncbi.nlm.nih.gov/pubmed/23157439
http://dx.doi.org/10.1186/1471-2164-13-634
author Rekapalli, Bhanu
Wuichet, Kristin
Peterson, Gregory D
Zhulin, Igor B
collection PubMed
description BACKGROUND: The currently known protein sequence space consists of millions of sequences in public databases and is rapidly expanding. Assigning sequences to families leads to a better understanding of protein function and the nature of the protein universe. However, a large portion of the current protein space remains unassigned and is referred to as its “dark matter”. RESULTS: Here we suggest that the true size of “dark matter” is much larger than stated by current definitions. We propose an approach to reducing the size of “dark matter” by identifying and subtracting regions in protein sequences that are not likely to contain any domain. CONCLUSIONS: Recent improvements in computational domain modeling result in a decrease, albeit a slow one, in the relative size of “dark matter”; however, its absolute size increases substantially with the growth of sequence data.
format Online
Article
Text
id pubmed-3557196
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-35571962013-01-31 Dynamics of domain coverage of the protein sequence universe Rekapalli, Bhanu Wuichet, Kristin Peterson, Gregory D Zhulin, Igor B BMC Genomics Research Article BACKGROUND: The currently known protein sequence space consists of millions of sequences in public databases and is rapidly expanding. Assigning sequences to families leads to a better understanding of protein function and the nature of the protein universe. However, a large portion of the current protein space remains unassigned and is referred to as its “dark matter”. RESULTS: Here we suggest that true size of “dark matter” is much larger than stated by current definitions. We propose an approach to reducing the size of “dark matter” by identifying and subtracting regions in protein sequences that are not likely to contain any domain. CONCLUSIONS: Recent improvements in computational domain modeling result in a decrease, albeit slowly, in the relative size of “dark matter”; however, its absolute size increases substantially with the growth of sequence data. BioMed Central 2012-11-16 /pmc/articles/PMC3557196/ /pubmed/23157439 http://dx.doi.org/10.1186/1471-2164-13-634 Text en Copyright ©2012 Rekapalli et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Dynamics of domain coverage of the protein sequence universe
topic Research Article