Use of designed sequences in protein structure recognition

BACKGROUND: Knowledge of the protein structure is a pre-requisite for improved understanding of molecular function. The gap in the sequence-structure space has increased in the post-genomic era. Grouping related protein sequences into families can aid in narrowing the gap. In the Pfam database, stru...

Bibliographic Details
Main Authors: Kumar, Gayatri; Mudgal, Richa; Srinivasan, Narayanaswamy; Sandhya, Sankaran
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5960202/
https://www.ncbi.nlm.nih.gov/pubmed/29776380
http://dx.doi.org/10.1186/s13062-018-0209-6
_version_ 1783324548050976768
author Kumar, Gayatri
Mudgal, Richa
Srinivasan, Narayanaswamy
Sandhya, Sankaran
author_facet Kumar, Gayatri
Mudgal, Richa
Srinivasan, Narayanaswamy
Sandhya, Sankaran
author_sort Kumar, Gayatri
collection PubMed
description BACKGROUND: Knowledge of the protein structure is a pre-requisite for improved understanding of molecular function. The gap in the sequence-structure space has increased in the post-genomic era. Grouping related protein sequences into families can aid in narrowing the gap. In the Pfam database, structure description is provided for part or full-length proteins of 7726 families. For the remaining 52% of the families, information on 3-D structure is not yet available. We use computationally designed sequences that are intermediately related to two protein domain families, which are already known to share the same fold. These strategically designed sequences enable detection of distant relationships and here, we have employed them for the purpose of structure recognition of protein families of yet unknown structure. RESULTS: We first measured the success rate of our approach using a dataset of protein families of known fold and achieved a success rate of 88%. Next, for 1392 families of yet unknown structure, we made structural assignments for part/full length of the proteins. Fold associations for 423 domains of unknown function (DUFs) are provided as a step towards functional annotation. CONCLUSION: The results indicate that knowledge-based filling of gaps in protein sequence space is a lucrative approach for structure recognition. Such sequences assist in traversal through protein sequence space and effectively function as ‘linkers’, where natural linkers between distant proteins are unavailable. REVIEWERS: This article was reviewed by Oliviero Carugo, Christine Orengo and Srikrishna Subramanian. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13062-018-0209-6) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5960202
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5960202 2018-05-24 Use of designed sequences in protein structure recognition Kumar, Gayatri Mudgal, Richa Srinivasan, Narayanaswamy Sandhya, Sankaran Biol Direct Research BACKGROUND: Knowledge of the protein structure is a pre-requisite for improved understanding of molecular function. The gap in the sequence-structure space has increased in the post-genomic era. Grouping related protein sequences into families can aid in narrowing the gap. In the Pfam database, structure description is provided for part or full-length proteins of 7726 families. For the remaining 52% of the families, information on 3-D structure is not yet available. We use computationally designed sequences that are intermediately related to two protein domain families, which are already known to share the same fold. These strategically designed sequences enable detection of distant relationships and here, we have employed them for the purpose of structure recognition of protein families of yet unknown structure. RESULTS: We first measured the success rate of our approach using a dataset of protein families of known fold and achieved a success rate of 88%. Next, for 1392 families of yet unknown structure, we made structural assignments for part/full length of the proteins. Fold associations for 423 domains of unknown function (DUFs) are provided as a step towards functional annotation. CONCLUSION: The results indicate that knowledge-based filling of gaps in protein sequence space is a lucrative approach for structure recognition. Such sequences assist in traversal through protein sequence space and effectively function as ‘linkers’, where natural linkers between distant proteins are unavailable. REVIEWERS: This article was reviewed by Oliviero Carugo, Christine Orengo and Srikrishna Subramanian. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13062-018-0209-6) contains supplementary material, which is available to authorized users.
BioMed Central 2018-05-09 /pmc/articles/PMC5960202/ /pubmed/29776380 http://dx.doi.org/10.1186/s13062-018-0209-6 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Kumar, Gayatri
Mudgal, Richa
Srinivasan, Narayanaswamy
Sandhya, Sankaran
Use of designed sequences in protein structure recognition
title Use of designed sequences in protein structure recognition
title_full Use of designed sequences in protein structure recognition
title_fullStr Use of designed sequences in protein structure recognition
title_full_unstemmed Use of designed sequences in protein structure recognition
title_short Use of designed sequences in protein structure recognition
title_sort use of designed sequences in protein structure recognition
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5960202/
https://www.ncbi.nlm.nih.gov/pubmed/29776380
http://dx.doi.org/10.1186/s13062-018-0209-6
work_keys_str_mv AT kumargayatri useofdesignedsequencesinproteinstructurerecognition
AT mudgalricha useofdesignedsequencesinproteinstructurerecognition
AT srinivasannarayanaswamy useofdesignedsequencesinproteinstructurerecognition
AT sandhyasankaran useofdesignedsequencesinproteinstructurerecognition