
PASS2.7: a database containing structure-based sequence alignments and associated features of protein domain superfamilies from SCOPe

Sequence alignments are models that capture the structural, functional and evolutionary relationships between proteins. Structure-guided sequence alignments are helpful in the case of distantly related proteins with poor sequence identity, thus rendering routine sequence alignment methods ineffectiv...


Bibliographic Details
Main authors: Bhattacharyya, Teerna, Nayak, Soumya, Goswami, Smit, Gadiyaram, Vasundhara, Mathew, Oommen K, Sowdhamini, Ramanathan
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9216583/
https://www.ncbi.nlm.nih.gov/pubmed/35411388
http://dx.doi.org/10.1093/database/baac025
_version_ 1784731457649901568
author Bhattacharyya, Teerna
Nayak, Soumya
Goswami, Smit
Gadiyaram, Vasundhara
Mathew, Oommen K
Sowdhamini, Ramanathan
author_facet Bhattacharyya, Teerna
Nayak, Soumya
Goswami, Smit
Gadiyaram, Vasundhara
Mathew, Oommen K
Sowdhamini, Ramanathan
author_sort Bhattacharyya, Teerna
collection PubMed
description Sequence alignments are models that capture the structural, functional and evolutionary relationships between proteins. Structure-guided sequence alignments are helpful for distantly related proteins whose poor sequence identity renders routine sequence alignment methods ineffective. The Protein Alignment organized as Structural Superfamilies (PASS2) database provides such sequence alignments of protein domains within a superfamily, as per the Structural Classification of Proteins extended (SCOPe) database. The current update of PASS2 (i.e. PASS2.7) follows the latest release of SCOPe (2.07), and we provide data for 14 323 protein domains that are <40% identical and are organized into 2024 superfamilies. Several useful features derived from the alignments, such as conserved secondary structural motifs, HMMs and residues conserved across the superfamily, are also reported. Protein domains that deviate from the rest of the members of a superfamily may compromise the quality of the alignment, and we found this to be the case in ∼7% of the total superfamilies we considered. To improve the alignment by objectively identifying such ‘outliers’, in this update we have used a k-means-based unsupervised machine learning method for clustering superfamily members, where the features provided were the length of the domains aligned, the Cα-RMSD derived from the rigid-body superposition of all members and the gaps contributed to the alignment by each domain. In a few cases, we have split the superfamily as per the predicted clusters and provided complete data for each cluster. A new feature included in this update is absolutely conserved interactions (ACIs) between residue backbones and side chains, which are obtained by aligning protein structure networks using structure-guided sequence alignments of superfamilies. ACIs provide valuable information about functionally important residues and the structure–function relationships of proteins. The ACIs and the corresponding conserved networks for backbone and side chain have been marked separately on the superimposed structure. DATABASE URL: The updated version of the PASS2 database is available at http://caps.ncbs.res.in/pass2/.
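The outlier-identification step mentioned in the description can be illustrated with a small sketch. This is not the PASS2 implementation: the record does not state the number of clusters, the feature scaling or the initialisation used, so the choice of two clusters, z-score scaling, the farthest-point initialisation, the rule of treating the smaller cluster as the outliers, and all domain names and feature values below are assumptions made only for illustration.

```python
from math import sqrt
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each feature column so that no single feature dominates the distance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # a constant column keeps a scale of 1
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans2(points, max_iter=100):
    """Two-cluster k-means with a deterministic farthest-point initialisation."""
    c0 = points[0]
    c1 = max(points, key=lambda p: dist(p, c0))
    centroids = [c0, c1]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            0 if dist(p, centroids[0]) <= dist(p, centroids[1]) else 1
            for p in points
        ]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [mean(col) for col in zip(*members)]
    return labels


def flag_outliers(domains):
    """Return the names in the smaller of the two clusters (assumed outliers)."""
    names = list(domains)
    labels = kmeans2(standardize([domains[n] for n in names]))
    minority = 1 if labels.count(1) < labels.count(0) else 0
    return [n for n, lab in zip(names, labels) if lab == minority]


# Hypothetical superfamily members: (aligned length, C-alpha RMSD in angstroms, gaps).
example_domains = {
    "dom_a": (120, 1.2, 4),
    "dom_b": (118, 1.4, 5),
    "dom_c": (125, 1.1, 3),
    "dom_d": (122, 1.3, 6),
    "dom_e": (210, 4.8, 40),
}

print(flag_outliers(example_domains))  # prints ['dom_e']
```

In this toy input, dom_e is much longer, superposes poorly and contributes many gaps, so it ends up alone in its own cluster. When the two clusters are of equal size the minority rule above is arbitrary, which is one reason a real pipeline would need a more careful criterion.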
format Online
Article
Text
id pubmed-9216583
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-92165832022-06-23 PASS2.7: a database containing structure-based sequence alignments and associated features of protein domain superfamilies from SCOPe Bhattacharyya, Teerna Nayak, Soumya Goswami, Smit Gadiyaram, Vasundhara Mathew, Oommen K Sowdhamini, Ramanathan Database (Oxford) Database Update Sequence alignments are models that capture the structural, functional and evolutionary relationships between proteins. Structure-guided sequence alignments are helpful in the case of distantly related proteins with poor sequence identity, thus rendering routine sequence alignment methods ineffective. Protein Alignment organized as Structural Superfamilies or PASS2 database provides such sequence alignments of protein domains within a superfamily as per the Structural Classification of Proteins extended (SCOPe) database. The current update of PASS2 (i.e. PASS2.7) is following the latest release of SCOPe (2.07) and we provide data for 14 323 protein domains that are <40% identical and are organized into 2024 superfamilies. Several useful features derived from the alignments, such as conserved secondary structural motifs, HMMs and residues conserved across the superfamily, are also reported. Protein domains that are deviant from the rest of the members of a superfamily may compromise the quality of the alignment, and we found this to be the case in ∼7% of the total superfamilies we considered. To improve the alignment by objectively identifying such ‘outliers’, in this update, we have used a k-means-based unsupervised machine learning method for clustering superfamily members, where features provided were length of domains aligned, C(α)-RMSD derived from the rigid-body superposition of all members and gaps contributed to the alignment by each domain. In a few cases, we have split the superfamily as per the clusters predicted and provided complete data for each cluster. 
A new feature included in this update is absolutely conserved interactions (ACIs) between residue backbones and side chains, which are obtained by aligning protein structure networks using structure-guided sequence alignments of superfamilies. ACIs provide valuable information about functionally important residues and the structure–function relationships of proteins. The ACIs and the corresponding conserved networks for backbone and sidechain have been marked on the superimposed structure separately. DATABASE URL: The updated version of the PASS2 database is available at http://caps.ncbs.res.in/pass2/. Oxford University Press 2022-04-12 /pmc/articles/PMC9216583/ /pubmed/35411388 http://dx.doi.org/10.1093/database/baac025 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Database Update
Bhattacharyya, Teerna
Nayak, Soumya
Goswami, Smit
Gadiyaram, Vasundhara
Mathew, Oommen K
Sowdhamini, Ramanathan
PASS2.7: a database containing structure-based sequence alignments and associated features of protein domain superfamilies from SCOPe
title PASS2.7: a database containing structure-based sequence alignments and associated features of protein domain superfamilies from SCOPe
title_full PASS2.7: a database containing structure-based sequence alignments and associated features of protein domain superfamilies from SCOPe
title_fullStr PASS2.7: a database containing structure-based sequence alignments and associated features of protein domain superfamilies from SCOPe
title_full_unstemmed PASS2.7: a database containing structure-based sequence alignments and associated features of protein domain superfamilies from SCOPe
title_short PASS2.7: a database containing structure-based sequence alignments and associated features of protein domain superfamilies from SCOPe
title_sort pass2.7: a database containing structure-based sequence alignments and associated features of protein domain superfamilies from scope
topic Database Update
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9216583/
https://www.ncbi.nlm.nih.gov/pubmed/35411388
http://dx.doi.org/10.1093/database/baac025
work_keys_str_mv AT bhattacharyyateerna pass27adatabasecontainingstructurebasedsequencealignmentsandassociatedfeaturesofproteindomainsuperfamiliesfromscope
AT nayaksoumya pass27adatabasecontainingstructurebasedsequencealignmentsandassociatedfeaturesofproteindomainsuperfamiliesfromscope
AT goswamismit pass27adatabasecontainingstructurebasedsequencealignmentsandassociatedfeaturesofproteindomainsuperfamiliesfromscope
AT gadiyaramvasundhara pass27adatabasecontainingstructurebasedsequencealignmentsandassociatedfeaturesofproteindomainsuperfamiliesfromscope
AT mathewoommenk pass27adatabasecontainingstructurebasedsequencealignmentsandassociatedfeaturesofproteindomainsuperfamiliesfromscope
AT sowdhaminiramanathan pass27adatabasecontainingstructurebasedsequencealignmentsandassociatedfeaturesofproteindomainsuperfamiliesfromscope