SCOPe: improvements to the structural classification of proteins – extended database to facilitate variant interpretation and machine learning


Bibliographic details
Main authors: Chandonia, John-Marc; Guan, Lindsey; Lin, Shiangyi; Yu, Changhua; Fox, Naomi K; Brenner, Steven E
Format: Online, Article, Text
Language: English
Published: Oxford University Press 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8728185/
https://www.ncbi.nlm.nih.gov/pubmed/34850923
http://dx.doi.org/10.1093/nar/gkab1054
Collection: PubMed
Description: The Structural Classification of Proteins—extended (SCOPe, https://scop.berkeley.edu) knowledgebase aims to provide an accurate, detailed, and comprehensive description of the structural and evolutionary relationships amongst the majority of proteins of known structure, along with resources for analyzing the protein structures and their sequences. Structures from the PDB are divided into domains and classified using a combination of manual curation and highly precise automated methods. In the current release of SCOPe, 2.08, we have developed search and display tools for analysis of genetic variants we mapped to structures classified in SCOPe. In order to improve the utility of SCOPe to automated methods such as deep learning classifiers that rely on multiple alignment of sequences of homologous proteins, we have introduced new machine-parseable annotations that indicate aberrant structures as well as domains that are distinguished by a smaller repeat unit. We also classified structures from 74 of the largest Pfam families not previously classified in SCOPe, and we improved our algorithm to remove N- and C-terminal cloning, expression and purification sequences from SCOPe domains. SCOPe 2.08-stable classifies 106 976 PDB entries (about 60% of PDB entries).
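Because the record stresses SCOPe's use by machine learning methods, one common first step is grouping classified domains by superfamily, for example so that homologous domains do not land in both a training and a test set. The sketch below assumes the tab-separated layout of SCOPe's `dir.cla` classification files (domain sid, PDB ID, residue region, sccs string, sunid, comma-separated hierarchy of `level=sunid` pairs). The sample rows, domain IDs and sunids are hypothetical placeholders, the helper names are illustrative only, and the sketch does not parse the new 2.08 annotations for aberrant structures or repeat units, whose format is not described here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ScopeDomain:
    sid: str         # domain identifier, e.g. "d1abca_" (hypothetical)
    pdb_id: str      # four-character PDB ID
    region: str      # chain and residue range, e.g. "A:" (whole chain) or "A:1-120"
    sccs: str        # concise classification string: class.fold.superfamily.family
    sunid: int       # unique identifier of this domain's node in the hierarchy
    hierarchy: dict  # level code -> sunid, e.g. {"cl": 1, "cf": 2, "sf": 3, ...}

    @property
    def superfamily(self) -> str:
        # The first three sccs fields (class.fold.superfamily) name the superfamily.
        return ".".join(self.sccs.split(".")[:3])


def parse_cla_line(line: str) -> ScopeDomain:
    """Parse one assumed dir.cla-style row into a ScopeDomain."""
    sid, pdb_id, region, sccs, sunid, hier = line.rstrip("\n").split("\t")
    hierarchy = {k: int(v) for k, v in (item.split("=") for item in hier.split(","))}
    return ScopeDomain(sid, pdb_id, region, sccs, int(sunid), hierarchy)


def group_by_superfamily(lines):
    """Map each superfamily sccs prefix to the list of domain sids it contains."""
    groups = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and '#' header comments
        dom = parse_cla_line(line)
        groups[dom.superfamily].append(dom.sid)
    return dict(groups)


# Hypothetical rows in the assumed format; IDs and sunids are placeholders.
SAMPLE = [
    "# hypothetical header comment",
    "d1abca_\t1abc\tA:\ta.1.1.1\t900001\tcl=1,cf=2,sf=3,fa=4,dm=5,sp=6,px=900001",
    "d1abcb_\t1abc\tB:\ta.1.1.2\t900002\tcl=1,cf=2,sf=3,fa=7,dm=8,sp=9,px=900002",
    "d2xyza1\t2xyz\tA:1-120\tb.1.1.1\t900003\tcl=10,cf=11,sf=12,fa=13,dm=14,sp=15,px=900003",
]

print(group_by_superfamily(SAMPLE))
# {'a.1.1': ['d1abca_', 'd1abcb_'], 'b.1.1': ['d2xyza1']}
```

Splitting on the superfamily prefix rather than on the full sccs keeps domains from different families of the same superfamily together, which is the stricter choice when homologs must not be shared across splits.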
Record ID: pubmed-8728185
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Nucleic Acids Res (Database Issue)
Published: Oxford University Press, 2021-12-01
Copyright: © The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research.
License: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Topic: Database Issue