
CATH: an expanded resource to predict protein function through structure and sequence

The latest version of the CATH-Gene3D protein structure classification database has recently been released (version 4.1, http://www.cathdb.info). The resource comprises over 300 000 domain structures and over 53 million protein domains classified into 2737 homologous superfamilies, doubling the numb...


Bibliographic Details
Main authors: Dawson, Natalie L., Lewis, Tony E., Das, Sayoni, Lees, Jonathan G., Lee, David, Ashford, Paul, Orengo, Christine A., Sillitoe, Ian
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5210570/
https://www.ncbi.nlm.nih.gov/pubmed/27899584
http://dx.doi.org/10.1093/nar/gkw1098
_version_ 1782490910144069632
author Dawson, Natalie L.
Lewis, Tony E.
Das, Sayoni
Lees, Jonathan G.
Lee, David
Ashford, Paul
Orengo, Christine A.
Sillitoe, Ian
author_sort Dawson, Natalie L.
collection PubMed
description The latest version of the CATH-Gene3D protein structure classification database has recently been released (version 4.1, http://www.cathdb.info). The resource comprises over 300 000 domain structures and over 53 million protein domains classified into 2737 homologous superfamilies, doubling the number of predicted protein domains in the previous version. The daily-updated CATH-B, which contains our very latest domain assignment data, provides putative classifications for over 100 000 additional protein domains. This article describes developments to the CATH-Gene3D resource over the last two years since the publication in 2015, including: significant increases to our structural and sequence coverage; expansion of the functional families in CATH; building a support vector machine (SVM) to automatically assign domains to superfamilies; improved search facilities to return alignments of query sequences against multiple sequence alignments; the redesign of the web pages and download site.
format Online
Article
Text
id pubmed-5210570
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-52105702017-01-05 CATH: an expanded resource to predict protein function through structure and sequence Dawson, Natalie L. Lewis, Tony E. Das, Sayoni Lees, Jonathan G. Lee, David Ashford, Paul Orengo, Christine A. Sillitoe, Ian Nucleic Acids Res Database Issue The latest version of the CATH-Gene3D protein structure classification database has recently been released (version 4.1, http://www.cathdb.info). The resource comprises over 300 000 domain structures and over 53 million protein domains classified into 2737 homologous superfamilies, doubling the number of predicted protein domains in the previous version. The daily-updated CATH-B, which contains our very latest domain assignment data, provides putative classifications for over 100 000 additional protein domains. This article describes developments to the CATH-Gene3D resource over the last two years since the publication in 2015, including: significant increases to our structural and sequence coverage; expansion of the functional families in CATH; building a support vector machine (SVM) to automatically assign domains to superfamilies; improved search facilities to return alignments of query sequences against multiple sequence alignments; the redesign of the web pages and download site. Oxford University Press 2017-01-04 2016-11-29 /pmc/articles/PMC5210570/ /pubmed/27899584 http://dx.doi.org/10.1093/nar/gkw1098 Text en © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Database Issue
Dawson, Natalie L.
Lewis, Tony E.
Das, Sayoni
Lees, Jonathan G.
Lee, David
Ashford, Paul
Orengo, Christine A.
Sillitoe, Ian
CATH: an expanded resource to predict protein function through structure and sequence
title CATH: an expanded resource to predict protein function through structure and sequence
topic Database Issue
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5210570/
https://www.ncbi.nlm.nih.gov/pubmed/27899584
http://dx.doi.org/10.1093/nar/gkw1098