Nh3D: A reference dataset of non-homologous protein structures

BACKGROUND: The statistical analysis of protein structures requires datasets in which structural features can be considered independently distributed, i.e. not related through common ancestry, and that fulfil minimal requirements regarding the experimental quality of the structures they contain. Howe...

Bibliographic Details
Main authors:	Thiruv, B, Quon, G, Saldanha, SA, Steipe, B
Format:	Text
Language:	English
Published:	BioMed Central 2005
Subjects:	Database
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1182382/ https://www.ncbi.nlm.nih.gov/pubmed/16011803 http://dx.doi.org/10.1186/1472-6807-5-12

author	Thiruv, B Quon, G Saldanha, SA Steipe, B
collection	PubMed
description	BACKGROUND: The statistical analysis of protein structures requires datasets in which structural features can be considered independently distributed, i.e. not related through common ancestry, and that fulfil minimal requirements regarding the experimental quality of the structures they contain. However, non-redundant datasets based on sequence similarity invariably contain distantly related homologues. Here we provide a reference dataset of non-homologous protein domains, assuming that structural dissimilarity at the topology level is incompatible with recognizable common ancestry. The dataset is based on domains at the Topology level of the CATH database, which hierarchically classifies all protein structures. It contains the best refined representatives of each Topology level, validates structural dissimilarity and removes internally duplicated fragments. The compilation of Nh3D is fully scripted. RESULTS: The current Nh3D list contains 570 domains with a total of 90780 residues. It covers more than 70% of folds at the Topology level of the CATH database and represents more than 90% of the structures in the PDB that have been classified by CATH. We observe that even though all protein pairs are structurally dissimilar, some pairwise sequence identities after global alignment are greater than 30%. CONCLUSION: Nh3D is freely available as a reference dataset for the statistical analysis of sequence and structure features of proteins in the PDB. Regularly updated versions of Nh3D and the corresponding PDB-formatted coordinate sets are accessible from our Web site.
format	Text
id	pubmed-1182382
institution	National Center for Biotechnology Information
language	English
publishDate	2005
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1182382 2005-08-04 Nh3D: A reference dataset of non-homologous protein structures Thiruv, B Quon, G Saldanha, SA Steipe, B BMC Struct Biol Database BioMed Central 2005-07-12 /pmc/articles/PMC1182382/ /pubmed/16011803 http://dx.doi.org/10.1186/1472-6807-5-12 Text en Copyright © 2005 Thiruv et al; licensee BioMed Central Ltd.
title	Nh3D: A reference dataset of non-homologous protein structures
topic	Database
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1182382/ https://www.ncbi.nlm.nih.gov/pubmed/16011803 http://dx.doi.org/10.1186/1472-6807-5-12
