Sequence-similar, structure-dissimilar protein pairs in the PDB

It is often assumed that in the Protein Data Bank (PDB), two proteins with similar sequences will also have similar structures. Accordingly, it has proved useful to develop subsets of the PDB from which “redundant” structures have been removed, based on a sequence-based criterion for similarity. Sim...

Bibliographic Details
Main Authors:	Kosloff, Mickey, Kolodny, Rachel
Format:	Text
Language:	English
Published:	Wiley Subscription Services, Inc., A Wiley Company 2008
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2673347/ https://www.ncbi.nlm.nih.gov/pubmed/18004789 http://dx.doi.org/10.1002/prot.21770

_version_	1782166580758577152
author	Kosloff, Mickey Kolodny, Rachel
author_facet	Kosloff, Mickey Kolodny, Rachel
author_sort	Kosloff, Mickey
collection	PubMed
description	It is often assumed that in the Protein Data Bank (PDB), two proteins with similar sequences will also have similar structures. Accordingly, it has proved useful to develop subsets of the PDB from which “redundant” structures have been removed, based on a sequence-based criterion for similarity. Similarly, when predicting protein structure using homology modeling, if a template structure for modeling a target sequence is selected by sequence alone, this implicitly assumes that all sequence-similar templates are equivalent. Here, we show that this assumption is often not correct and that standard approaches to create subsets of the PDB can lead to the loss of structurally and functionally important information. We have carried out sequence-based structural superpositions and geometry-based structural alignments of a large number of protein pairs to determine the extent to which sequence similarity ensures structural similarity. We find many examples where two proteins that are similar in sequence have structures that differ significantly from one another. The source of the structural differences usually has a functional basis. The number of such protein pairs that are identified and the magnitude of the dissimilarity depend on the approach that is used to calculate the differences; in particular, sequence-based structure superpositioning will identify a larger number of structurally dissimilar pairs than geometry-based structural alignments. When two sequences can be aligned in a statistically meaningful way, sequence-based structural superpositioning provides a meaningful measure of structural differences. This approach and geometry-based structure alignments reveal somewhat different information and one or the other might be preferable in a given application. 
Our results suggest that in some cases, notably homology modeling, the common use of nonredundant datasets, culled from the PDB based on sequence, may mask important structural and functional information. We have established a database of sequence-similar, structurally dissimilar protein pairs that will help address this problem (http://luna.bioc.columbia.edu/rachel/seqsimstrdiff.htm).
format	Text
id	pubmed-2673347
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	Wiley Subscription Services, Inc., A Wiley Company
record_format	MEDLINE/PubMed
spelling	pubmed-2673347 2009-05-15 Sequence-similar, structure-dissimilar protein pairs in the PDB Kosloff, Mickey Kolodny, Rachel Proteins Research Article Wiley Subscription Services, Inc., A Wiley Company 2008-05-01 2007-11-14 /pmc/articles/PMC2673347/ /pubmed/18004789 http://dx.doi.org/10.1002/prot.21770 Text en Copyright © 2008 Wiley-Liss, Inc., A Wiley Company
spellingShingle	Research Article Kosloff, Mickey Kolodny, Rachel Sequence-similar, structure-dissimilar protein pairs in the PDB
title	Sequence-similar, structure-dissimilar protein pairs in the PDB
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2673347/ https://www.ncbi.nlm.nih.gov/pubmed/18004789 http://dx.doi.org/10.1002/prot.21770
work_keys_str_mv	AT kosloffmickey sequencesimilarstructuredissimilarproteinpairsinthepdb AT kolodnyrachel sequencesimilarstructuredissimilarproteinpairsinthepdb
