Sequence-similar, structure-dissimilar protein pairs in the PDB

Bibliographic Details
Main authors: Kosloff, Mickey; Kolodny, Rachel
Format: Text
Language: English
Published: Wiley Subscription Services, Inc., A Wiley Company 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2673347/
https://www.ncbi.nlm.nih.gov/pubmed/18004789
http://dx.doi.org/10.1002/prot.21770
author Kosloff, Mickey
Kolodny, Rachel
collection PubMed
description It is often assumed that in the Protein Data Bank (PDB), two proteins with similar sequences will also have similar structures. Accordingly, it has proved useful to develop subsets of the PDB from which “redundant” structures have been removed, based on a sequence-based criterion for similarity. Similarly, when predicting protein structure using homology modeling, if a template structure for modeling a target sequence is selected by sequence alone, this implicitly assumes that all sequence-similar templates are equivalent. Here, we show that this assumption is often not correct and that standard approaches to create subsets of the PDB can lead to the loss of structurally and functionally important information. We have carried out sequence-based structural superpositions and geometry-based structural alignments of a large number of protein pairs to determine the extent to which sequence similarity ensures structural similarity. We find many examples where two proteins that are similar in sequence have structures that differ significantly from one another. The source of the structural differences usually has a functional basis. The number of such protein pairs that are identified and the magnitude of the dissimilarity depend on the approach that is used to calculate the differences; in particular, sequence-based structure superpositioning will identify a larger number of structurally dissimilar pairs than geometry-based structural alignments. When two sequences can be aligned in a statistically meaningful way, sequence-based structural superpositioning provides a meaningful measure of structural differences. This approach and geometry-based structure alignments reveal somewhat different information, and one or the other might be preferable in a given application.
Our results suggest that in some cases, notably homology modeling, the common use of nonredundant datasets, culled from the PDB based on sequence, may mask important structural and functional information. We have established a database of sequence-similar, structurally dissimilar protein pairs that will help address this problem (http://luna.bioc.columbia.edu/rachel/seqsimstrdiff.htm).
format Text
id pubmed-2673347
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher Wiley Subscription Services, Inc., A Wiley Company
record_format MEDLINE/PubMed
spelling pubmed-2673347 2009-05-15 Sequence-similar, structure-dissimilar protein pairs in the PDB Kosloff, Mickey Kolodny, Rachel Proteins Research Article Wiley Subscription Services, Inc., A Wiley Company 2008-05-01 2007-11-14 /pmc/articles/PMC2673347/ /pubmed/18004789 http://dx.doi.org/10.1002/prot.21770 Text en Copyright © 2008 Wiley-Liss, Inc., A Wiley Company
title Sequence-similar, structure-dissimilar protein pairs in the PDB
topic Research Article