Sequence-similar, structure-dissimilar protein pairs in the PDB
Main authors: | Kosloff, Mickey; Kolodny, Rachel |
---|---|
Format: | Text |
Language: | English |
Published: | Wiley Subscription Services, Inc., A Wiley Company, 2008 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2673347/ https://www.ncbi.nlm.nih.gov/pubmed/18004789 http://dx.doi.org/10.1002/prot.21770 |
author | Kosloff, Mickey; Kolodny, Rachel |
---|---|
collection | PubMed |
description | It is often assumed that in the Protein Data Bank (PDB), two proteins with similar sequences will also have similar structures. Accordingly, it has proved useful to develop subsets of the PDB from which "redundant" structures have been removed, based on a sequence-based criterion for similarity. Similarly, when predicting protein structure using homology modeling, if a template structure for modeling a target sequence is selected by sequence alone, this implicitly assumes that all sequence-similar templates are equivalent. Here, we show that this assumption is often not correct and that standard approaches to create subsets of the PDB can lead to the loss of structurally and functionally important information. We have carried out sequence-based structural superpositions and geometry-based structural alignments of a large number of protein pairs to determine the extent to which sequence similarity ensures structural similarity. We find many examples where two proteins that are similar in sequence have structures that differ significantly from one another. The source of the structural differences usually has a functional basis. The number of such protein pairs that are identified and the magnitude of the dissimilarity depend on the approach that is used to calculate the differences; in particular, sequence-based structure superpositioning will identify a larger number of structurally dissimilar pairs than geometry-based structural alignments. When two sequences can be aligned in a statistically meaningful way, sequence-based structural superpositioning provides a meaningful measure of structural differences. This approach and geometry-based structure alignments reveal somewhat different information, and one or the other might be preferable in a given application.
Our results suggest that in some cases, notably homology modeling, the common use of nonredundant datasets, culled from the PDB based on sequence, may mask important structural and functional information. We have established a database of sequence-similar, structurally dissimilar protein pairs that will help address this problem (http://luna.bioc.columbia.edu/rachel/seqsimstrdiff.htm). |
format | Text |
id | pubmed-2673347 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2008 |
publisher | Wiley Subscription Services, Inc., A Wiley Company |
record_format | MEDLINE/PubMed |
spelling | pubmed-2673347; 2009-05-15; Sequence-similar, structure-dissimilar protein pairs in the PDB; Kosloff, Mickey; Kolodny, Rachel; Proteins; Research Article; Wiley Subscription Services, Inc., A Wiley Company; 2008-05-01; 2007-11-14; /pmc/articles/PMC2673347/; /pubmed/18004789; http://dx.doi.org/10.1002/prot.21770; Text; en; Copyright © 2008 Wiley-Liss, Inc., A Wiley Company |
title | Sequence-similar, structure-dissimilar protein pairs in the PDB |
topic | Research Article |
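
The abstract contrasts sequence-based structural superposition with geometry-based structural alignment. The Python sketch below illustrates only the first idea, under stated assumptions. It pairs residues wherever a pairwise sequence alignment places non-gap characters in the same column. It then finds the rigid-body superposition of those pairs that minimizes RMSD. This is not the authors' code. The function names (`aligned_pairs`, `superposition_rmsd`, `sequence_based_rmsd`) are hypothetical. The input format is also an assumption: gapped alignment strings plus one (x, y, z) coordinate per residue, for example a Cα atom. The optimal-rotation step uses Horn's unit-quaternion eigenvalue method as a stand-in for whatever superposition routine the study actually used.

```python
import math


def aligned_pairs(aln_a, aln_b):
    """Return (i, j) residue-index pairs for every alignment column in which
    neither gapped sequence has a gap ('-')."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    i = j = 0
    pairs = []
    for ca, cb in zip(aln_a, aln_b):
        if ca != "-" and cb != "-":
            pairs.append((i, j))
        if ca != "-":
            i += 1
        if cb != "-":
            j += 1
    return pairs


def _centered(points):
    """Translate points so that their centroid sits at the origin."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in points]


def _max_eigenvalue_sym4(m, max_sweeps=50):
    """Largest eigenvalue of a symmetric 4x4 matrix via cyclic Jacobi rotations."""
    a = [list(row) for row in m]
    for _ in range(max_sweeps):
        off = sum(a[p][q] ** 2 for p in range(4) for q in range(4) if p != q)
        if off < 1e-22:
            break
        for p in range(3):
            for q in range(p + 1, 4):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(4):  # A <- A P (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(4):  # A <- P^T A (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return max(a[k][k] for k in range(4))


def superposition_rmsd(xs, ys):
    """Minimum RMSD between paired points over all translations and proper
    rotations, from the largest eigenvalue of Horn's 4x4 quaternion matrix."""
    if not xs or len(xs) != len(ys):
        raise ValueError("need two non-empty point lists of equal length")
    xs, ys = _centered(xs), _centered(ys)
    g = sum(v * v for p in xs for v in p) + sum(v * v for q in ys for v in q)
    # s[i][j] = sum over pairs of x_i * y_j (3x3 correlation matrix)
    s = [[sum(p[i] * q[j] for p, q in zip(xs, ys)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    k = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    lam = _max_eigenvalue_sym4(k)
    # Clamp tiny negative round-off before taking the square root.
    return math.sqrt(max(0.0, (g - 2.0 * lam) / len(xs)))


def sequence_based_rmsd(aln_a, aln_b, coords_a, coords_b):
    """Superimpose only the residues that the sequence alignment pairs up
    (hypothetical helper; coords_* hold one point per ungapped residue)."""
    pairs = aligned_pairs(aln_a, aln_b)
    xs = [coords_a[i] for i, _ in pairs]
    ys = [coords_b[j] for _, j in pairs]
    return superposition_rmsd(xs, ys)


if __name__ == "__main__":
    # Toy data: B's paired residues are A's, rotated 90 degrees about z and shifted.
    a = [(0.0, 0.0, 0.0), (9.0, 9.0, 9.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
    b = [(-y + 5.0, x + 5.0, z + 5.0) for x, y, z in a]
    b[1] = (100.0, 100.0, 100.0)  # unpaired residue; the alignment ignores it
    print(sequence_based_rmsd("AC-DE", "A-GDE", a, b))  # prints a value very close to 0.0
```

In this sketch the sequence alignment fixes the residue correspondence, so the superposition cannot re-pair or drop structurally divergent segments. A geometry-based aligner would instead search for its own correspondence in 3D. That difference is presumably one reason the two approaches can flag different numbers of structurally dissimilar pairs.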