
Improving protein structure similarity searches using domain boundaries based on conserved sequence information

BACKGROUND: The identification of protein domains plays an important role in protein structure comparison. Domain query size and composition are critical to structure similarity search algorithms such as the Vector Alignment Search Tool (VAST), the method employed for computing related protein struc...


Bibliographic Details
Main authors: Thompson, Kenneth Evan, Wang, Yanli, Madej, Tom, Bryant, Stephen H
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2694201/
https://www.ncbi.nlm.nih.gov/pubmed/19454035
http://dx.doi.org/10.1186/1472-6807-9-33
author Thompson, Kenneth Evan
Wang, Yanli
Madej, Tom
Bryant, Stephen H
collection PubMed
description BACKGROUND: The identification of protein domains plays an important role in protein structure comparison. Domain query size and composition are critical to structure similarity search algorithms such as the Vector Alignment Search Tool (VAST), the method employed for computing related protein structures in the NCBI Entrez system. Currently, domains identified on the basis of structural compactness are used for VAST computations. In this study, we investigated how alternative definitions of domains, derived from conserved sequence alignments in the Conserved Domain Database (CDD), would affect the domain comparisons and structure similarity search performance of VAST. RESULTS: Alternative domains, which have significantly different secondary structure composition from those based on structurally compact units, were identified based on the alignment footprints of curated protein sequence domain families. Our analysis indicates that domain boundaries disagree on roughly 8% of protein chains in the medium redundancy subset of the Molecular Modeling Database (MMDB). These conflicting sequence-based domain boundaries perform slightly better than structure domains in structure similarity searches, and there are interesting cases where structure similarity search performance is markedly improved. CONCLUSION: Structure similarity searches using domain boundaries based on conserved sequence information can provide an additional method for investigators to identify interesting similarities between proteins with known structures. Because structure similarity searches perform better with sequence domain boundaries, we are in the process of incorporating these boundaries into the VAST search and MMDB resources in the NCBI Entrez system.
format Text
id pubmed-2694201
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2694201 2009-06-09 BMC Struct Biol Research Article BioMed Central 2009-05-19 /pmc/articles/PMC2694201/ /pubmed/19454035 http://dx.doi.org/10.1186/1472-6807-9-33 Text en Copyright © 2009 Thompson et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Improving protein structure similarity searches using domain boundaries based on conserved sequence information
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2694201/
https://www.ncbi.nlm.nih.gov/pubmed/19454035
http://dx.doi.org/10.1186/1472-6807-9-33