Considering scores between unrelated proteins in the search database improves profile comparison

BACKGROUND: Profile-based comparison of multiple sequence alignments is a powerful methodology for the detection of remote protein sequence similarity, which is essential for the inference and analysis of protein structure, function, and evolution. Accurate estimation of statistical significance of det...

Bibliographic Details
Main authors:	Sadreyev, Ruslan I, Wang, Yong, Grishin, Nick V
Format:	Text
Language:	English
Published:	BioMed Central 2009
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3087343/ https://www.ncbi.nlm.nih.gov/pubmed/19961610 http://dx.doi.org/10.1186/1471-2105-10-399

author	Sadreyev, Ruslan I Wang, Yong Grishin, Nick V
author_sort	Sadreyev, Ruslan I
collection	PubMed
description	BACKGROUND: Profile-based comparison of multiple sequence alignments is a powerful methodology for the detection of remote protein sequence similarity, which is essential for the inference and analysis of protein structure, function, and evolution. Accurate estimation of statistical significance of detected profile similarities is essential for further development of this methodology. Here we analyze a novel approach to estimate the statistical significance of profile similarity: the explicit consideration of background score distributions for each database template (subject). RESULTS: Using a simple scheme to combine and analytically approximate query- and subject-based distributions, we show that (i) inclusion of background distributions for the subjects increases the quality of homology detection; (ii) this increase is higher when the distributions are based on the scores to all known non-homologs of the subject rather than a small calibration subset of the database representatives; and (iii) these all known non-homolog distributions of scores for the subject make the dominant contribution to the improved performance: adding the calibration distribution of the query has a negligible additional effect. CONCLUSION: The construction of distributions based on the complete sets of non-homologs for each subject is particularly relevant in the setting of structure prediction where the database consists of proteins with solved 3D structure (PDB, SCOP, CATH, etc.) and therefore structural relationships between proteins are known. These results point to a potential new direction in the development of more powerful methods for remote homology detection.
format	Text
id	pubmed-3087343
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3087343 2011-05-05 Considering scores between unrelated proteins in the search database improves profile comparison Sadreyev, Ruslan I Wang, Yong Grishin, Nick V BMC Bioinformatics Research Article BioMed Central 2009-12-04 /pmc/articles/PMC3087343/ /pubmed/19961610 http://dx.doi.org/10.1186/1471-2105-10-399 Text en Copyright ©2009 Sadreyev et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Considering scores between unrelated proteins in the search database improves profile comparison
topic	Research Article
