
A sequence sub-sampling algorithm increases the power to detect distant homologues

Searching databases for distant homologues using alignments instead of individual sequences increases the power of detection. However, most methods assume that protein evolution proceeds in a regular fashion, with the inferred tree of sequences providing a good estimation of the evolutionary process...


Bibliographic Details
Main Authors:	Johnston, Catrióna R., Shields, Denis C.
Format:	Text
Language:	English
Published:	Oxford University Press 2005
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1174907/ https://www.ncbi.nlm.nih.gov/pubmed/16006623 http://dx.doi.org/10.1093/nar/gki687

_version_	1782124475478704128
author	Johnston, Catrióna R. Shields, Denis C.
author_facet	Johnston, Catrióna R. Shields, Denis C.
author_sort	Johnston, Catrióna R.
collection	PubMed
description	Searching databases for distant homologues using alignments instead of individual sequences increases the power of detection. However, most methods assume that protein evolution proceeds in a regular fashion, with the inferred tree of sequences providing a good estimation of the evolutionary process. We investigated the combined HMMER search results from random alignment subsets (with three sequences each) drawn from the parent alignment (Rand-shuffle algorithm), using the SCOP structural classification to determine true similarities. At false-positive rates of 5%, the Rand-shuffle algorithm improved HMMER's sensitivity, with a 37.5% greater sensitivity compared with HMMER alone, when easily identified similarities (identifiable by BLAST) were excluded from consideration. An extension of the Rand-shuffle algorithm (Ali-shuffle) weighted towards more informative sequence subsets. This approach improved the performance over HMMER alone and PSI-BLAST, particularly at higher false-positive rates. The improvements in performance of these sequence sub-sampling methods may reflect lower sensitivity to alignment error and irregular evolutionary patterns. The Ali-shuffle and Rand-shuffle sequence homology search programs are available by request from the authors.
format	Text
id	pubmed-1174907
institution	National Center for Biotechnology Information
language	English
publishDate	2005
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-11749072005-07-11 A sequence sub-sampling algorithm increases the power to detect distant homologues Johnston, Catrióna R. Shields, Denis C. Nucleic Acids Res Article Searching databases for distant homologues using alignments instead of individual sequences increases the power of detection. However, most methods assume that protein evolution proceeds in a regular fashion, with the inferred tree of sequences providing a good estimation of the evolutionary process. We investigated the combined HMMER search results from random alignment subsets (with three sequences each) drawn from the parent alignment (Rand-shuffle algorithm), using the SCOP structural classification to determine true similarities. At false-positive rates of 5%, the Rand-shuffle algorithm improved HMMER's sensitivity, with a 37.5% greater sensitivity compared with HMMER alone, when easily identified similarities (identifiable by BLAST) were excluded from consideration. An extension of the Rand-shuffle algorithm (Ali-shuffle) weighted towards more informative sequence subsets. This approach improved the performance over HMMER alone and PSI-BLAST, particularly at higher false-positive rates. The improvements in performance of these sequence sub-sampling methods may reflect lower sensitivity to alignment error and irregular evolutionary patterns. The Ali-shuffle and Rand-shuffle sequence homology search programs are available by request from the authors. Oxford University Press 2005 2005-07-08 /pmc/articles/PMC1174907/ /pubmed/16006623 http://dx.doi.org/10.1093/nar/gki687 Text en © The Author 2005. Published by Oxford University Press. All rights reserved
spellingShingle	Article Johnston, Catrióna R. Shields, Denis C. A sequence sub-sampling algorithm increases the power to detect distant homologues
title	A sequence sub-sampling algorithm increases the power to detect distant homologues
title_full	A sequence sub-sampling algorithm increases the power to detect distant homologues
title_fullStr	A sequence sub-sampling algorithm increases the power to detect distant homologues
title_full_unstemmed	A sequence sub-sampling algorithm increases the power to detect distant homologues
title_short	A sequence sub-sampling algorithm increases the power to detect distant homologues
title_sort	sequence sub-sampling algorithm increases the power to detect distant homologues
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1174907/ https://www.ncbi.nlm.nih.gov/pubmed/16006623 http://dx.doi.org/10.1093/nar/gki687
work_keys_str_mv	AT johnstoncatrionar asequencesubsamplingalgorithmincreasesthepowertodetectdistanthomologues AT shieldsdenisc asequencesubsamplingalgorithmincreasesthepowertodetectdistanthomologues AT johnstoncatrionar sequencesubsamplingalgorithmincreasesthepowertodetectdistanthomologues AT shieldsdenisc sequencesubsamplingalgorithmincreasesthepowertodetectdistanthomologues
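
The description field above outlines the Rand-shuffle idea: draw random subsets of three sequences from the parent alignment, search with each subset, and combine the results. A minimal Python sketch of that sub-sampling step follows. It is not the authors' program. The function names, the `search` callback (a hypothetical stand-in for a profile search such as a HMMER run) and the rule for combining hits (keep the lowest E-value per target) are all assumptions made for illustration; the record does not say how the published method combines results.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Subset = Tuple[str, ...]


def draw_subsets(seq_ids: Sequence[str], n_subsets: int,
                 size: int = 3, seed: int = 0) -> List[Subset]:
    """Draw random subsets of `size` sequence IDs from a parent alignment."""
    if len(seq_ids) < size:
        raise ValueError("alignment has fewer sequences than the subset size")
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    return [tuple(sorted(rng.sample(list(seq_ids), size)))
            for _ in range(n_subsets)]


def combine_hits(subsets: Sequence[Subset],
                 search: Callable[[Subset], Dict[str, float]]) -> Dict[str, float]:
    """Merge per-subset search results into one table of target -> E-value.

    ASSUMPTION: the best (lowest) E-value seen for a target is kept.
    `search` is a hypothetical callback returning {target_id: e_value}.
    """
    best: Dict[str, float] = {}
    for subset in subsets:
        for target, evalue in search(subset).items():
            if target not in best or evalue < best[target]:
                best[target] = evalue
    return best
```

In use, `search` would build a profile from the three chosen sequences and return the scored database hits; here it is left abstract so that the sketch does not depend on any particular tool or command-line option.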

