
A sequence sub-sampling algorithm increases the power to detect distant homologues

Searching databases for distant homologues using alignments instead of individual sequences increases the power of detection. However, most methods assume that protein evolution proceeds in a regular fashion, with the inferred tree of sequences providing a good estimation of the evolutionary process...


Bibliographic Details
Main Authors: Johnston, Catrióna R., Shields, Denis C.
Format: Text
Language: English
Published: Oxford University Press 2005
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1174907/
https://www.ncbi.nlm.nih.gov/pubmed/16006623
http://dx.doi.org/10.1093/nar/gki687
collection PubMed
description Searching databases for distant homologues using alignments instead of individual sequences increases the power of detection. However, most methods assume that protein evolution proceeds in a regular fashion, with the inferred tree of sequences providing a good estimation of the evolutionary process. We investigated the combined HMMER search results from random alignment subsets (with three sequences each) drawn from the parent alignment (Rand-shuffle algorithm), using the SCOP structural classification to determine true similarities. At false-positive rates of 5%, the Rand-shuffle algorithm improved HMMER's sensitivity, with a 37.5% greater sensitivity compared with HMMER alone, when easily identified similarities (identifiable by BLAST) were excluded from consideration. An extension of the Rand-shuffle algorithm (Ali-shuffle) weighted towards more informative sequence subsets. This approach improved the performance over HMMER alone and PSI-BLAST, particularly at higher false-positive rates. The improvements in performance of these sequence sub-sampling methods may reflect lower sensitivity to alignment error and irregular evolutionary patterns. The Ali-shuffle and Rand-shuffle sequence homology search programs are available by request from the authors.
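The sub-sampling idea in the description above can be sketched roughly as follows. This is a minimal illustration, not the authors' program: the number of subsets, the `search` callable (a hypothetical stand-in for building a profile from a subset and running a HMMER search), and the rule of keeping the lowest E-value per target are all assumptions, since the abstract does not say how the per-subset results are combined.

```python
import random
from typing import Callable, Dict, List, Sequence

Hits = Dict[str, float]  # target identifier -> E-value (lower is better)


def sample_subsets(alignment: Sequence[str], subset_size: int = 3,
                   n_subsets: int = 10, seed: int = 0) -> List[List[str]]:
    """Draw random subsets of aligned sequences without replacement within a subset."""
    if len(alignment) < subset_size:
        raise ValueError("alignment has fewer sequences than subset_size")
    rng = random.Random(seed)
    return [rng.sample(list(alignment), subset_size) for _ in range(n_subsets)]


def combine_best_evalue(per_subset_hits: Sequence[Hits]) -> Hits:
    """Assumed combination rule: keep the lowest E-value seen for each target."""
    combined: Hits = {}
    for hits in per_subset_hits:
        for target, evalue in hits.items():
            if target not in combined or evalue < combined[target]:
                combined[target] = evalue
    return combined


def subsample_search(alignment: Sequence[str],
                     search: Callable[[List[str]], Hits],
                     subset_size: int = 3, n_subsets: int = 10,
                     seed: int = 0) -> Hits:
    """Run `search` on each random subset and merge the results.

    `search` is a hypothetical callable supplied by the caller; in practice it
    would wrap profile construction and a database search.
    """
    subsets = sample_subsets(alignment, subset_size, n_subsets, seed)
    return combine_best_evalue([search(subset) for subset in subsets])
```

A caller would pass the sequences of the parent alignment together with a function that returns target-to-E-value hits for one subset; the merged dictionary can then be ranked by E-value.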
format Text
id pubmed-1174907
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-1174907 2005-07-11 A sequence sub-sampling algorithm increases the power to detect distant homologues Johnston, Catrióna R. Shields, Denis C. Nucleic Acids Res Article Oxford University Press 2005 2005-07-08 /pmc/articles/PMC1174907/ /pubmed/16006623 http://dx.doi.org/10.1093/nar/gki687 Text en © The Author 2005. Published by Oxford University Press. All rights reserved
topic Article