A sequence sub-sampling algorithm increases the power to detect distant homologues
Searching databases for distant homologues using alignments instead of individual sequences increases the power of detection. However, most methods assume that protein evolution proceeds in a regular fashion, with the inferred tree of sequences providing a good estimation of the evolutionary process...
Main Authors: | Johnston, Catrióna R., Shields, Denis C. |
---|---|
Format: | Text |
Language: | English |
Published: | Oxford University Press 2005 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1174907/ https://www.ncbi.nlm.nih.gov/pubmed/16006623 http://dx.doi.org/10.1093/nar/gki687 |
_version_ | 1782124475478704128 |
---|---|
author | Johnston, Catrióna R.; Shields, Denis C. |
author_facet | Johnston, Catrióna R.; Shields, Denis C. |
author_sort | Johnston, Catrióna R. |
collection | PubMed |
description | Searching databases for distant homologues using alignments instead of individual sequences increases the power of detection. However, most methods assume that protein evolution proceeds in a regular fashion, with the inferred tree of sequences providing a good estimation of the evolutionary process. We investigated the combined HMMER search results from random alignment subsets (with three sequences each) drawn from the parent alignment (Rand-shuffle algorithm), using the SCOP structural classification to determine true similarities. At false-positive rates of 5%, the Rand-shuffle algorithm improved HMMER's sensitivity, with a 37.5% greater sensitivity compared with HMMER alone, when easily identified similarities (identifiable by BLAST) were excluded from consideration. An extension of the Rand-shuffle algorithm (Ali-shuffle) weighted towards more informative sequence subsets. This approach improved the performance over HMMER alone and PSI-BLAST, particularly at higher false-positive rates. The improvements in performance of these sequence sub-sampling methods may reflect lower sensitivity to alignment error and irregular evolutionary patterns. The Ali-shuffle and Rand-shuffle sequence homology search programs are available by request from the authors. |
format | Text |
id | pubmed-1174907 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2005 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-1174907 2005-07-11 A sequence sub-sampling algorithm increases the power to detect distant homologues Johnston, Catrióna R.; Shields, Denis C. Nucleic Acids Res Article Searching databases for distant homologues using alignments instead of individual sequences increases the power of detection. However, most methods assume that protein evolution proceeds in a regular fashion, with the inferred tree of sequences providing a good estimation of the evolutionary process. We investigated the combined HMMER search results from random alignment subsets (with three sequences each) drawn from the parent alignment (Rand-shuffle algorithm), using the SCOP structural classification to determine true similarities. At false-positive rates of 5%, the Rand-shuffle algorithm improved HMMER's sensitivity, with a 37.5% greater sensitivity compared with HMMER alone, when easily identified similarities (identifiable by BLAST) were excluded from consideration. An extension of the Rand-shuffle algorithm (Ali-shuffle) weighted towards more informative sequence subsets. This approach improved the performance over HMMER alone and PSI-BLAST, particularly at higher false-positive rates. The improvements in performance of these sequence sub-sampling methods may reflect lower sensitivity to alignment error and irregular evolutionary patterns. The Ali-shuffle and Rand-shuffle sequence homology search programs are available by request from the authors. Oxford University Press 2005 2005-07-08 /pmc/articles/PMC1174907/ /pubmed/16006623 http://dx.doi.org/10.1093/nar/gki687 Text en © The Author 2005. Published by Oxford University Press. All rights reserved |
spellingShingle | Article Johnston, Catrióna R. Shields, Denis C. A sequence sub-sampling algorithm increases the power to detect distant homologues |
title | A sequence sub-sampling algorithm increases the power to detect distant homologues |
title_full | A sequence sub-sampling algorithm increases the power to detect distant homologues |
title_fullStr | A sequence sub-sampling algorithm increases the power to detect distant homologues |
title_full_unstemmed | A sequence sub-sampling algorithm increases the power to detect distant homologues |
title_short | A sequence sub-sampling algorithm increases the power to detect distant homologues |
title_sort | sequence sub-sampling algorithm increases the power to detect distant homologues |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1174907/ https://www.ncbi.nlm.nih.gov/pubmed/16006623 http://dx.doi.org/10.1093/nar/gki687 |
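
The description above mentions drawing random three-sequence subsets from a parent alignment and combining the search results obtained from each subset. A minimal Python sketch of that sub-sampling idea follows. It is not the authors' Rand-shuffle program: the function names `draw_subsets` and `combine_hits` are hypothetical, the HMMER search step itself is left out, and the rule of keeping the lowest E-value per target is an assumption, since the record does not say how the results were combined.

```python
import random


def draw_subsets(alignment_ids, n_subsets, subset_size=3, seed=0):
    """Draw random subsets of sequence identifiers from a parent alignment.

    Each subset holds `subset_size` distinct identifiers (three by default,
    matching the subset size given in the description).
    """
    ids = list(alignment_ids)
    if len(ids) < subset_size:
        raise ValueError("alignment has fewer sequences than the subset size")
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    return [tuple(rng.sample(ids, subset_size)) for _ in range(n_subsets)]


def combine_hits(per_subset_hits):
    """Merge per-subset search results into one ranked list.

    `per_subset_hits` is a list of dicts mapping a target identifier to the
    E-value reported for one subset's search. ASSUMPTION: the best (lowest)
    E-value seen for each target is kept; the source does not state the rule.
    """
    best = {}
    for hits in per_subset_hits:
        for target, evalue in hits.items():
            if target not in best or evalue < best[target]:
                best[target] = evalue
    # Rank targets from most to least significant.
    return sorted(best.items(), key=lambda item: item[1])
```

In such a workflow, each tuple returned by `draw_subsets` would be turned into a profile and searched against the database, and the resulting per-subset hit tables would then be passed to `combine_hits`.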