Improved performance of sequence search approaches in remote homology detection

Bibliographic details
Main authors: Joshi, Adwait Govind; Raghavender, Upadhyayula Surya; Sowdhamini, Ramanathan
Format: Online article, text
Language: English
Published: F1000Research 2014
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4240247/
https://www.ncbi.nlm.nih.gov/pubmed/25469226
http://dx.doi.org/10.12688/f1000research.2-93.v2
Collection: PubMed
Description: The protein sequence space is vast and diverse, spanning different families. Biologically meaningful relationships exist between proteins at the superfamily level. However, it is highly challenging to establish convincing relationships at the superfamily level by means of simple sequence searches. A rigorous sequence search strategy is therefore necessary to establish remote homology relationships and achieve high coverage. We have used iterative profile-based methods, together with sequence-motif constraints, to specify search directions. We address the importance of multiple start points (queries) for achieving high coverage at the protein superfamily level. We have devised strategies that employ a structural regime to search sequence space with good specificity and sensitivity. We employ two well-known sequence search methods, PSI-BLAST and PHI-BLAST, with multiple queries and multiple patterns to enhance homologue identification at the structural superfamily level. The study suggests that multiple queries improve sensitivity, while a pattern-constrained iterative sequence search is stringent in the initial stages, thereby driving the search in a specific direction and also achieving high coverage. This data-mining approach has been applied to the entire structural superfamily database.
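The two ideas in the description, pooling hits from several queries and constraining a search with a sequence pattern, can be illustrated with a small Python sketch. It is not the authors' pipeline: it assumes BLAST hits have already been parsed into Python sets and dicts, it applies the pattern only as a post-hoc regex filter (PHI-BLAST instead uses the pattern inside the search itself), and the function names, example pattern and toy data are all hypothetical. Patterns are written in PROSITE syntax, which PHI-BLAST accepts; only a common subset of that syntax is handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern to a Python regex (hypothetical helper).

    Handles a common subset: '-' separators, 'x' wildcards, [..] allowed sets,
    {..} forbidden sets and (n) / (n,m) repeat counts, plus a trailing '.'.
    Terminal anchors ('<', '>') are not supported and raise ValueError.
    """
    element_re = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = element_re.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            regex = "."                        # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"    # any residue except those listed
        else:
            regex = core                       # single residue or [..] set is valid regex as-is
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)


def pattern_filter(hits: dict[str, str], regex: str) -> set[str]:
    """Keep hit IDs whose sequence contains the pattern.

    A post-hoc filter only; PHI-BLAST applies the pattern during the search itself.
    """
    compiled = re.compile(regex)
    return {hit_id for hit_id, seq in hits.items() if compiled.search(seq)}


def superfamily_coverage(hits_per_query: list[set[str]], members: set[str]) -> float:
    """Fraction of known superfamily members found by the union of all queries' hits."""
    if not members:
        raise ValueError("members must be non-empty")
    found = set().union(*hits_per_query) & members
    return len(found) / len(members)


if __name__ == "__main__":
    # Made-up toy data: hit sets returned by three hypothetical queries.
    members = {"P1", "P2", "P3", "P4", "P5"}
    hits = [{"P1", "P2"}, {"P2", "P3"}, {"P5", "X9"}]
    print(superfamily_coverage(hits[:1], members))  # one query: 0.4
    print(superfamily_coverage(hits, members))      # three queries pooled: 0.8

    # An illustrative pattern in PROSITE syntax.
    regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H")
    print(regex)  # C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H
```

In the toy data, pooling the three hit sets raises coverage from 0.4 to 0.8, a miniature version of the point that multiple queries improve sensitivity; the hit "X9" is ignored because it is not a known member.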
Record ID: pubmed-4240247
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: F1000Research (F1000Res), 2014-07-16
Copyright: © 2014 Joshi AG et al. This is an open access article distributed under the terms of the Creative Commons Attribution Licence (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Data associated with the article are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication; http://creativecommons.org/publicdomain/zero/1.0/).
Topic: Research Article