Improved Detection of Remote Homologues Using Cascade PSI-BLAST: Influence of Neighbouring Protein Families on Sequence Coverage

BACKGROUND: Development of sensitive sequence search procedures for the detection of distant relationships between proteins at superfamily/fold level is still a big challenge. The intermediate sequence search approach is the most frequently employed manner of identifying remote homologues effectivel...

Bibliographic Details
Main authors: Kaushik, Swati; Mutt, Eshita; Chellappan, Ajithavalli; Sankaran, Sandhya; Srinivasan, Narayanaswamy; Sowdhamini, Ramanathan
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3577913/
https://www.ncbi.nlm.nih.gov/pubmed/23437136
http://dx.doi.org/10.1371/journal.pone.0056449
_version_ 1782260002966208512
author Kaushik, Swati
Mutt, Eshita
Chellappan, Ajithavalli
Sankaran, Sandhya
Srinivasan, Narayanaswamy
Sowdhamini, Ramanathan
author_facet Kaushik, Swati
Mutt, Eshita
Chellappan, Ajithavalli
Sankaran, Sandhya
Srinivasan, Narayanaswamy
Sowdhamini, Ramanathan
author_sort Kaushik, Swati
collection PubMed
description BACKGROUND: Development of sensitive sequence search procedures for the detection of distant relationships between proteins at superfamily/fold level is still a big challenge. The intermediate sequence search approach is the most frequently employed means of identifying remote homologues effectively. In this study, serine proteases of the prolyl oligopeptidase, rhomboid and subtilisin protein families were examined, using plant serine proteases from two genomes (A. thaliana and O. sativa) as queries, along with 13 other families of unrelated folds, to identify distant homologues that could not be obtained using PSI-BLAST. METHODOLOGY/PRINCIPAL FINDINGS: We propose starting with multiple queries of classical serine protease members to identify remote homologues in families, using a rigorous approach such as Cascade PSI-BLAST. We found that classical sequence-based approaches, such as PSI-BLAST, showed very low sequence coverage in identifying plant serine proteases. The algorithm was applied to an enriched sequence database of homologous domains, and we obtained an overall average coverage of 88% at family level and 77% at superfamily or fold level, along with a specificity of ∼100% and a Matthews correlation coefficient of 0.91. A similar approach was also implemented on 13 other protein families representing every structural class in the SCOP database. Further investigation with statistical tests, such as jackknifing, helped us to better understand the influence of neighbouring protein families. CONCLUSIONS/SIGNIFICANCE: Our study suggests that employing multiple queries from a family in Cascade PSI-BLAST searches is useful for predicting distant relationships effectively, even at superfamily level. We propose a generalized strategy to cover all the distant members of a particular family using multiple query sequences.
Our findings reveal that prior selection of query sequences and the presence of neighbouring families can be important for covering the search space effectively in minimal computational time. This study also provides an understanding of the ‘bridging’ role of related families.
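The abstract reports coverage, specificity and a Matthews correlation coefficient, but this record does not give the authors' exact definitions. The following is a minimal sketch, assuming the standard confusion-matrix definitions (coverage taken as sensitivity). The function name `search_metrics` and the counts used below are hypothetical and are not taken from the paper.

```python
from math import sqrt


def search_metrics(tp, fp, tn, fn):
    """Return (coverage, specificity, mcc) from confusion-matrix counts.

    Assumes the standard definitions; the paper's own definitions
    are not stated in this record.
    """
    # Coverage (sensitivity): fraction of true family members recovered.
    coverage = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of non-members correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return coverage, specificity, mcc


# Hypothetical counts for illustration only (not data from the study).
coverage, specificity, mcc = search_metrics(tp=80, fp=0, tn=100, fn=20)
print(f"coverage={coverage:.2f} specificity={specificity:.2f} mcc={mcc:.2f}")
```

With no false positives, specificity is 1.0 regardless of how many true members are missed, which is why a combined measure such as the Matthews correlation coefficient is typically reported alongside coverage.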
format Online
Article
Text
id pubmed-3577913
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3577913 2013-02-22 Improved Detection of Remote Homologues Using Cascade PSI-BLAST: Influence of Neighbouring Protein Families on Sequence Coverage Kaushik, Swati Mutt, Eshita Chellappan, Ajithavalli Sankaran, Sandhya Srinivasan, Narayanaswamy Sowdhamini, Ramanathan PLoS One Research Article Public Library of Science 2013-02-20 /pmc/articles/PMC3577913/ /pubmed/23437136 http://dx.doi.org/10.1371/journal.pone.0056449 Text en © 2013 Kaushik et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Kaushik, Swati
Mutt, Eshita
Chellappan, Ajithavalli
Sankaran, Sandhya
Srinivasan, Narayanaswamy
Sowdhamini, Ramanathan
Improved Detection of Remote Homologues Using Cascade PSI-BLAST: Influence of Neighbouring Protein Families on Sequence Coverage
title Improved Detection of Remote Homologues Using Cascade PSI-BLAST: Influence of Neighbouring Protein Families on Sequence Coverage
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3577913/
https://www.ncbi.nlm.nih.gov/pubmed/23437136
http://dx.doi.org/10.1371/journal.pone.0056449