Predicting conserved protein motifs with Sub-HMMs
BACKGROUND: Profile HMMs (hidden Markov models) provide effective methods for modeling the conserved regions of protein families. A limitation of the resulting domain models is the difficulty of pinpointing their much shorter functional sub-features, such as catalytically relevant sequence motifs in en...
Main authors: | Horan, Kevin; Shelton, Christian R; Girke, Thomas |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2010 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2879284/ https://www.ncbi.nlm.nih.gov/pubmed/20420695 http://dx.doi.org/10.1186/1471-2105-11-205 |
_version_ | 1782181913218252800 |
---|---|
author | Horan, Kevin Shelton, Christian R Girke, Thomas |
author_facet | Horan, Kevin Shelton, Christian R Girke, Thomas |
author_sort | Horan, Kevin |
collection | PubMed |
description | BACKGROUND: Profile HMMs (hidden Markov models) provide effective methods for modeling the conserved regions of protein families. A limitation of the resulting domain models is the difficulty of pinpointing their much shorter functional sub-features, such as catalytically relevant sequence motifs in enzymes or ligand binding signatures of receptor proteins. RESULTS: To identify these conserved motifs efficiently, we propose a method for extracting the most information-rich regions in protein families from their profile HMMs. The method was used here to predict a comprehensive set of sub-HMMs from the Pfam domain database. Cross-validations with the PROSITE and CSA databases confirmed the efficiency of the method in predicting most of the known functionally relevant motifs and residues. At the same time, 46,768 novel conserved regions could be predicted. The data set also allowed us to link at least 461 Pfam domains of known and unknown function by their common sub-HMMs. Finally, the sub-HMM method showed very promising results as an alternative search method for identifying proteins that share only short sequence similarities. CONCLUSIONS: Sub-HMMs extend the application spectrum of profile HMMs to motif discovery. Their most interesting utility is the identification of the functionally relevant residues in proteins of known and unknown function. Additionally, sub-HMMs can be used for highly localized sequence similarity searches that focus on shorter conserved features rather than entire domains or global similarities. The motif data generated by this study is a valuable knowledge resource for characterizing protein functions in the future. |
format | Text |
id | pubmed-2879284 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2879284 2010-06-02 Predicting conserved protein motifs with Sub-HMMs Horan, Kevin Shelton, Christian R Girke, Thomas BMC Bioinformatics Research article
BioMed Central 2010-04-26 /pmc/articles/PMC2879284/ /pubmed/20420695 http://dx.doi.org/10.1186/1471-2105-11-205 Text en Copyright ©2010 Horan et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research article Horan, Kevin Shelton, Christian R Girke, Thomas Predicting conserved protein motifs with Sub-HMMs |
title | Predicting conserved protein motifs with Sub-HMMs |
title_full | Predicting conserved protein motifs with Sub-HMMs |
title_fullStr | Predicting conserved protein motifs with Sub-HMMs |
title_full_unstemmed | Predicting conserved protein motifs with Sub-HMMs |
title_short | Predicting conserved protein motifs with Sub-HMMs |
title_sort | predicting conserved protein motifs with sub-hmms |
topic | Research article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2879284/ https://www.ncbi.nlm.nih.gov/pubmed/20420695 http://dx.doi.org/10.1186/1471-2105-11-205 |
work_keys_str_mv | AT horankevin predictingconservedproteinmotifswithsubhmms AT sheltonchristianr predictingconservedproteinmotifswithsubhmms AT girkethomas predictingconservedproteinmotifswithsubhmms |