Identification and characterization of subfamily-specific signatures in a large protein superfamily by a hidden Markov model approach

BACKGROUND: Most profile and motif databases strive to classify protein sequences into a broad spectrum of protein families. The next step of such database studies should include the development of classification systems capable of distinguishing between subfamilies within a structurally and functio...

Bibliographic Details
Main authors: Truong, Kevin, Ikura, Mitsuhiko
Format: Text
Language: English
Published: BioMed Central 2002
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC65048/
https://www.ncbi.nlm.nih.gov/pubmed/11818024
http://dx.doi.org/10.1186/1471-2105-3-1
author Truong, Kevin
Ikura, Mitsuhiko
collection PubMed
description BACKGROUND: Most profile and motif databases strive to classify protein sequences into a broad spectrum of protein families. The next step of such database studies should include the development of classification systems capable of distinguishing between subfamilies within a structurally and functionally diverse superfamily. This would be helpful in elucidating sequence-structure-function relationships of proteins. RESULTS: Here, we present a method to classify sequences into subfamilies by employing hidden Markov models (HMMs) to find windows of residues that are distinct among subfamilies (called signatures). The method starts with a multiple sequence alignment (MSA) of the subfamily. Then, we build an HMM database representing all sliding windows of the MSA of a fixed size. Finally, we construct an HMM histogram of the matches of each sliding window in the entire superfamily. To illustrate the efficacy of the method, we have applied the analysis to find subfamily signatures in two well-studied superfamilies: the cadherin and the EF-hand protein superfamilies. As a corollary, the HMM histograms of the analyzed subfamilies revealed information about their Ca²⁺ binding sites and loops. CONCLUSIONS: The method is used to create HMM databases to diagnose subfamilies of protein superfamilies that complement broad profile and motif databases such as BLOCKS, PROSITE, Pfam, SMART, PRINTS and InterPro.
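The three steps named in the description (subfamily MSA, one model per fixed-size sliding window, histogram of matches across the superfamily) can be sketched roughly as follows. This is an illustration only and not the authors' implementation: each window is modeled with a simple ungapped log-odds profile rather than a real HMM, and the function names, the uniform background, the pseudocount, and the score threshold are all assumptions introduced here.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def sliding_windows(msa, size):
    """Yield (start, rows) for every fixed-size window of an aligned MSA."""
    length = len(msa[0])
    for start in range(length - size + 1):
        yield start, [row[start:start + size] for row in msa]


def build_profile(window_rows, pseudocount=1.0):
    """Per-column log-odds scores against a uniform background.

    Stand-in for an HMM (assumption): no insert or delete states,
    and gap characters in the alignment are simply ignored.
    """
    profile = []
    for col in range(len(window_rows[0])):
        counts = Counter(r[col] for r in window_rows if r[col] in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        profile.append({
            aa: math.log((counts[aa] + pseudocount) / total * len(ALPHABET))
            for aa in ALPHABET
        })
    return profile


def best_score(profile, sequence):
    """Best ungapped score of the profile over all offsets in a sequence."""
    size = len(profile)
    best = float("-inf")  # stays -inf if the sequence is shorter than the window
    for offset in range(len(sequence) - size + 1):
        score = sum(profile[i].get(sequence[offset + i], 0.0)
                    for i in range(size))
        best = max(best, score)
    return best


def window_histogram(msa, superfamily, size, threshold):
    """For each window start, count superfamily sequences that match it."""
    hist = {}
    for start, rows in sliding_windows(msa, size):
        profile = build_profile(rows)
        hist[start] = sum(
            1 for seq in superfamily if best_score(profile, seq) >= threshold
        )
    return hist
```

Under these assumptions, a window whose count stays close to the number of subfamily members while the rest of the superfamily fails to match would be a candidate signature. A real analysis would build and search the window models with a dedicated profile-HMM package instead of this simplified scoring.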
format Text
id pubmed-65048
institution National Center for Biotechnology Information
language English
publishDate 2002
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-65048 2002-01-31 Identification and characterization of subfamily-specific signatures in a large protein superfamily by a hidden Markov model approach Truong, Kevin Ikura, Mitsuhiko BMC Bioinformatics Methodology article BioMed Central 2002-01-10 /pmc/articles/PMC65048/ /pubmed/11818024 http://dx.doi.org/10.1186/1471-2105-3-1 Text en Copyright ©2002 Truong and Ikura; licensee BioMed Central Ltd.
This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.
title Identification and characterization of subfamily-specific signatures in a large protein superfamily by a hidden Markov model approach
topic Methodology article