Towards comprehensive structural motif mining for better fold annotation in the "twilight zone" of sequence dissimilarity

BACKGROUND: Automatic identification of structure fingerprints from a group of diverse protein structures is challenging, especially for proteins whose divergent amino acid sequences may fall into the "twilight-" or "midnight-" zones where pair-wise sequence identities to known s...

Bibliographic Details
Main authors: Jia, Yi; Huan, Jun; Buhr, Vincent; Zhang, Jintao; Carayannopoulos, Leonidas N
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2648771/
https://www.ncbi.nlm.nih.gov/pubmed/19208148
http://dx.doi.org/10.1186/1471-2105-10-S1-S46
author Jia, Yi
Huan, Jun
Buhr, Vincent
Zhang, Jintao
Carayannopoulos, Leonidas N
collection PubMed
description BACKGROUND: Automatic identification of structure fingerprints from a group of diverse protein structures is challenging, especially for proteins whose divergent amino acid sequences may fall into the "twilight-" or "midnight-" zones, where pair-wise sequence identities to known sequences fall below 25% and sequence-based functional annotations often fail. RESULTS: Here we report a novel graph database mining method and demonstrate its application to protein structure pattern identification and structure classification. The biological motivation of our study is to recognize common structure patterns in "immunoevasins", proteins mediating virus evasion of host immune defense. Our experimental study, using both viral and non-viral proteins, demonstrates the efficiency and efficacy of the proposed method. CONCLUSION: We present a theoretical framework, offer a practical software implementation for incorporating prior domain knowledge, such as the substitution matrices studied here, and devise an efficient algorithm to identify approximately matched frequent subgraphs. By doing so, we significantly expand the analytical power of sophisticated data mining algorithms in dealing with large volumes of complicated and noisy protein structure data. Without loss of generality, the choice of appropriate compatibility matrices allows our method to be easily employed in domains where subgraph labels have some uncertainty.
format Text
id pubmed-2648771
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2648771 2009-03-03
Towards comprehensive structural motif mining for better fold annotation in the "twilight zone" of sequence dissimilarity
Jia, Yi; Huan, Jun; Buhr, Vincent; Zhang, Jintao; Carayannopoulos, Leonidas N
BMC Bioinformatics
Research
BioMed Central 2009-01-30
/pmc/articles/PMC2648771/ /pubmed/19208148 http://dx.doi.org/10.1186/1471-2105-10-S1-S46
Text en
Copyright © 2009 Jia et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0
This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Towards comprehensive structural motif mining for better fold annotation in the "twilight zone" of sequence dissimilarity
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2648771/
https://www.ncbi.nlm.nih.gov/pubmed/19208148
http://dx.doi.org/10.1186/1471-2105-10-S1-S46