
Discovering co-occurring patterns and their biological significance in protein families

BACKGROUND: The large influx of biological sequences poses the importance of identifying and correlating conserved regions in homologous sequences to acquire valuable biological knowledge. These conserved regions contain statistically significant residue associations as sequence patterns. Thus, patt...


Bibliographic Details
Main authors: Lee, En-Shiun Annie, Fung, Sanderz, Sze-To, Ho-Yin, Wong, Andrew K C
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4243116/
https://www.ncbi.nlm.nih.gov/pubmed/25474736
http://dx.doi.org/10.1186/1471-2105-15-S12-S2
_version_ 1782346062795636736
author Lee, En-Shiun Annie
Fung, Sanderz
Sze-To, Ho-Yin
Wong, Andrew K C
author_facet Lee, En-Shiun Annie
Fung, Sanderz
Sze-To, Ho-Yin
Wong, Andrew K C
author_sort Lee, En-Shiun Annie
collection PubMed
description BACKGROUND: The large influx of biological sequences makes it important to identify and correlate conserved regions in homologous sequences to acquire valuable biological knowledge. These conserved regions contain statistically significant residue associations as sequence patterns. Thus, patterns from two conserved regions that co-occur frequently on the same sequences are inferred to have joint functionality. A method for finding conserved regions in protein families with frequent co-occurrence patterns is proposed. The biological significance of the discovered clusters of conserved regions with co-occurrence patterns can be validated by the three-dimensional closeness of their amino acids and the biological functionality found in those regions as supported by published work. METHODS: Using existing algorithms, we discovered statistically significant amino acid associations as sequence patterns. We then aligned and clustered them into Aligned Pattern Clusters (APCs) corresponding to conserved regions with amino acid conservation and variation. When one APC frequently co-occurred with another APC, the two APCs have high co-occurrence. We then clustered APCs with high co-occurrence into what we refer to as Co-occurrence APC Clusters (Co-occurrence Clusters). RESULTS: Our results show that for Co-occurrence Clusters, the three-dimensional distance between their amino acids is smaller than the average amino acid distance. For the Co-occurrence Clusters of the ubiquitin and the cytochrome c families, we observed biological significance among the residing amino acids of the APCs within the same cluster. In ubiquitin, the residues are responsible for ubiquitination as well as conventional and unconventional ubiquitin binding.
In cytochrome c, amino acids in the first co-occurrence cluster contribute to binding of other proteins in the electron transport chain, and amino acids in the second co-occurrence cluster contribute to the stability of the axial heme ligand. CONCLUSIONS: Thus, our co-occurrence clustering algorithm can efficiently find and rank conserved regions that contain patterns that frequently co-occur on the same proteins. Co-occurring patterns are biologically significant due to their three-dimensional closeness and other evidence reported in the literature. These results play an important role in drug discovery, as biologists can quickly identify the target for drugs to conduct detailed preclinical studies.
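The grouping step described under METHODS can be illustrated with a minimal sketch. The record does not state which co-occurrence measure or clustering procedure the authors used, so everything here is an assumption made for illustration: the Jaccard index as the co-occurrence score, the `threshold` parameter, single-linkage grouping through union-find, and the toy APC names and sequence IDs.

```python
from itertools import combinations


def jaccard(a, b):
    """Fraction of sequences containing both APCs among those containing either."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cooccurrence_clusters(apc_to_seqs, threshold=0.5):
    """Group APCs whose pairwise score meets the threshold (single linkage).

    apc_to_seqs maps an APC label to the set of sequence IDs it occurs on.
    The score and threshold are illustrative assumptions, not the paper's.
    """
    parent = {apc: apc for apc in apc_to_seqs}

    def find(x):
        # Follow parent links to the representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Link every pair of APCs that co-occur often enough.
    for a, b in combinations(apc_to_seqs, 2):
        if jaccard(apc_to_seqs[a], apc_to_seqs[b]) >= threshold:
            parent[find(a)] = find(b)

    # Collect APCs by their representative.
    groups = {}
    for apc in apc_to_seqs:
        groups.setdefault(find(apc), []).append(apc)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical toy data: which sequences each APC was found on.
apcs = {
    "APC1": {"s1", "s2", "s3", "s4"},
    "APC2": {"s1", "s2", "s3"},
    "APC3": {"s5", "s6"},
    "APC4": {"s5", "s6", "s7"},
}
clusters = cooccurrence_clusters(apcs, threshold=0.6)
print(clusters)
```

On this toy input, APC1 and APC2 share three of four sequences and APC3 and APC4 share two of three, so each pair forms its own group, while pairs with no shared sequences stay apart.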
format Online
Article
Text
id pubmed-4243116
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4243116 2014-11-26 Discovering co-occurring patterns and their biological significance in protein families Lee, En-Shiun Annie Fung, Sanderz Sze-To, Ho-Yin Wong, Andrew K C BMC Bioinformatics Research BACKGROUND: The large influx of biological sequences makes it important to identify and correlate conserved regions in homologous sequences to acquire valuable biological knowledge. These conserved regions contain statistically significant residue associations as sequence patterns. Thus, patterns from two conserved regions that co-occur frequently on the same sequences are inferred to have joint functionality. A method for finding conserved regions in protein families with frequent co-occurrence patterns is proposed. The biological significance of the discovered clusters of conserved regions with co-occurrence patterns can be validated by the three-dimensional closeness of their amino acids and the biological functionality found in those regions as supported by published work. METHODS: Using existing algorithms, we discovered statistically significant amino acid associations as sequence patterns. We then aligned and clustered them into Aligned Pattern Clusters (APCs) corresponding to conserved regions with amino acid conservation and variation. When one APC frequently co-occurred with another APC, the two APCs have high co-occurrence. We then clustered APCs with high co-occurrence into what we refer to as Co-occurrence APC Clusters (Co-occurrence Clusters). RESULTS: Our results show that for Co-occurrence Clusters, the three-dimensional distance between their amino acids is smaller than the average amino acid distance. For the Co-occurrence Clusters of the ubiquitin and the cytochrome c families, we observed biological significance among the residing amino acids of the APCs within the same cluster. In ubiquitin, the residues are responsible for ubiquitination as well as conventional and unconventional ubiquitin binding.
In cytochrome c, amino acids in the first co-occurrence cluster contribute to binding of other proteins in the electron transport chain, and amino acids in the second co-occurrence cluster contribute to the stability of the axial heme ligand. CONCLUSIONS: Thus, our co-occurrence clustering algorithm can efficiently find and rank conserved regions that contain patterns that frequently co-occur on the same proteins. Co-occurring patterns are biologically significant due to their three-dimensional closeness and other evidence reported in the literature. These results play an important role in drug discovery, as biologists can quickly identify the target for drugs to conduct detailed preclinical studies. BioMed Central 2014-11-06 /pmc/articles/PMC4243116/ /pubmed/25474736 http://dx.doi.org/10.1186/1471-2105-15-S12-S2 Text en Copyright © 2014 Lee et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Lee, En-Shiun Annie
Fung, Sanderz
Sze-To, Ho-Yin
Wong, Andrew K C
Discovering co-occurring patterns and their biological significance in protein families
title Discovering co-occurring patterns and their biological significance in protein families
title_full Discovering co-occurring patterns and their biological significance in protein families
title_fullStr Discovering co-occurring patterns and their biological significance in protein families
title_full_unstemmed Discovering co-occurring patterns and their biological significance in protein families
title_short Discovering co-occurring patterns and their biological significance in protein families
title_sort discovering co-occurring patterns and their biological significance in protein families
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4243116/
https://www.ncbi.nlm.nih.gov/pubmed/25474736
http://dx.doi.org/10.1186/1471-2105-15-S12-S2
work_keys_str_mv AT leeenshiunannie discoveringcooccurringpatternsandtheirbiologicalsignificanceinproteinfamilies
AT fungsanderz discoveringcooccurringpatternsandtheirbiologicalsignificanceinproteinfamilies
AT szetohoyin discoveringcooccurringpatternsandtheirbiologicalsignificanceinproteinfamilies
AT wongandrewkc discoveringcooccurringpatternsandtheirbiologicalsignificanceinproteinfamilies