Discovery and disentanglement of aligned residue associations from aligned pattern clusters to reveal subgroup characteristics
Main authors: | Zhou, Pei-Yuan; Sze-To, Antonio; Wong, Andrew K. C. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6245498/ https://www.ncbi.nlm.nih.gov/pubmed/30453949 http://dx.doi.org/10.1186/s12920-018-0417-z |
author | Zhou, Pei-Yuan; Sze-To, Antonio; Wong, Andrew K. C. |
---|---|
collection | PubMed |
description | BACKGROUND: A protein family has both similar and diverse functions, which are locally conserved. An aligned pattern cluster (APC) can reflect this conserved functionality. Discovering aligned residue associations (ARAs) in APCs can reveal subtle inner working characteristics of the conserved regions of protein families. However, ARAs corresponding to different functionalities, subgroups or classes can be entangled because of multiple subtle, entwined factors. METHODS: To discover and disentangle patterns from mixed-mode datasets, such as APCs in which the residues are replaced by lists of their fundamental biochemical properties, this paper presents a novel method, Extended Aligned Residual Association Discovery and Disentanglement (E-ARADD). E-ARADD discretizes the numerical data to transform the mixed-mode dataset into an event-value dataset, constructs an ARA Frequency Matrix and then converts it into an adjusted Statistical Residual (SR) Vector Space (SRV) that captures statistical deviation from randomness. Applying Principal Component (PC) Decomposition to the SRV yields PCs ranked by their variance. Finally, the disentangled ARAs are discovered when the projections on a PC are re-projected to a vector space with the same basis vectors as the SRV. RESULTS: Experiments on synthetic, cytochrome c and class A scavenger data have shown that E-ARADD can (a) disentangle the entwined ARAs in APCs (with residues or biochemical properties) and (b) reveal subtle aligned residue (AR) clusters relating to classes, subtle subgroups or specific functionalities. CONCLUSIONS: E-ARADD can discover and disentangle ARs and ARAs entangled in the functionality and location of protein families to reveal functional subgroups and subgroup characteristics of biologically conserved regions. Experimental results on synthetic data provide proof-of-concept validation of the successful disentanglement, revealing class-associated ARAs with or without class labels as input. Experiments on cytochrome c data demonstrated the efficacy of E-ARADD in handling both types of residue data. Our novel methodology can discover and disentangle ARs and ARAs not only in specific statistical/functional spaces (PCs and re-projected SRVs, RSRVs) but also at their locations in the functional domains of the protein family. The success of E-ARADD shows its great potential for proteomic research, drug discovery, and precision and personalized genetic medicine. |
format | Online Article Text |
id | pubmed-6245498 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6245498; 2018-11-26; BMC Med Genomics; Research; BioMed Central; 2018-11-20; /pmc/articles/PMC6245498/ /pubmed/30453949 http://dx.doi.org/10.1186/s12920-018-0417-z; Text; en; © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Discovery and disentanglement of aligned residue associations from aligned pattern clusters to reveal subgroup characteristics |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6245498/ https://www.ncbi.nlm.nih.gov/pubmed/30453949 http://dx.doi.org/10.1186/s12920-018-0417-z |
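The METHODS part of the description field lists the E-ARADD pipeline as a sequence of steps: discretize, build an ARA Frequency Matrix, convert it to an adjusted SR vector space, run PC decomposition, and re-project each PC. The Python sketch that follows is a minimal, hypothetical illustration of those steps under stated assumptions, not the authors' implementation. Equal-width binning stands in for the unspecified discretization, the standard (Haberman-style) adjusted residual is assumed for the "adjusted SR", PCA is done with an SVD of the centered SRV, and "re-projection" is read as the outer product of each event's PC coordinate with the PC. All function names and the toy data are made up for illustration.

```python
from itertools import combinations

import numpy as np


def discretize(values, n_bins=3):
    """Equal-width binning of one numeric column into labels 'b0'..'b{n_bins-1}' (assumed scheme)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # a constant column would give zero width
    return [f"b{min(int((v - lo) / width), n_bins - 1)}" for v in values]


def ara_frequency_matrix(records):
    """records: equal-length tuples, column i = aligned site i (an event is a (site, value) pair).

    Returns the sorted event list, the symmetric co-occurrence count matrix
    (events at the same site are never paired) and each event's occurrence count.
    """
    events = sorted({(i, v) for rec in records for i, v in enumerate(rec)})
    index = {e: k for k, e in enumerate(events)}
    freq = np.zeros((len(events), len(events)))
    for rec in records:
        for (i, a), (j, b) in combinations(enumerate(rec), 2):
            p, q = index[(i, a)], index[(j, b)]
            freq[p, q] += 1
            freq[q, p] += 1
    counts = np.array([sum(rec[i] == v for rec in records) for i, v in events], dtype=float)
    return events, freq, counts


def adjusted_residuals(events, freq, counts, n):
    """Adjusted residual d = (o - e) / sqrt(e * (1 - n_a/n) * (1 - n_b/n)), e = n_a * n_b / n,
    for every pair of events at different sites; all other entries stay 0."""
    sites = np.array([i for i, _ in events])
    expected = np.outer(counts, counts) / n
    denom = np.sqrt(expected * np.outer(1 - counts / n, 1 - counts / n))
    mask = (sites[:, None] != sites[None, :]) & (denom > 0)
    srv = np.zeros_like(freq)
    srv[mask] = (freq[mask] - expected[mask]) / denom[mask]
    return srv


def pca_reproject(srv, n_components=2):
    """PCA on the SRV rows via SVD, then re-project each PC's projections onto the SRV basis."""
    centered = srv - srv.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)  # rows of vt: PCs, by decreasing s
    variances = s**2 / max(len(srv) - 1, 1)
    rsrvs = []
    for pc in vt[:n_components]:
        proj = centered @ pc  # one coordinate per event on this PC
        rsrvs.append(np.outer(proj, pc))  # back in the event basis: a re-projected SRV
    return variances[:n_components], rsrvs


# Hypothetical toy APC: two entangled subgroups (A.C and B.D), a noisy middle site,
# and one made-up numeric property column that is discretized first.
residues = ["AxC", "AyC", "AxC", "ByD", "BxD", "ByD"]
prop = [1.8, 1.9, 1.7, -3.5, -3.2, -3.4]
records = [tuple(r) + (b,) for r, b in zip(residues, discretize(prop, n_bins=2))]
events, freq, counts = ara_frequency_matrix(records)
srv = adjusted_residuals(events, freq, counts, len(records))
variances, rsrvs = pca_reproject(srv, n_components=2)
```

On this toy input the first RSRV has a positive entry for the event pair (0, 'A') and (2, 'C') and a negative one for (0, 'A') and (2, 'D'), so the dominant PC separates the two planted subgroups without any class label being supplied. Because each RSRV is an outer product of a PC coordinate with the same PC, its sign pattern does not depend on the arbitrary sign SVD assigns to that PC.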