
Discovery and disentanglement of aligned residue associations from aligned pattern clusters to reveal subgroup characteristics

BACKGROUND: A protein family has similar and diverse functions locally conserved. An aligned pattern cluster (APC) can reflect the conserved functionality. Discovering aligned residue associations (ARAs) in APCs can reveal subtle inner working characteristics of conserved regions of protein families...


Bibliographic Details
Main authors: Zhou, Pei-Yuan, Sze-To, Antonio, Wong, Andrew K. C.
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6245498/
https://www.ncbi.nlm.nih.gov/pubmed/30453949
http://dx.doi.org/10.1186/s12920-018-0417-z
author Zhou, Pei-Yuan
Sze-To, Antonio
Wong, Andrew K. C.
collection PubMed
description BACKGROUND: A protein family has similar and diverse functions that are locally conserved. An aligned pattern cluster (APC) can reflect the conserved functionality. Discovering aligned residue associations (ARAs) in APCs can reveal subtle inner working characteristics of conserved regions of protein families. However, ARAs corresponding to different functionalities/subgroups/classes could be entangled because of multiple subtle, entwined factors. METHODS: To discover and disentangle patterns from mixed-mode datasets, such as APCs when the residues are replaced by lists of their fundamental biochemical properties, this paper presents a novel method, Extended Aligned Residual Association Discovery and Disentanglement (E-ARADD). E-ARADD discretizes the numerical data to transform the mixed-mode dataset into an event-value dataset, constructs an ARA Frequency Matrix and then converts it into an adjusted Statistical Residual (SR) Vector Space (SRV) capturing statistical deviation from randomness. By applying Principal Component (PC) Decomposition to the SRV, PCs ranked by their variance are obtained. Finally, the disentangled ARAs are discovered when the projections on a PC are re-projected onto a vector space with the same basis vectors as the SRV. RESULTS: Experiments on synthetic, cytochrome c and class A scavenger data have shown that E-ARADD can a) disentangle the entwined ARAs in APCs (with residues or biochemical properties) and b) reveal subtle AR clusters relating to classes, subtle subgroups or specific functionalities. CONCLUSIONS: E-ARADD can discover and disentangle ARs and ARAs entangled in functionality and location of protein families to reveal functional subgroups and subgroup characteristics of biological conserved regions. Experimental results on synthetic data provide proof-of-concept validation of the successful disentanglement that reveals class-associated ARAs with or without class labels as input. Experiments on cytochrome c data proved the efficacy of E-ARADD in handling both types of residue data. Our novel methodology is able not only to discover and disentangle ARs and ARAs in specific statistical/functional (PCs and RSRVs) spaces, but also to reveal their locations in the protein family functional domains. The success of E-ARADD shows its great potential for proteomic research, drug discovery and precision and personalized genetic medicine.
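The METHODS steps in the description above (discretization, frequency matrix, adjusted statistical residuals, PC decomposition, re-projection) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the equal-frequency binning, the Haberman-style adjusted residual, the zeroing of same-attribute pairs, the mean-centering before decomposition, the toy data, and every function name below are assumptions made only for illustration.

```python
import numpy as np

def discretize_equal_frequency(column, n_bins=2):
    """Assumed binning rule: map numeric values to equal-frequency bin labels."""
    edges = np.quantile(column, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.searchsorted(edges, column, side="right")

def event_indicators(table):
    """Turn a table of discrete labels into one 0/1 column per (attribute, value) event."""
    cols, names = [], []
    for j in range(table.shape[1]):
        for v in np.unique(table[:, j]):
            cols.append((table[:, j] == v).astype(float))
            names.append((j, int(v)))
    return np.column_stack(cols), names

def adjusted_residuals(indicators, names):
    """Co-occurrence frequency matrix and its adjusted standardized residuals."""
    n = indicators.shape[0]
    freq = indicators.T @ indicators              # observed co-occurrence counts
    p = indicators.mean(axis=0)                   # marginal event probabilities
    expected = n * np.outer(p, p)                 # counts expected under independence
    var = expected * np.outer(1 - p, 1 - p)
    with np.errstate(divide="ignore", invalid="ignore"):
        sr = (freq - expected) / np.sqrt(var)
    sr = np.nan_to_num(sr, nan=0.0, posinf=0.0, neginf=0.0)
    attrs = np.array([a for a, _ in names])
    sr[attrs[:, None] == attrs[None, :]] = 0.0    # assumption: ignore same-attribute pairs
    return freq, sr

def reproject_on_pc(sr, k=0):
    """Decompose the residual vectors and re-project the k-th PC onto the original basis."""
    centered = sr - sr.mean(axis=0)               # assumption: mean-center before PCA
    cov = centered.T @ centered / (sr.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]             # rank PCs by variance, largest first
    v = eigvecs[:, order[k]]
    scores = centered @ v                         # projections of each event on the PC
    return eigvals[order], np.outer(scores, v)    # same basis as the residual space

# Toy mixed-mode data (hypothetical): attribute 0 is a class-linked label,
# attribute 1 is a numeric property tied to it, attribute 2 is noise.
rng = np.random.default_rng(0)
n = 200
cls = rng.integers(0, 2, n)
numeric = 2.0 * cls + rng.normal(0.0, 0.3, n)
noise = rng.integers(0, 3, n)
table = np.column_stack([cls, discretize_equal_frequency(numeric), noise])

indicators, names = event_indicators(table)
freq, sr = adjusted_residuals(indicators, names)
eigvals, rsrv = reproject_on_pc(sr, k=0)

i, j = names.index((0, 1)), names.index((1, 1))
print("residual between linked events:", round(float(sr[i, j]), 2))
```

In this toy setup the two linked events should show a large positive residual, while pairs involving the noise attribute stay near zero; the re-projected matrix for the first PC keeps only the variation carried by that component.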
format Online
Article
Text
id pubmed-6245498
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6245498 2018-11-26 Discovery and disentanglement of aligned residue associations from aligned pattern clusters to reveal subgroup characteristics Zhou, Pei-Yuan Sze-To, Antonio Wong, Andrew K. C. BMC Med Genomics Research BioMed Central 2018-11-20 /pmc/articles/PMC6245498/ /pubmed/30453949 http://dx.doi.org/10.1186/s12920-018-0417-z Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Discovery and disentanglement of aligned residue associations from aligned pattern clusters to reveal subgroup characteristics
topic Research