Clustering analysis of microRNA and mRNA expression data from TCGA using maximum edge-weighted matching algorithms

BACKGROUND: microRNA (miRNA) is a short RNA (~22 nt) that regulates gene expression at the post-transcriptional level. Aberrant miRNA expression can affect the target mRNAs involved in cancer-related signaling pathways. We conduct a clustering analysis of miRNA and mRNA using expression d...

Bibliographic Details
Main authors: Ding, Lizhong, Feng, Zheyun, Bai, Yongsheng
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6683425/
https://www.ncbi.nlm.nih.gov/pubmed/31382962
http://dx.doi.org/10.1186/s12920-019-0562-z
author Ding, Lizhong
Feng, Zheyun
Bai, Yongsheng
collection PubMed
description BACKGROUND: microRNA (miRNA) is a short RNA (~22 nt) that regulates gene expression at the post-transcriptional level. Aberrant miRNA expression can affect the target mRNAs involved in cancer-related signaling pathways. We conduct a clustering analysis of miRNA and mRNA using expression data from The Cancer Genome Atlas (TCGA), combining two graph-theoretic algorithms, the Hungarian algorithm and the blossom algorithm. Data analysis is done in R and Python. METHODS: We first quantify the edge weights of miRNA-mRNA pairs by combining their expression correlation coefficient in tumor samples (T_CC) with their correlation coefficient in normal samples (N_CC). We then introduce a bipartite graph partition procedure to identify cluster candidates. Specifically, we propose six weight formulas to quantify the change of miRNA-mRNA expression T_CC relative to N_CC, and apply traditional hierarchical clustering to subjectively evaluate the different weight formulas. Among these six weight formulas, we choose the optimal one, which we define as the integrated mean value weights, to represent the connections between miRNAs and mRNAs. The Hungarian algorithm and the blossom algorithm are then applied to the miRNA-mRNA bipartite graph to passively determine the clusters. The combination of the Hungarian and blossom algorithms is dubbed the maximum weighted merger method (MWMM). RESULTS: MWMM identifies clusters of different sizes that meet two criteria: a mathematical criterion that connections inside a cluster are relatively denser than connections to outside the cluster, and a biological criterion that intra-cluster Gene Ontology (GO) term similarities are larger than inter-cluster GO term similarities. MWMM was developed using breast invasive carcinoma (BRCA) as the training data set, but can also be applied to data sets of other cancer types. Compared with other algorithms, MWMM shows an advantage in GO term similarity in most cancer types.
CONCLUSIONS: miRNAs and mRNAs that are likely to be affected by common underlying causal factors in cancer can be clustered by the MWMM approach. These clusters could potentially be used as candidate biomarkers for different cancer types and provide clues for targets of precision medicine in cancer treatment. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12920-019-0562-z) contains supplementary material, which is available to authorized users.
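
The weighting and matching steps described in METHODS can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy expression values, the helper names `pearson`, `edge_weight` and `max_weight_matching`, and in particular the weight `|T_CC - N_CC|`, which is a hypothetical stand-in and not necessarily one of the six formulas or the integrated mean value weights from the article. An exhaustive search replaces the Hungarian algorithm here; it finds the same maximum-weight bipartite matching on tiny inputs, only far less efficiently. The blossom step and the merging into clusters are not sketched.

```python
from itertools import permutations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def edge_weight(t_cc, n_cc):
    """Hypothetical weight: absolute change of T_CC relative to N_CC."""
    return abs(t_cc - n_cc)


def max_weight_matching(weights):
    """Exhaustive maximum-weight matching on a small bipartite graph.

    weights[i][j] is the weight of the edge between miRNA i and mRNA j.
    Assumes there are at least as many mRNAs (columns) as miRNAs (rows).
    """
    n_rows, n_cols = len(weights), len(weights[0])
    best_total, best_pairs = float("-inf"), []
    for cols in permutations(range(n_cols), n_rows):
        total = sum(weights[i][j] for i, j in enumerate(cols))
        if total > best_total:
            best_total, best_pairs = total, list(enumerate(cols))
    return best_total, best_pairs


# Toy expression profiles over four samples (made-up numbers).
tumor_mirna = {"miR_a": [1, 2, 3, 4], "miR_b": [4, 3, 2, 1]}
tumor_mrna = {"gene_x": [2, 4, 6, 8], "gene_y": [8, 6, 4, 2], "gene_z": [1, 3, 2, 4]}
normal_mirna = {"miR_a": [1, 2, 3, 4], "miR_b": [1, 2, 3, 4]}
normal_mrna = {"gene_x": [2, 4, 6, 8], "gene_y": [2, 4, 6, 8], "gene_z": [1, 3, 2, 4]}

mirnas, mrnas = list(tumor_mirna), list(tumor_mrna)

# Edge weights of the miRNA-mRNA bipartite graph.
weights = [
    [
        edge_weight(
            pearson(tumor_mirna[mi], tumor_mrna[mr]),    # T_CC
            pearson(normal_mirna[mi], normal_mrna[mr]),  # N_CC
        )
        for mr in mrnas
    ]
    for mi in mirnas
]

best_total, best_pairs = max_weight_matching(weights)
for i, j in best_pairs:
    print(mirnas[i], "-", mrnas[j], round(weights[i][j], 3))
```

On data of realistic size the exhaustive search is infeasible. Library routines such as `scipy.optimize.linear_sum_assignment` (with `maximize=True`) for bipartite assignment and `networkx.max_weight_matching` for general graphs would be the natural replacements, although the record does not state which implementations the authors used.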
format Online
Article
Text
id pubmed-6683425
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6683425 2019-08-09 Clustering analysis of microRNA and mRNA expression data from TCGA using maximum edge-weighted matching algorithms Ding, Lizhong Feng, Zheyun Bai, Yongsheng BMC Med Genomics Research Article BioMed Central 2019-08-05 /pmc/articles/PMC6683425/ /pubmed/31382962 http://dx.doi.org/10.1186/s12920-019-0562-z Text en © The Author(s). 2019 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Clustering analysis of microRNA and mRNA expression data from TCGA using maximum edge-weighted matching algorithms
topic Research Article