
CONFIGURE: A pipeline for identifying context specific regulatory modules from gene expression data and its application to breast cancer

BACKGROUND: Gene expression data is widely used for identifying subtypes of diseases such as cancer. Differentially expressed gene analysis and gene set enrichment analysis are widely used for identifying biological mechanisms at the gene level and gene set level, respectively. However, the results...


Bibliographic Details
Main authors:	Park, Sungjoon; Hwang, Doyeong; Yeo, Yoon Sun; Kim, Hyunggee; Kang, Jaewoo
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2019
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6624175/ https://www.ncbi.nlm.nih.gov/pubmed/31296219 http://dx.doi.org/10.1186/s12920-019-0515-6

_version_	1783434215471185920
author	Park, Sungjoon; Hwang, Doyeong; Yeo, Yoon Sun; Kim, Hyunggee; Kang, Jaewoo
author_sort	Park, Sungjoon
collection	PubMed
description	BACKGROUND: Gene expression data is widely used for identifying subtypes of diseases such as cancer. Differentially expressed gene analysis and gene set enrichment analysis are widely used for identifying biological mechanisms at the gene level and gene set level, respectively. However, the results of differentially expressed gene analysis are difficult to interpret and gene set enrichment analysis does not consider the interactions among genes in a gene set. RESULTS: We present CONFIGURE, a pipeline that identifies context specific regulatory modules from gene expression data. First, CONFIGURE takes gene expression data and context label information as inputs and constructs regulatory modules. Then, CONFIGURE makes a regulatory module enrichment score (RMES) matrix of enrichment scores of the regulatory modules on samples using the single-sample GSEA method. CONFIGURE calculates the importance scores of the regulatory modules on each context to rank the regulatory modules. We evaluated CONFIGURE on The Cancer Genome Atlas (TCGA) breast cancer RNA-seq dataset to determine whether it can produce biologically meaningful regulatory modules for breast cancer subtypes. We first evaluated whether RMESs are useful for differentiating breast cancer subtypes using a multi-class classifier and one-vs-rest binary SVM classifiers. The multi-class and one-vs-rest binary classifiers were trained using the RMESs as features and outperformed baseline classifiers. Furthermore, we conducted literature surveys on the basal-like type specific regulatory modules obtained by CONFIGURE and showed that highly ranked modules were associated with the phenotypes of basal-like type breast cancers. CONCLUSIONS: We showed that enrichment scores of regulatory modules are useful for differentiating breast cancer subtypes and validated the basal-like type specific regulatory modules by literature surveys. In doing so, we found regulatory module candidates that have not been reported in previous literature. This demonstrates that CONFIGURE can be used to predict novel regulatory markers which can be validated by downstream wet lab experiments. We validated CONFIGURE on the breast cancer RNA-seq dataset in this work but CONFIGURE can be applied to any gene expression dataset containing context information.
format	Online Article Text
id	pubmed-6624175
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-6624175 2019-07-23 BMC Med Genomics Research BioMed Central 2019-07-11 /pmc/articles/PMC6624175/ /pubmed/31296219 http://dx.doi.org/10.1186/s12920-019-0515-6 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	CONFIGURE: A pipeline for identifying context specific regulatory modules from gene expression data and its application to breast cancer
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6624175/ https://www.ncbi.nlm.nih.gov/pubmed/31296219 http://dx.doi.org/10.1186/s12920-019-0515-6
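
The abstract in this record describes three steps: score each regulatory module on each sample, collect the scores into an RMES matrix, and rank modules per context. The following is a minimal Python sketch of that flow, not the authors' implementation. The enrichment function is a simplified single-sample GSEA-style running sum (the rank exponent `alpha=0.25` is an assumption), the importance score is a stand-in (mean RMES inside the context minus mean RMES outside it) because the record does not say how CONFIGURE computes importance, and all function names, gene names, and toy values are hypothetical.

```python
from statistics import mean


def enrichment_score(expression, module, alpha=0.25):
    """Simplified ssGSEA-style score of one module in one sample.

    expression: dict mapping gene -> expression value for a single sample.
    module: set of gene names (a hypothetical regulatory module).
    alpha: rank-weight exponent (assumed value, not taken from the record).
    """
    # Order genes from highest to lowest expression; top gene gets rank n.
    ordered = sorted(expression, key=expression.get, reverse=True)
    n = len(ordered)
    weights = {g: (n - i) ** alpha for i, g in enumerate(ordered)}
    hit_total = sum(weights[g] for g in ordered if g in module)
    miss_total = sum(1 for g in ordered if g not in module)
    if hit_total == 0 or miss_total == 0:
        return 0.0
    running_hit = running_miss = score = 0.0
    for g in ordered:
        if g in module:
            running_hit += weights[g] / hit_total
        else:
            running_miss += 1 / miss_total
        # Accumulate the gap between the two cumulative distributions.
        score += running_hit - running_miss
    return score


def rmes_matrix(samples, modules):
    """Return {sample: {module: score}} for every sample/module pair."""
    return {
        s: {m: enrichment_score(expr, genes) for m, genes in modules.items()}
        for s, expr in samples.items()
    }


def rank_modules(rmes, labels, context):
    """Rank modules for one context with a stand-in importance score.

    Assumes at least one sample inside and one outside the context.
    """
    inside = [s for s in rmes if labels[s] == context]
    outside = [s for s in rmes if labels[s] != context]
    module_names = next(iter(rmes.values())).keys()
    importance = {
        m: mean(rmes[s][m] for s in inside) - mean(rmes[s][m] for s in outside)
        for m in module_names
    }
    return sorted(importance, key=importance.get, reverse=True)


# Toy data (hypothetical genes, samples, and labels).
samples = {
    "s1": {"A": 10.0, "B": 8.0, "C": 2.0, "D": 1.0},
    "s2": {"A": 1.0, "B": 2.0, "C": 8.0, "D": 10.0},
}
labels = {"s1": "basal", "s2": "luminal"}
modules = {"M1": {"A", "B"}, "M2": {"C", "D"}}

rmes = rmes_matrix(samples, modules)
print(rank_modules(rmes, labels, "basal"))
```

In the workflow the abstract describes, the rows of such a matrix would then serve as feature vectors for the multi-class and one-vs-rest SVM classifiers; that step is omitted here to keep the sketch dependency-free.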

