The characteristic direction: a geometrical approach to identify differentially expressed genes

BACKGROUND: Identifying differentially expressed genes (DEG) is a fundamental step in studies that perform genome wide expression profiling. Typically, DEG are identified by univariate approaches such as Significance Analysis of Microarrays (SAM) or Linear Models for Microarray Data (LIMMA) for proc...

Bibliographic Details
Main authors: Clark, Neil R, Hu, Kevin S, Feldmann, Axel S, Kou, Yan, Chen, Edward Y, Duan, Qiaonan, Ma’ayan, Avi
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4000056/
https://www.ncbi.nlm.nih.gov/pubmed/24650281
http://dx.doi.org/10.1186/1471-2105-15-79
author Clark, Neil R
Hu, Kevin S
Feldmann, Axel S
Kou, Yan
Chen, Edward Y
Duan, Qiaonan
Ma’ayan, Avi
collection PubMed
description BACKGROUND: Identifying differentially expressed genes (DEG) is a fundamental step in studies that perform genome-wide expression profiling. Typically, DEG are identified by univariate approaches such as Significance Analysis of Microarrays (SAM) or Linear Models for Microarray Data (LIMMA) for processing cDNA microarrays, and differential gene expression analysis based on the negative binomial distribution (DESeq) or Empirical analysis of Digital Gene Expression data in R (edgeR) for RNA-Seq profiling.
RESULTS: Here we present a new geometrical multivariate approach to identify DEG called the Characteristic Direction. We demonstrate that the Characteristic Direction method is significantly more sensitive than existing methods for identifying DEG in the context of transcription factor (TF) and drug perturbation responses over a large number of microarray experiments. We also benchmarked the Characteristic Direction method using synthetic data, as well as RNA-Seq data. A large collection of microarray expression data from TF perturbations (73 experiments) and drug perturbations (130 experiments) extracted from the Gene Expression Omnibus (GEO), as well as an RNA-Seq study that profiled genome-wide gene expression and STAT3 DNA binding in two subtypes of diffuse large B-cell lymphoma, were used for benchmarking the method using real data. ChIP-Seq data identifying DNA binding sites of the perturbed TFs, as well as known drug targets of the perturbing drugs, were used as a prior-knowledge silver standard for validation. In all cases the Characteristic Direction DEG calling method outperformed other methods. We find that when drugs are applied to cells in various contexts, the proteins that interact with the drug targets are differentially expressed, and more of the corresponding genes are discovered by the Characteristic Direction method. In addition, we show that the Characteristic Direction conceptualization can be used to perform improved gene set enrichment analyses when compared with gene set enrichment analysis (GSEA) and the hypergeometric test.
CONCLUSIONS: The application of the Characteristic Direction method may shed new light on relevant biological mechanisms that would have remained undiscovered by the current state-of-the-art DEG methods. The method is freely accessible via various open source code implementations using four popular programming languages: R, Python, MATLAB and Mathematica, all available at: http://www.maayanlab.net/CD.
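The abstract names the hypergeometric test as one of the baselines for gene set enrichment. As a rough illustration of that baseline only (not the authors' code, and not the Characteristic Direction method itself), the sketch below computes the standard upper-tail over-representation p-value. The function name, parameter names, and the example numbers are all hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(n_background, n_in_set, n_deg, n_overlap):
    """Upper-tail hypergeometric p-value, P(X >= n_overlap).

    Models drawing n_deg genes without replacement from n_background genes,
    of which n_in_set belong to the gene set of interest. All argument
    names are illustrative assumptions, not taken from the article.
    """
    max_overlap = min(n_in_set, n_deg)
    total = comb(n_background, n_deg)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(n_in_set, i) * comb(n_background - n_in_set, n_deg - i)
        for i in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical example: 20,000 background genes, a 100-gene set,
# 500 genes called as DEG, 10 of which fall in the set.
p_value = hypergeom_enrichment_pvalue(20000, 100, 500, 10)
print(f"enrichment p-value: {p_value:.3g}")
```

A small p-value here would suggest that the gene set is over-represented among the called DEG. Note that this test needs a hard DEG cutoff, which is the kind of univariate, thresholded baseline the abstract compares against.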
format Online
Article
Text
id pubmed-4000056
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4000056 2014-05-08
The characteristic direction: a geometrical approach to identify differentially expressed genes
Clark, Neil R; Hu, Kevin S; Feldmann, Axel S; Kou, Yan; Chen, Edward Y; Duan, Qiaonan; Ma’ayan, Avi
BMC Bioinformatics
Research Article
BioMed Central 2014-03-21
/pmc/articles/PMC4000056/
/pubmed/24650281
http://dx.doi.org/10.1186/1471-2105-15-79
Text en
Copyright © 2014 Clark et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title The characteristic direction: a geometrical approach to identify differentially expressed genes
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4000056/
https://www.ncbi.nlm.nih.gov/pubmed/24650281
http://dx.doi.org/10.1186/1471-2105-15-79