Statistically controlled identification of differentially expressed genes in one-to-one cell line comparisons of the CMAP database for drug repositioning

BACKGROUND: The Connectivity Map (CMAP) database, an important public data source for drug repositioning, archives gene expression profiles from cancer cell lines treated with and without bioactive small molecules. However, there are only one or two technical replicates for each cell line under one...

Bibliographic Details
Main authors: He, Jun; Yan, Haidan; Cai, Hao; Li, Xiangyu; Guan, Qingzhou; Zheng, Weicheng; Chen, Rou; Liu, Huaping; Song, Kai; Guo, Zheng; Wang, Xianlong
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5622488/
https://www.ncbi.nlm.nih.gov/pubmed/28962576
http://dx.doi.org/10.1186/s12967-017-1302-9
author He, Jun
Yan, Haidan
Cai, Hao
Li, Xiangyu
Guan, Qingzhou
Zheng, Weicheng
Chen, Rou
Liu, Huaping
Song, Kai
Guo, Zheng
Wang, Xianlong
collection PubMed
description BACKGROUND: The Connectivity Map (CMAP) database, an important public data source for drug repositioning, archives gene expression profiles from cancer cell lines treated with and without bioactive small molecules. However, there are only one or two technical replicates for each cell line under one treatment condition. For such small-scale data, current fold-change-based methods lack statistical control in identifying differentially expressed genes (DEGs) in treated cells. In particular, one-to-one comparisons may yield too many drug-irrelevant DEGs due to random experimental factors. To tackle this problem, CMAP adopts a pattern-matching strategy to build a “connection” between disease signatures and gene expression changes associated with drug treatments. However, many drug-irrelevant genes may blur the “connection” if all genes are used instead of pre-selected DEGs induced by drug treatments.
METHODS: We applied OneComp, a customized version of RankComp, to identify DEGs in such small-scale cell line datasets. For each cell line, gene pairs with stable relative expression orderings (REOs) were identified in a large collection of control cell samples measured in different experiments; these formed the background stable REOs. When OneComp was applied to a small-scale cell line dataset, the background stable REOs were customized by filtering out gene pairs whose REOs were reversed in the control samples of the analyzed dataset.
RESULTS: In simulated data, the consistency scores of the overlapping DEGs identified by both OneComp and SAM were all higher than 99%, while the consistency score of the DEGs identified solely by OneComp was 96.85% according to the observed expression difference method. The usefulness of OneComp in drug repositioning was exemplified by identifying phenformin- and metformin-related genes from small-scale cell line datasets, which helped to support both drugs as potential anti-tumor drugs for non-small-cell lung carcinoma, whereas the pattern-matching strategy adopted by CMAP missed the two connections. The implementation of OneComp is available at https://github.com/pathint/reoa.
CONCLUSIONS: OneComp performed well in both the simulated and real data. It is useful in drug repositioning studies by helping to find hidden “connections” between drugs and diseases.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12967-017-1302-9) contains supplementary material, which is available to authorized users.
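The two REO steps named in the METHODS portion of the description (collecting background stable REOs from control samples, then customizing them against the controls of the analyzed dataset) can be illustrated with a minimal sketch. This is not the authors' implementation (that is at https://github.com/pathint/reoa). The function names, the `min_fraction` stability threshold, the dict-based sample layout, and the toy expression values are all assumptions made for illustration, and the statistical test OneComp uses to call DEGs is not shown.

```python
from itertools import combinations


def stable_reos(control_samples, min_fraction=1.0):
    """Collect gene pairs whose ordering is stable across control samples.

    control_samples: list of dicts mapping gene name -> expression value.
    min_fraction: hypothetical threshold; a pair is "stable" if the same
    ordering holds in at least this fraction of samples.
    Returns a set of (higher_gene, lower_gene) tuples.
    """
    genes = sorted(control_samples[0])
    n = len(control_samples)
    stable = set()
    for a, b in combinations(genes, 2):
        a_higher = sum(1 for s in control_samples if s[a] > s[b])
        b_higher = sum(1 for s in control_samples if s[a] < s[b])
        if a_higher >= min_fraction * n:
            stable.add((a, b))
        elif b_higher >= min_fraction * n:
            stable.add((b, a))
    return stable


def customize_background(background, dataset_controls):
    """Drop background pairs that are reversed in any control sample
    of the dataset being analyzed."""
    return {
        (hi, lo)
        for hi, lo in background
        if not any(s[hi] < s[lo] for s in dataset_controls)
    }


# Toy data (made up): three background controls, one dataset control.
background_controls = [
    {"A": 10, "B": 5, "C": 1},
    {"A": 9, "B": 6, "C": 2},
    {"A": 8, "B": 4, "C": 5},  # B < C here, so (B, C) is not stable
]
dataset_controls = [{"A": 3, "B": 7, "C": 1}]  # A < B reverses (A, B)

background = stable_reos(background_controls)
customized = customize_background(background, dataset_controls)
print(sorted(background))  # [('A', 'B'), ('A', 'C')]
print(sorted(customized))  # [('A', 'C')]
```

In this toy case the pair (A, B) is stable in the background controls but reversed in the dataset's own control sample, so it is filtered out before any comparison with treated samples.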
format Online
Article
Text
id pubmed-5622488
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5622488 2017-10-11 J Transl Med Research BioMed Central 2017-09-29 /pmc/articles/PMC5622488/ /pubmed/28962576 http://dx.doi.org/10.1186/s12967-017-1302-9 Text en © The Author(s) 2017 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Statistically controlled identification of differentially expressed genes in one-to-one cell line comparisons of the CMAP database for drug repositioning
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5622488/
https://www.ncbi.nlm.nih.gov/pubmed/28962576
http://dx.doi.org/10.1186/s12967-017-1302-9