Indirect two-sided relative ranking: a robust similarity measure for gene expression data

BACKGROUND: There is a large amount of gene expression data that exists in the public domain. This data has been generated under a variety of experimental conditions. Unfortunately, these experimental variations have generally prevented researchers from accurately comparing and combining this wealth...

Bibliographic Details
Main Authors: Licamele, Louis; Getoor, Lise
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2851605/
https://www.ncbi.nlm.nih.gov/pubmed/20236517
http://dx.doi.org/10.1186/1471-2105-11-137
author Licamele, Louis
Getoor, Lise
collection PubMed
description BACKGROUND: There is a large amount of gene expression data that exists in the public domain. This data has been generated under a variety of experimental conditions. Unfortunately, these experimental variations have generally prevented researchers from accurately comparing and combining this wealth of data, which still hides many novel insights. RESULTS: In this paper we present a new method, which we refer to as indirect two-sided relative ranking, for comparing gene expression profiles that is robust to variations in experimental conditions. This method extends the current best approach, which is based on comparing the correlations of the up and down regulated genes, by introducing a comparison based on the correlations in rankings across the entire database. Because our method is robust to experimental variations, it allows a greater variety of gene expression data to be combined, which, as we show, leads to richer scientific discoveries. CONCLUSIONS: We demonstrate the benefit of our proposed indirect method on several datasets. We first evaluate the ability of the indirect method to retrieve compounds with similar therapeutic effects across known experimental barriers, namely vehicle and batch effects, on two independent datasets (one private and one public). We show that our indirect method is able to significantly improve upon the previous state-of-the-art method with a substantial improvement in recall at rank 10 of 97.03% and 49.44%, on each dataset, respectively. Next, we demonstrate that our indirect method results in improved accuracy for classification in several additional datasets. These datasets demonstrate the use of our indirect method for classifying cancer subtypes, predicting drug sensitivity/resistance, and classifying (related) cell types. Even in the absence of a known (i.e., labeled) experimental barrier, the improvement of the indirect method in each of these datasets is statistically significant.
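The indirect comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the algorithm's details, so the direct measure here is a plain Spearman rank correlation standing in for the two-sided comparison of up and down regulated genes, and the function names (`direct_similarity`, `indirect_similarity`) and the toy profiles are assumptions made for illustration only. The idea shown is that two profiles are compared by how similarly they rank the rest of the database, rather than by their own expression values.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def direct_similarity(p, q):
    # ASSUMPTION: stand-in for the paper's direct two-sided measure.
    return spearman(p, q)


def indirect_similarity(p, q, database):
    """Compare p and q by how similarly they rank every database profile."""
    sims_p = [direct_similarity(p, d) for d in database]
    sims_q = [direct_similarity(q, d) for d in database]
    return spearman(sims_p, sims_q)


# Hypothetical toy data: each profile is a list of expression values per gene.
database = [[1, 2, 3, 4], [4, 3, 2, 1], [2, 1, 4, 3]]
p = [1, 2, 3, 5]
q = [10, 20, 30, 50]   # same gene ordering as p, different scale
r = [5, 3, 2, 1]       # reversed gene ordering

score_same = indirect_similarity(p, q, database)
score_opposite = indirect_similarity(p, r, database)
```

In this toy setting `p` and `q` order the database profiles identically, so their indirect score is at the maximum, while `p` and `r` order them in reverse. A real application would need a database large enough for the similarity vectors to be informative.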
format Text
id pubmed-2851605
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2851605 2010-04-09 Indirect two-sided relative ranking: a robust similarity measure for gene expression data Licamele, Louis Getoor, Lise BMC Bioinformatics Methodology article BioMed Central 2010-03-17 /pmc/articles/PMC2851605/ /pubmed/20236517 http://dx.doi.org/10.1186/1471-2105-11-137 Text en Copyright ©2010 Licamele and Getoor; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Indirect two-sided relative ranking: a robust similarity measure for gene expression data
topic Methodology article