Indirect two-sided relative ranking: a robust similarity measure for gene expression data
BACKGROUND: There is a large amount of gene expression data that exists in the public domain. This data has been generated under a variety of experimental conditions. Unfortunately, these experimental variations have generally prevented researchers from accurately comparing and combining this wealth...
Main Authors: | Licamele, Louis; Getoor, Lise |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2010 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2851605/ https://www.ncbi.nlm.nih.gov/pubmed/20236517 http://dx.doi.org/10.1186/1471-2105-11-137 |
_version_ | 1782179882068869120 |
---|---|
author | Licamele, Louis Getoor, Lise |
author_facet | Licamele, Louis Getoor, Lise |
author_sort | Licamele, Louis |
collection | PubMed |
description | BACKGROUND: There is a large amount of gene expression data that exists in the public domain. This data has been generated under a variety of experimental conditions. Unfortunately, these experimental variations have generally prevented researchers from accurately comparing and combining this wealth of data, which still hides many novel insights. RESULTS: In this paper we present a new method, which we refer to as indirect two-sided relative ranking, for comparing gene expression profiles that is robust to variations in experimental conditions. This method extends the current best approach, which is based on comparing the correlations of the up and down regulated genes, by introducing a comparison based on the correlations in rankings across the entire database. Because our method is robust to experimental variations, it allows a greater variety of gene expression data to be combined, which, as we show, leads to richer scientific discoveries. CONCLUSIONS: We demonstrate the benefit of our proposed indirect method on several datasets. We first evaluate the ability of the indirect method to retrieve compounds with similar therapeutic effects across known experimental barriers, namely vehicle and batch effects, on two independent datasets (one private and one public). We show that our indirect method is able to significantly improve upon the previous state-of-the-art method with a substantial improvement in recall at rank 10 of 97.03% and 49.44%, on each dataset, respectively. Next, we demonstrate that our indirect method results in improved accuracy for classification in several additional datasets. These datasets demonstrate the use of our indirect method for classifying cancer subtypes, predicting drug sensitivity/resistance, and classifying (related) cell types. Even in the absence of a known (i.e., labeled) experimental barrier, the improvement of the indirect method in each of these datasets is statistically significant. |
format | Text |
id | pubmed-2851605 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-28516052010-04-09 Indirect two-sided relative ranking: a robust similarity measure for gene expression data Licamele, Louis Getoor, Lise BMC Bioinformatics Methodology article BACKGROUND: There is a large amount of gene expression data that exists in the public domain. This data has been generated under a variety of experimental conditions. Unfortunately, these experimental variations have generally prevented researchers from accurately comparing and combining this wealth of data, which still hides many novel insights. RESULTS: In this paper we present a new method, which we refer to as indirect two-sided relative ranking, for comparing gene expression profiles that is robust to variations in experimental conditions. This method extends the current best approach, which is based on comparing the correlations of the up and down regulated genes, by introducing a comparison based on the correlations in rankings across the entire database. Because our method is robust to experimental variations, it allows a greater variety of gene expression data to be combined, which, as we show, leads to richer scientific discoveries. CONCLUSIONS: We demonstrate the benefit of our proposed indirect method on several datasets. We first evaluate the ability of the indirect method to retrieve compounds with similar therapeutic effects across known experimental barriers, namely vehicle and batch effects, on two independent datasets (one private and one public). We show that our indirect method is able to significantly improve upon the previous state-of-the-art method with a substantial improvement in recall at rank 10 of 97.03% and 49.44%, on each dataset, respectively. Next, we demonstrate that our indirect method results in improved accuracy for classification in several additional datasets. These datasets demonstrate the use of our indirect method for classifying cancer subtypes, predicting drug sensitivity/resistance, and classifying (related) cell types. Even in the absence of a known (i.e., labeled) experimental barrier, the improvement of the indirect method in each of these datasets is statistically significant. BioMed Central 2010-03-17 /pmc/articles/PMC2851605/ /pubmed/20236517 http://dx.doi.org/10.1186/1471-2105-11-137 Text en Copyright ©2010 Licamele and Getoor; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methodology article Licamele, Louis Getoor, Lise Indirect two-sided relative ranking: a robust similarity measure for gene expression data |
title | Indirect two-sided relative ranking: a robust similarity measure for gene expression data |
title_full | Indirect two-sided relative ranking: a robust similarity measure for gene expression data |
title_fullStr | Indirect two-sided relative ranking: a robust similarity measure for gene expression data |
title_full_unstemmed | Indirect two-sided relative ranking: a robust similarity measure for gene expression data |
title_short | Indirect two-sided relative ranking: a robust similarity measure for gene expression data |
title_sort | indirect two-sided relative ranking: a robust similarity measure for gene expression data |
topic | Methodology article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2851605/ https://www.ncbi.nlm.nih.gov/pubmed/20236517 http://dx.doi.org/10.1186/1471-2105-11-137 |
work_keys_str_mv | AT licamelelouis indirecttwosidedrelativerankingarobustsimilaritymeasureforgeneexpressiondata AT getoorlise indirecttwosidedrelativerankingarobustsimilaritymeasureforgeneexpressiondata |
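
The abstract in this record describes the indirect comparison only at a high level: two expression profiles are compared through the correlation of their rankings across the entire database, not only through their up and down regulated genes. The following is a minimal sketch of that idea, not the authors' implementation. The function names, the parameter `k`, the toy profiles, and in particular the stand-in `direct_similarity` (a Spearman correlation restricted to the most up and down regulated genes) are assumptions made for illustration.

```python
from typing import List, Sequence


def rank(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation, computed as Pearson on the ranks."""
    return pearson(rank(x), rank(y))


def direct_similarity(p: Sequence[float], q: Sequence[float], k: int = 2) -> float:
    """Assumed stand-in for a direct two-sided measure: Spearman correlation
    over the genes that are among the k most down or k most up regulated
    in either profile."""
    order_p = sorted(range(len(p)), key=lambda i: p[i])
    order_q = sorted(range(len(q)), key=lambda i: q[i])
    extremes = sorted(set(order_p[:k] + order_p[-k:] + order_q[:k] + order_q[-k:]))
    return spearman([p[i] for i in extremes], [q[i] for i in extremes])


def indirect_similarity(
    p: Sequence[float],
    q: Sequence[float],
    database: Sequence[Sequence[float]],
    k: int = 2,
) -> float:
    """Compare p and q through how each one relates to every database profile:
    correlate their two vectors of direct similarities to the database."""
    sims_p = [direct_similarity(p, ref, k) for ref in database]
    sims_q = [direct_similarity(q, ref, k) for ref in database]
    return spearman(sims_p, sims_q)


# Toy log-ratio profiles over six genes (hypothetical values).
profile_a = [2.0, 1.5, 0.1, -0.2, -1.4, -2.1]
profile_b = [1.8, 1.9, 0.0, 0.1, -1.9, -1.2]
database = [
    [1.0, 0.8, 0.2, -0.1, -0.9, -1.1],
    [-1.2, -0.7, 0.1, 0.3, 0.9, 1.4],
    [0.1, -1.0, 1.2, -1.3, 0.4, 0.9],
]

score = indirect_similarity(profile_a, profile_b, database)
```

In this sketch, two profiles score highly when they rank the database profiles in a similar order, even if their own gene-level values differ because of vehicle or batch effects. The paper itself should be consulted for the actual definition of the measure.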