Distance-based classifiers as potential diagnostic and prediction tools for human diseases
Typically, gene expression biomarkers are being discovered in the course of high-throughput experiments, for example, RNAseq or microarray profiling. Analytic pipelines that extract so-called signatures suffer from the "Dimensionality curse": the number of genes expressed exceeds the number of...
Main Authors: | Veytsman, Boris; Wang, Lei; Cui, Tiange; Bruskin, Sergey; Baranova, Ancha |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4303935/ https://www.ncbi.nlm.nih.gov/pubmed/25563076 http://dx.doi.org/10.1186/1471-2164-15-S12-S10 |
_version_ | 1782354002151735296 |
---|---|
author | Veytsman, Boris; Wang, Lei; Cui, Tiange; Bruskin, Sergey; Baranova, Ancha |
author_facet | Veytsman, Boris; Wang, Lei; Cui, Tiange; Bruskin, Sergey; Baranova, Ancha |
author_sort | Veytsman, Boris |
collection | PubMed |
description | Typically, gene expression biomarkers are being discovered in the course of high-throughput experiments, for example, RNAseq or microarray profiling. Analytic pipelines that extract so-called signatures suffer from the "Dimensionality curse": the number of genes expressed exceeds the number of patients we can enroll in the study and use to train the discriminator algorithm. Hence, problems with the reproducibility of gene signatures are more common than not; when the algorithm is executed using a different training set, the resulting diagnostic signature may turn out to be completely different. In this paper we propose an alternative novel approach which takes into account quantifiable expression levels of all genes assayed. In our analysis, the cumulative gene expression pattern of an individual patient is represented as a point in the multidimensional space formed by all gene expression profiles assayed in a given system, where the clusters of "normal samples" and "affected samples" are defined. The degree of separation of the given sample from the space occupied by "normal samples" reflects the drift of the sample away from homeostasis in the course of development of the pathophysiological process that underlies the disease. The outlined approach was validated using the publicly available glioma dataset deposited in Rembrandt and associated with survival data. Additionally, the applicability of the distance analysis to the classification of non-malignant samples was tested using psoriatic lesions and non-lesional matched controls as a model. Keywords: biomarkers; clustering; human diseases; RNA |
format | Online Article Text |
id | pubmed-4303935 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4303935 2015-02-09 Distance-based classifiers as potential diagnostic and prediction tools for human diseases Veytsman, Boris Wang, Lei Cui, Tiange Bruskin, Sergey Baranova, Ancha BMC Genomics Research Typically, gene expression biomarkers are being discovered in the course of high-throughput experiments, for example, RNAseq or microarray profiling. Analytic pipelines that extract so-called signatures suffer from the "Dimensionality curse": the number of genes expressed exceeds the number of patients we can enroll in the study and use to train the discriminator algorithm. Hence, problems with the reproducibility of gene signatures are more common than not; when the algorithm is executed using a different training set, the resulting diagnostic signature may turn out to be completely different. In this paper we propose an alternative novel approach which takes into account quantifiable expression levels of all genes assayed. In our analysis, the cumulative gene expression pattern of an individual patient is represented as a point in the multidimensional space formed by all gene expression profiles assayed in a given system, where the clusters of "normal samples" and "affected samples" are defined. The degree of separation of the given sample from the space occupied by "normal samples" reflects the drift of the sample away from homeostasis in the course of development of the pathophysiological process that underlies the disease. The outlined approach was validated using the publicly available glioma dataset deposited in Rembrandt and associated with survival data. Additionally, the applicability of the distance analysis to the classification of non-malignant samples was tested using psoriatic lesions and non-lesional matched controls as a model. 
Keywords: biomarkers; clustering; human diseases; RNA BioMed Central 2014-12-19 /pmc/articles/PMC4303935/ /pubmed/25563076 http://dx.doi.org/10.1186/1471-2164-15-S12-S10 Text en Copyright © 2014 Veytsman et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Veytsman, Boris Wang, Lei Cui, Tiange Bruskin, Sergey Baranova, Ancha Distance-based classifiers as potential diagnostic and prediction tools for human diseases |
title | Distance-based classifiers as potential diagnostic and prediction tools for human diseases |
title_full | Distance-based classifiers as potential diagnostic and prediction tools for human diseases |
title_fullStr | Distance-based classifiers as potential diagnostic and prediction tools for human diseases |
title_full_unstemmed | Distance-based classifiers as potential diagnostic and prediction tools for human diseases |
title_short | Distance-based classifiers as potential diagnostic and prediction tools for human diseases |
title_sort | distance-based classifiers as potential diagnostic and prediction tools for human diseases |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4303935/ https://www.ncbi.nlm.nih.gov/pubmed/25563076 http://dx.doi.org/10.1186/1471-2164-15-S12-S10 |
work_keys_str_mv | AT veytsmanboris distancebasedclassifiersaspotentialdiagnosticandpredictiontoolsforhumandiseases AT wanglei distancebasedclassifiersaspotentialdiagnosticandpredictiontoolsforhumandiseases AT cuitiange distancebasedclassifiersaspotentialdiagnosticandpredictiontoolsforhumandiseases AT bruskinsergey distancebasedclassifiersaspotentialdiagnosticandpredictiontoolsforhumandiseases AT baranovaancha distancebasedclassifiersaspotentialdiagnosticandpredictiontoolsforhumandiseases |