
Distance-based classifiers as potential diagnostic and prediction tools for human diseases

Typically, gene expression biomarkers are discovered in the course of high-throughput experiments, for example, RNAseq or microarray profiling. Analytic pipelines that extract so-called signatures suffer from the "Dimensionality curse": the number of genes expressed exceeds the number of...


Bibliographic Details
Main authors: Veytsman, Boris, Wang, Lei, Cui, Tiange, Bruskin, Sergey, Baranova, Ancha
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4303935/
https://www.ncbi.nlm.nih.gov/pubmed/25563076
http://dx.doi.org/10.1186/1471-2164-15-S12-S10
author Veytsman, Boris
Wang, Lei
Cui, Tiange
Bruskin, Sergey
Baranova, Ancha
collection PubMed
description Typically, gene expression biomarkers are discovered in the course of high-throughput experiments, for example, RNAseq or microarray profiling. Analytic pipelines that extract so-called signatures suffer from the "Dimensionality curse": the number of genes expressed exceeds the number of patients we can enroll in the study and use to train the discriminator algorithm. Hence, problems with the reproducibility of gene signatures are more common than not; when the algorithm is executed using a different training set, the resulting diagnostic signature may turn out to be completely different. In this paper we propose a novel alternative approach that takes into account the quantifiable expression levels of all genes assayed. In our analysis, the cumulative gene expression pattern of an individual patient is represented as a point in the multidimensional space formed by all gene expression profiles assayed in a given system, where the clusters of "normal samples" and "affected samples" are defined. The degree of separation of a given sample from the space occupied by "normal samples" reflects the drift of the sample away from homeostasis in the course of development of the pathophysiological process that underlies the disease. The outlined approach was validated using the publicly available glioma dataset deposited in Rembrandt and associated with survival data. Additionally, the applicability of the distance analysis to the classification of non-malignant samples was tested using psoriatic lesions and non-lesional matched controls as a model. Keywords: biomarkers; clustering; human diseases; RNA
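The idea in the description, a patient as a point in expression space whose separation from the "normal samples" cluster is measured, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the article: the use of cluster centroids, the Euclidean metric, the function names, and the toy expression values.

```python
import math


def centroid(samples):
    """Component-wise mean of equal-length expression vectors (assumed cluster summary)."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def euclidean(a, b):
    """Euclidean distance between two vectors (the metric is an assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_from_normal(sample, normal_samples):
    """Distance of one sample from the centroid of the 'normal samples' cluster."""
    return euclidean(sample, centroid(normal_samples))


def classify(sample, normal_samples, affected_samples):
    """Label a sample by whichever cluster centroid is nearer (ties go to 'normal')."""
    d_normal = euclidean(sample, centroid(normal_samples))
    d_affected = euclidean(sample, centroid(affected_samples))
    return "normal" if d_normal <= d_affected else "affected"


# Made-up expression values for three hypothetical genes; not real data.
normal = [[1.0, 2.0, 3.0], [1.2, 1.8, 3.1]]
affected = [[4.0, 0.5, 6.0], [3.8, 0.7, 5.9]]
patient = [3.5, 0.9, 5.5]

label = classify(patient, normal, affected)
drift = distance_from_normal(patient, normal)
```

In this sketch every assayed gene contributes to the distance, so no gene-subset signature has to be selected; a real analysis would also need normalization across genes, which is omitted here.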
format Online
Article
Text
id pubmed-4303935
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4303935 2015-02-09 Distance-based classifiers as potential diagnostic and prediction tools for human diseases Veytsman, Boris Wang, Lei Cui, Tiange Bruskin, Sergey Baranova, Ancha BMC Genomics Research
BioMed Central 2014-12-19 /pmc/articles/PMC4303935/ /pubmed/25563076 http://dx.doi.org/10.1186/1471-2164-15-S12-S10 Text en Copyright © 2014 Veytsman et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Distance-based classifiers as potential diagnostic and prediction tools for human diseases
topic Research