A methodology for exploring biomarker – phenotype associations: application to flow cytometry data and systemic sclerosis clinical manifestations

BACKGROUND: This work seeks to develop a methodology for identifying reliable biomarkers of disease activity, progression and outcome through the identification of significant associations between high-throughput flow cytometry (FC) data and interstitial lung disease (ILD) - a systemic sclerosis (SS...

Bibliographic Details
Main authors: Huang, Hongtai; Fava, Andrea; Guhr, Tara; Cimbro, Raffaello; Rosen, Antony; Boin, Francesco; Ellis, Hugh
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4571079/
https://www.ncbi.nlm.nih.gov/pubmed/26373409
http://dx.doi.org/10.1186/s12859-015-0722-x
_version_ 1782390296679546880
author Huang, Hongtai
Fava, Andrea
Guhr, Tara
Cimbro, Raffaello
Rosen, Antony
Boin, Francesco
Ellis, Hugh
author_sort Huang, Hongtai
collection PubMed
description BACKGROUND: This work seeks to develop a methodology for identifying reliable biomarkers of disease activity, progression and outcome by identifying significant associations between high-throughput flow cytometry (FC) data and interstitial lung disease (ILD), a systemic sclerosis (SSc, or scleroderma) clinical phenotype that is the leading cause of morbidity and mortality in SSc. A specific aim is to develop a clinically useful screening tool that could yield accurate assessments of disease state, such as the risk or presence of SSc-ILD, the activity of lung involvement and the likelihood of response to therapeutic intervention. Ultimately, this instrument could facilitate a refined stratification of SSc patients into clinically relevant subsets at the time of diagnosis and subsequently during the course of the disease, and thus help prevent bad outcomes from disease progression or unnecessary treatment side effects. The methods involve: (1) clinical and peripheral blood flow cytometry data (Immune Response In Scleroderma, IRIS) from consented patients followed at the Johns Hopkins Scleroderma Center; (2) machine learning (Conditional Random Forests, CRF) coupled with Gene Set Enrichment Analysis (GSEA) to identify subsets of FC variables that are highly effective in classifying ILD patients; and (3) stochastic simulation to design, train and validate ILD risk screening tools.
RESULTS: Our hybrid analysis approach (CRF-GSEA) predicted SSc patient ILD status with a high degree of success (>82% correct classification in validation; 79 patients in the training data set, 40 patients in the validation data set).
CONCLUSIONS: IRIS flow cytometry data provide useful information in assessing the ILD status of SSc patients. Our new approach combining Conditional Random Forests and Gene Set Enrichment Analysis identified a subset of flow cytometry variables for a screening tool that proved effective in correctly identifying ILD patients in the training and validation data sets. From a somewhat broader perspective, the identification of subsets of flow cytometry variables that exhibit coordinated movement (i.e., multi-variable up- or down-regulation) may lead to insights into possible effector pathways and thereby improve the state of knowledge of systemic sclerosis pathogenesis.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0722-x) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4571079
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4571079 2015-09-17 A methodology for exploring biomarker – phenotype associations: application to flow cytometry data and systemic sclerosis clinical manifestations Huang, Hongtai Fava, Andrea Guhr, Tara Cimbro, Raffaello Rosen, Antony Boin, Francesco Ellis, Hugh BMC Bioinformatics Methodology Article BioMed Central 2015-09-15 /pmc/articles/PMC4571079/ /pubmed/26373409 http://dx.doi.org/10.1186/s12859-015-0722-x Text en © Huang et al. 2015 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology Article
title A methodology for exploring biomarker – phenotype associations: application to flow cytometry data and systemic sclerosis clinical manifestations
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4571079/
https://www.ncbi.nlm.nih.gov/pubmed/26373409
http://dx.doi.org/10.1186/s12859-015-0722-x