Increasing Consistency of Disease Biomarker Prediction Across Datasets

Microarray studies with human subjects often have limited sample sizes which hampers the ability to detect reliable biomarkers associated with disease and motivates the need to aggregate data across studies. However, human gene expression measurements may be influenced by many non-random factors suc...

Bibliographic Details
Main authors: Chikina, Maria D., Sealfon, Stuart C.
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3989170/
https://www.ncbi.nlm.nih.gov/pubmed/24740471
http://dx.doi.org/10.1371/journal.pone.0091272
author Chikina, Maria D.
Sealfon, Stuart C.
collection PubMed
description Microarray studies with human subjects often have limited sample sizes, which hampers the ability to detect reliable biomarkers associated with disease and motivates the need to aggregate data across studies. However, human gene expression measurements may be influenced by many non-random factors such as genetics, sample preparation, and tissue heterogeneity. These factors can contribute to a lack of agreement among related studies, limiting the utility of their aggregation. We show that it is feasible to automatically correct individual datasets to reduce the effect of such ‘latent variables’ (without prior knowledge of the variables), so that datasets addressing the same condition show better agreement once each is corrected. We build our approach on the method of surrogate variable analysis (SVA), but we demonstrate that the original algorithm is unsuitable for the analysis of human tissue samples that are mixtures of different cell types. We propose a modification to SVA that is crucial to obtaining the improvement in agreement that we observe. We develop our method on a compendium of multiple sclerosis data and verify it on an independent compendium of Parkinson's disease datasets. In both cases, we show that our method improves agreement across varying study designs, platforms, and tissues. This approach has the potential for wide applicability to any field where lack of inter-study agreement has been a concern.
format Online
Article
Text
id pubmed-3989170
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3989170 2014-04-21 Increasing Consistency of Disease Biomarker Prediction Across Datasets Chikina, Maria D. Sealfon, Stuart C. PLoS One Research Article
Public Library of Science 2014-04-16 /pmc/articles/PMC3989170/ /pubmed/24740471 http://dx.doi.org/10.1371/journal.pone.0091272 Text en © 2014 Chikina, Sealfon http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Increasing Consistency of Disease Biomarker Prediction Across Datasets
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3989170/
https://www.ncbi.nlm.nih.gov/pubmed/24740471
http://dx.doi.org/10.1371/journal.pone.0091272