Increasing Consistency of Disease Biomarker Prediction Across Datasets
Microarray studies with human subjects often have limited sample sizes which hampers the ability to detect reliable biomarkers associated with disease and motivates the need to aggregate data across studies. However, human gene expression measurements may be influenced by many non-random factors suc...
Main Authors: | Chikina, Maria D., Sealfon, Stuart C. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2014 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3989170/ https://www.ncbi.nlm.nih.gov/pubmed/24740471 http://dx.doi.org/10.1371/journal.pone.0091272 |
_version_ | 1782312119401709568 |
---|---|
author | Chikina, Maria D. Sealfon, Stuart C. |
author_facet | Chikina, Maria D. Sealfon, Stuart C. |
author_sort | Chikina, Maria D. |
collection | PubMed |
description | Microarray studies with human subjects often have limited sample sizes which hampers the ability to detect reliable biomarkers associated with disease and motivates the need to aggregate data across studies. However, human gene expression measurements may be influenced by many non-random factors such as genetics, sample preparations, and tissue heterogeneity. These factors can contribute to a lack of agreement among related studies, limiting the utility of their aggregation. We show that it is feasible to carry out an automatic correction of individual datasets to reduce the effect of such ‘latent variables’ (without prior knowledge of the variables) in such a way that datasets addressing the same condition show better agreement once each is corrected. We build our approach on the method of surrogate variable analysis but we demonstrate that the original algorithm is unsuitable for the analysis of human tissue samples that are mixtures of different cell types. We propose a modification to SVA that is crucial to obtaining the improvement in agreement that we observe. We develop our method on a compendium of multiple sclerosis data and verify it on an independent compendium of Parkinson's disease datasets. In both cases, we show that our method is able to improve agreement across varying study designs, platforms, and tissues. This approach has the potential for wide applicability to any field where lack of inter-study agreement has been a concern. |
format | Online Article Text |
id | pubmed-3989170 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-39891702014-04-21 Increasing Consistency of Disease Biomarker Prediction Across Datasets Chikina, Maria D. Sealfon, Stuart C. PLoS One Research Article Microarray studies with human subjects often have limited sample sizes which hampers the ability to detect reliable biomarkers associated with disease and motivates the need to aggregate data across studies. However, human gene expression measurements may be influenced by many non-random factors such as genetics, sample preparations, and tissue heterogeneity. These factors can contribute to a lack of agreement among related studies, limiting the utility of their aggregation. We show that it is feasible to carry out an automatic correction of individual datasets to reduce the effect of such ‘latent variables’ (without prior knowledge of the variables) in such a way that datasets addressing the same condition show better agreement once each is corrected. We build our approach on the method of surrogate variable analysis but we demonstrate that the original algorithm is unsuitable for the analysis of human tissue samples that are mixtures of different cell types. We propose a modification to SVA that is crucial to obtaining the improvement in agreement that we observe. We develop our method on a compendium of multiple sclerosis data and verify it on an independent compendium of Parkinson's disease datasets. In both cases, we show that our method is able to improve agreement across varying study designs, platforms, and tissues. This approach has the potential for wide applicability to any field where lack of inter-study agreement has been a concern. Public Library of Science 2014-04-16 /pmc/articles/PMC3989170/ /pubmed/24740471 http://dx.doi.org/10.1371/journal.pone.0091272 Text en © 2014 Chikina, Sealfon http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Chikina, Maria D. Sealfon, Stuart C. Increasing Consistency of Disease Biomarker Prediction Across Datasets |
title | Increasing Consistency of Disease Biomarker Prediction Across Datasets |
title_full | Increasing Consistency of Disease Biomarker Prediction Across Datasets |
title_fullStr | Increasing Consistency of Disease Biomarker Prediction Across Datasets |
title_full_unstemmed | Increasing Consistency of Disease Biomarker Prediction Across Datasets |
title_short | Increasing Consistency of Disease Biomarker Prediction Across Datasets |
title_sort | increasing consistency of disease biomarker prediction across datasets |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3989170/ https://www.ncbi.nlm.nih.gov/pubmed/24740471 http://dx.doi.org/10.1371/journal.pone.0091272 |
work_keys_str_mv | AT chikinamariad increasingconsistencyofdiseasebiomarkerpredictionacrossdatasets AT sealfonstuartc increasingconsistencyofdiseasebiomarkerpredictionacrossdatasets |