
Meta-analysis approach as a gene selection method in class prediction: does it improve model performance? A case study in acute myeloid leukemia

BACKGROUND: Aggregating gene expression data across experiments via meta-analysis is expected to increase the precision of the effect estimates and to increase the statistical power to detect a certain fold change. This study evaluates the potential benefit of using a meta-analysis approach as a gen...


Bibliographic Details
Main authors: Novianti, Putri W., Jong, Victor L., Roes, Kit C. B., Eijkemans, Marinus J. C.
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5387259/
https://www.ncbi.nlm.nih.gov/pubmed/28399794
http://dx.doi.org/10.1186/s12859-017-1619-7
author Novianti, Putri W.
Jong, Victor L.
Roes, Kit C. B.
Eijkemans, Marinus J. C.
collection PubMed
description BACKGROUND: Aggregating gene expression data across experiments via meta-analysis is expected to increase the precision of the effect estimates and to increase the statistical power to detect a certain fold change. This study evaluates the potential benefit of using a meta-analysis approach as a gene selection method prior to predictive modeling in gene expression data. RESULTS: Six raw datasets from different gene expression experiments in acute myeloid leukemia (AML) and 11 different classification methods were used to build classification models to classify samples as either AML or healthy control. First, the classification models were trained on gene expression data from single experiments using conventional supervised variable selection and externally validated with the other five gene expression datasets (referred to as the individual-classification approach). Next, gene selection was performed through meta-analysis on four datasets, and predictive models were trained with the selected genes on the fifth dataset and validated on the sixth dataset. For some datasets, gene selection through meta-analysis helped classification models to achieve higher performance as compared to predictive modeling based on a single dataset; but for others, there was no major improvement. Synthetic datasets were generated from nine simulation scenarios. The effect of sample size, fold change and pairwise correlation between differentially expressed (DE) genes on the difference between MA- and individual-classification model was evaluated. The fold change and pairwise correlation significantly contributed to the difference in performance between the two methods. The gene selection via meta-analysis approach was more effective when it was conducted using a set of data with low fold change and high pairwise correlation on the DE genes. 
CONCLUSION: Gene selection through meta-analysis on previously published studies potentially improves the performance of a predictive model on a given gene expression data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1619-7) contains supplementary material, which is available to authorized users.
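The workflow described in the abstract (select genes by meta-analysis on four datasets, train on a fifth, validate on a sixth) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the article: the data are synthetic Gaussian values rather than the AML datasets, the meta-analysis is a fixed-effect inverse-variance combination of standardized mean differences, the classifier is a simple nearest-centroid rule, and all function names (`simulate_study`, `effect_size`, `meta_z`, `select_genes`, `centroid`, `predict_is_case`) are hypothetical.

```python
import math
import random
from statistics import mean, variance


def simulate_study(rng, n_per_class, n_genes, de_genes, shift):
    """Return (cases, controls); each sample is a list of expression values."""
    controls = [[rng.gauss(0, 1) for _ in range(n_genes)]
                for _ in range(n_per_class)]
    cases = [[rng.gauss(shift if g in de_genes else 0, 1) for g in range(n_genes)]
             for _ in range(n_per_class)]
    return cases, controls


def effect_size(case, control):
    """Standardized mean difference and its approximate sampling variance."""
    n1, n0 = len(case), len(control)
    pooled_var = ((n1 - 1) * variance(case)
                  + (n0 - 1) * variance(control)) / (n1 + n0 - 2)
    d = (mean(case) - mean(control)) / math.sqrt(pooled_var)
    var_d = (n1 + n0) / (n1 * n0) + d * d / (2 * (n1 + n0))
    return d, var_d


def meta_z(studies, gene):
    """Fixed-effect (inverse-variance) pooled z-score for one gene."""
    num = den = 0.0
    for cases, controls in studies:
        d, var_d = effect_size([s[gene] for s in cases],
                               [s[gene] for s in controls])
        num += d / var_d
        den += 1.0 / var_d
    # pooled estimate (num / den) divided by its standard error sqrt(1 / den)
    return num / math.sqrt(den)


def select_genes(studies, n_genes, top_k):
    """Rank genes by absolute pooled z-score and keep the top_k."""
    scores = [abs(meta_z(studies, g)) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:top_k]


def centroid(samples, genes):
    """Mean expression of the selected genes over a group of samples."""
    return [mean(s[g] for s in samples) for g in genes]


def predict_is_case(sample, genes, c_case, c_ctrl):
    """Nearest-centroid rule: True if the sample is closer to the case centroid."""
    d_case = sum((sample[g] - c) ** 2 for g, c in zip(genes, c_case))
    d_ctrl = sum((sample[g] - c) ** 2 for g, c in zip(genes, c_ctrl))
    return d_case < d_ctrl


rng = random.Random(0)
DE_GENES = {0, 1, 2}  # hypothetical differentially expressed genes
studies = [simulate_study(rng, 20, 30, DE_GENES, 2.0) for _ in range(6)]

# Gene selection on four datasets, training on the fifth, validation on the sixth.
selected = select_genes(studies[:4], n_genes=30, top_k=3)
train_cases, train_ctrls = studies[4]
test_cases, test_ctrls = studies[5]
c_case = centroid(train_cases, selected)
c_ctrl = centroid(train_ctrls, selected)

hits = sum(predict_is_case(s, selected, c_case, c_ctrl) for s in test_cases)
hits += sum(not predict_is_case(s, selected, c_case, c_ctrl) for s in test_ctrls)
accuracy = hits / (len(test_cases) + len(test_ctrls))
print(sorted(selected), accuracy)
```

The point of the split is that the datasets used for gene selection, model training, and validation never overlap, so the validation accuracy is not inflated by selecting genes on the same samples that are later used to evaluate the model.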
id pubmed-5387259
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-5387259 2017-04-11 BMC Bioinformatics Research Article BioMed Central 2017-04-11 /pmc/articles/PMC5387259/ /pubmed/28399794 http://dx.doi.org/10.1186/s12859-017-1619-7 Text en © The Author(s). 2017 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Meta-analysis approach as a gene selection method in class prediction: does it improve model performance? A case study in acute myeloid leukemia
topic Research Article