A New Combinatorial Optimization Approach for Integrated Feature Selection Using Different Datasets: A Prostate Cancer Transcriptomic Study
BACKGROUND: The joint study of multiple datasets has become a common technique for increasing statistical power in detecting biomarkers obtained from smaller studies. The approach generally followed is based on the fact that as the total number of samples increases, we expect to have greater power t...
Main Authors: | Puthiyedth, Nisha; Riveros, Carlos; Berretta, Regina; Moscato, Pablo |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2015 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4480358/ https://www.ncbi.nlm.nih.gov/pubmed/26106884 http://dx.doi.org/10.1371/journal.pone.0127702 |
_version_ | 1782378148681220096 |
---|---|
author | Puthiyedth, Nisha; Riveros, Carlos; Berretta, Regina; Moscato, Pablo |
author_sort | Puthiyedth, Nisha |
collection | PubMed |
description | BACKGROUND: The joint study of multiple datasets has become a common technique for increasing statistical power in detecting biomarkers obtained from smaller studies. The approach generally followed is based on the fact that as the total number of samples increases, we expect to have greater power to detect associations of interest. This methodology has been applied to genome-wide association and transcriptomic studies due to the availability of datasets in the public domain. While this approach is well established in biostatistics, the introduction of new combinatorial optimization models to address this issue has not been explored in depth. In this study, we introduce a new model for the integration of multiple datasets and we show its application in transcriptomics. METHODS: We propose a new combinatorial optimization problem that addresses the core issue of biomarker detection in integrated datasets. Optimal solutions for this model deliver a feature selection from a panel of prospective biomarkers. The model we propose is a generalised version of the (α,β)-k-Feature Set problem. We illustrate the performance of this new methodology via a challenging meta-analysis task involving six prostate cancer microarray datasets. The results are then compared to the popular RankProd meta-analysis tool and to what can be obtained by analysing the individual datasets by statistical and combinatorial methods alone. RESULTS: Application of the integrated method resulted in a more informative signature than the rank-based meta-analysis or individual dataset results, and overcomes problems arising from real world datasets. The set of genes identified is highly significant in the context of prostate cancer. The method used does not rely on homogenisation or transformation of values to a common scale, and at the same time is able to capture markers associated with subgroups of the disease. |
format | Online Article Text |
id | pubmed-4480358 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-4480358 2015-06-29 A New Combinatorial Optimization Approach for Integrated Feature Selection Using Different Datasets: A Prostate Cancer Transcriptomic Study Puthiyedth, Nisha; Riveros, Carlos; Berretta, Regina; Moscato, Pablo PLoS One Research Article Public Library of Science 2015-06-24 /pmc/articles/PMC4480358/ /pubmed/26106884 http://dx.doi.org/10.1371/journal.pone.0127702 Text en © 2015 Puthiyedth et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
title | A New Combinatorial Optimization Approach for Integrated Feature Selection Using Different Datasets: A Prostate Cancer Transcriptomic Study |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4480358/ https://www.ncbi.nlm.nih.gov/pubmed/26106884 http://dx.doi.org/10.1371/journal.pone.0127702 |