A New Combinatorial Optimization Approach for Integrated Feature Selection Using Different Datasets: A Prostate Cancer Transcriptomic Study

BACKGROUND: The joint study of multiple datasets has become a common technique for increasing statistical power in detecting biomarkers obtained from smaller studies. The approach generally followed is based on the fact that as the total number of samples increases, we expect to have greater power t...

Bibliographic Details
Main authors: Puthiyedth, Nisha; Riveros, Carlos; Berretta, Regina; Moscato, Pablo
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4480358/
https://www.ncbi.nlm.nih.gov/pubmed/26106884
http://dx.doi.org/10.1371/journal.pone.0127702
collection PubMed
description BACKGROUND: The joint study of multiple datasets has become a common technique for increasing statistical power in detecting biomarkers obtained from smaller studies. The approach generally followed is based on the fact that as the total number of samples increases, we expect to have greater power to detect associations of interest. This methodology has been applied to genome-wide association and transcriptomic studies due to the availability of datasets in the public domain. While this approach is well established in biostatistics, the introduction of new combinatorial optimization models to address this issue has not been explored in depth. In this study, we introduce a new model for the integration of multiple datasets and we show its application in transcriptomics. METHODS: We propose a new combinatorial optimization problem that addresses the core issue of biomarker detection in integrated datasets. Optimal solutions for this model deliver a feature selection from a panel of prospective biomarkers. The model we propose is a generalised version of the (α,β)-k-Feature Set problem. We illustrate the performance of this new methodology via a challenging meta-analysis task involving six prostate cancer microarray datasets. The results are then compared to the popular RankProd meta-analysis tool and to what can be obtained by analysing the individual datasets by statistical and combinatorial methods alone. RESULTS: Application of the integrated method resulted in a more informative signature than the rank-based meta-analysis or individual dataset results, and overcomes problems arising from real world datasets. The set of genes identified is highly significant in the context of prostate cancer. The method used does not rely on homogenisation or transformation of values to a common scale, and at the same time is able to capture markers associated with subgroups of the disease.
id pubmed-4480358
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-4480358 2015-06-29 PLoS One Research Article
Public Library of Science 2015-06-24 /pmc/articles/PMC4480358/ /pubmed/26106884 http://dx.doi.org/10.1371/journal.pone.0127702
Text en © 2015 Puthiyedth et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
topic Research Article