
Effect of Size and Heterogeneity of Samples on Biomarker Discovery: Synthetic and Real Data Assessment

MOTIVATION: The identification of robust lists of molecular biomarkers related to a disease is a fundamental step for early diagnosis and treatment. However, methodologies for the discovery of biomarkers using microarray data often provide results with limited overlap. These differences are imputabl...


Bibliographic Details
Main authors: Di Camillo, Barbara; Sanavia, Tiziana; Martini, Matteo; Jurman, Giuseppe; Sambo, Francesco; Barla, Annalisa; Squillario, Margherita; Furlanello, Cesare; Toffolo, Gianna; Cobelli, Claudio
Format: Online Article Text
Language: English
Published: Public Library of Science 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3293892/
https://www.ncbi.nlm.nih.gov/pubmed/22403633
http://dx.doi.org/10.1371/journal.pone.0032200
author Di Camillo, Barbara
Sanavia, Tiziana
Martini, Matteo
Jurman, Giuseppe
Sambo, Francesco
Barla, Annalisa
Squillario, Margherita
Furlanello, Cesare
Toffolo, Gianna
Cobelli, Claudio
collection PubMed
description MOTIVATION: The identification of robust lists of molecular biomarkers related to a disease is a fundamental step for early diagnosis and treatment. However, methodologies for the discovery of biomarkers using microarray data often provide results with limited overlap. These differences are attributable to 1) dataset size (few subjects relative to the number of features); 2) heterogeneity of the disease; 3) heterogeneity of the experimental protocols and computational pipelines employed in the analysis. In this paper, we focus on the first two issues and assess, on both simulated (through an in silico regulation network model) and real clinical datasets, the consistency of the candidate biomarkers provided by a number of different methods.
METHODS: We extensively simulated the effect of the heterogeneity characteristic of complex diseases on different sets of microarray data. Heterogeneity was reproduced by simulating both the intrinsic variability of the population and the alteration of regulatory mechanisms. Population variability was simulated by modeling the evolution of a pool of subjects; a subset of them then underwent alterations in regulatory mechanisms so as to mimic the disease state.
RESULTS: The simulated data allowed us to outline the advantages and drawbacks of different methods across multiple studies and varying numbers of samples, and to evaluate the precision of feature selection on a benchmark with known biomarkers. Although the different methods reached comparable classification accuracy, the use of external cross-validation loops is helpful in finding features with a higher degree of precision and stability. Application to real data confirmed these results.
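The idea of scoring feature selection for precision and stability inside an external cross-validation loop can be illustrated with a minimal sketch. It is not the authors' pipeline: the Gaussian toy data, the signal-to-noise ranking, the Jaccard-based stability score, and every function name and parameter value below are assumptions chosen for illustration only.

```python
import random
from itertools import combinations
from statistics import mean, pstdev


def simulate(n_per_class, n_features, true_markers, shift, rng):
    """Toy data (assumption): unit Gaussian noise, true markers shifted in class 1."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            if label == 1:
                for j in true_markers:
                    row[j] += shift
            X.append(row)
            y.append(label)
    return X, y


def top_k_features(X, y, k):
    """Rank features by a simple signal-to-noise score and keep the top k."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        spread = pstdev(a) + pstdev(b) + 1e-9  # avoid division by zero
        scores.append((abs(mean(a) - mean(b)) / spread, j))
    return {j for _, j in sorted(scores, reverse=True)[:k]}


def external_cv_selection(X, y, k, n_folds, rng):
    """Run feature selection on the training part of each fold only."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    selected = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        selected.append(
            top_k_features([X[i] for i in train], [y[i] for i in train], k)
        )
    return selected


def precision(selected, true_markers):
    """Fraction of selected features that are known biomarkers."""
    return len(selected & true_markers) / len(selected)


def stability(lists):
    """Mean pairwise Jaccard similarity between the per-fold feature lists."""
    return mean(len(a & b) / len(a | b) for a, b in combinations(lists, 2))


rng = random.Random(0)
true_markers = set(range(5))  # hypothetical benchmark of known biomarkers
X, y = simulate(20, 200, true_markers, shift=2.0, rng=rng)
lists = external_cv_selection(X, y, k=5, n_folds=5, rng=rng)
print("precision per fold:", [round(precision(s, true_markers), 2) for s in lists])
print("stability:", round(stability(lists), 2))
```

Because each feature list is derived without the held-out samples, comparing the lists across folds gives a stability estimate, and comparing them with the known markers gives a precision estimate that is only possible on a simulated benchmark.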
format Online
Article
Text
id pubmed-3293892
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3293892 2012-03-08 Effect of Size and Heterogeneity of Samples on Biomarker Discovery: Synthetic and Real Data Assessment PLoS One Research Article Public Library of Science 2012-03-05 /pmc/articles/PMC3293892/ /pubmed/22403633 http://dx.doi.org/10.1371/journal.pone.0032200 Text en Di Camillo et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Effect of Size and Heterogeneity of Samples on Biomarker Discovery: Synthetic and Real Data Assessment
topic Research Article