biosigner: A New Method for the Discovery of Significant Molecular Signatures from Omics Data

Bibliographic Details
Main authors: Rinaudo, Philippe, Boudah, Samia, Junot, Christophe, Thévenot, Etienne A.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2016
Subjects: Molecular Biosciences
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4914951/
https://www.ncbi.nlm.nih.gov/pubmed/27446929
http://dx.doi.org/10.3389/fmolb.2016.00026
collection PubMed
description High-throughput technologies such as transcriptomics, proteomics, and metabolomics show great promise for the discovery of biomarkers for diagnosis and prognosis. Selection of the most promising candidates between the initial untargeted step and the subsequent validation phases is critical within the pipeline leading to clinical tests. Several statistical and data mining methods have been described for feature selection: in particular, wrapper approaches iteratively assess the performance of the classifier on distinct subsets of variables. Current wrappers, however, do not estimate the significance of the selected features. We therefore developed a new methodology to find the smallest feature subset which significantly contributes to the model performance, by using a combination of resampling, ranking of variable importance, significance assessment by permutation of the feature values in the test subsets, and half-interval search. We wrapped our biosigner algorithm around three reference binary classifiers (Partial Least Squares—Discriminant Analysis, Random Forest, and Support Vector Machines) which have been shown to achieve specific performances depending on the structure of the dataset. By using three real biological and clinical metabolomics and transcriptomics datasets (containing up to 7000 features), complementary signatures were obtained in a few minutes, generally providing higher prediction accuracies than the initial full model. Comparison with alternative feature selection approaches further indicated that our method provides signatures of restricted size and high stability. Finally, by using our methodology to seek metabolites discriminating type 1 from type 2 diabetic patients, several features were selected, including a fragment from the taurochenodeoxycholic bile acid. 
Our methodology, implemented in the biosigner R/Bioconductor package and Galaxy/Workflow4metabolomics module, should be of interest for both experimenters and statisticians to identify robust molecular signatures from large omics datasets in the process of developing new diagnostics.
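The description names four ingredients: resampling, ranking of variable importance, significance assessment by permuting feature values in the test subsets, and half-interval search. The minimal Python sketch below illustrates one way such pieces can fit together. It is not the biosigner implementation (an R/Bioconductor package), and every function name is hypothetical. Assumptions: a nearest-centroid classifier stands in for PLS-DA, Random Forest and SVM; a simple effect-size score stands in for classifier-specific variable importance; bootstrap resamples supply the training sets, with out-of-bag samples as test subsets; and the search assumes that once the tail of the ranking is dispensable at some cutoff, it stays dispensable at every larger cutoff.

```python
import random
from statistics import mean, pstdev


def fit_centroids(X, y, feats):
    """Hypothetical stand-in classifier: per-class mean of each selected feature."""
    return {
        label: [mean(x[j] for x, t in zip(X, y) if t == label) for j in feats]
        for label in set(y)
    }


def accuracy(centroids, feats, X, y):
    """Fraction of samples whose nearest class centroid matches the true label."""
    def predict(x):
        return min(centroids, key=lambda c: sum(
            (x[j] - m) ** 2 for j, m in zip(feats, centroids[c])))
    return mean(1.0 if predict(x) == t else 0.0 for x, t in zip(X, y))


def rank_features(X, y):
    """Hypothetical importance: |difference of class means| / overall spread.

    Assumes exactly two classes, as for the binary classifiers in the text.
    """
    a, b = sorted(set(y))

    def score(j):
        xa = [x[j] for x, t in zip(X, y) if t == a]
        xb = [x[j] for x, t in zip(X, y) if t == b]
        return abs(mean(xa) - mean(xb)) / (pstdev(xa + xb) or 1.0)

    return sorted(range(len(X[0])), key=score, reverse=True)


def tail_pvalue(X, y, ranked, k, n_boot=40, seed=0):
    """Permutation p-value for H0: features ranked after position k add nothing.

    Each bootstrap trains on a resample and scores the out-of-bag samples,
    once as-is and once with the tail features shuffled across those samples.
    """
    tail = ranked[k:]
    if not tail:
        return 1.0  # nothing left to test
    rng = random.Random(seed)
    n, wins, trials = len(X), 0, 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        oob = sorted(set(range(n)) - set(idx))
        if not oob:
            continue
        model = fit_centroids([X[i] for i in idx], [y[i] for i in idx], ranked)
        test_X, test_y = [list(X[i]) for i in oob], [y[i] for i in oob]
        acc = accuracy(model, ranked, test_X, test_y)
        for j in tail:  # break the link between tail features and labels
            col = [row[j] for row in test_X]
            rng.shuffle(col)
            for row, v in zip(test_X, col):
                row[j] = v
        wins += accuracy(model, ranked, test_X, test_y) >= acc
        trials += 1
    return (wins + 1) / (trials + 1)  # add-one permutation p-value


def smallest_signature(X, y, alpha=0.05, n_boot=40, seed=0):
    """Half-interval (binary) search for the shortest top-k prefix whose
    remaining tail does not significantly improve test accuracy."""
    ranked = rank_features(X, y)
    lo, hi = 1, len(ranked)  # k = len(ranked) always qualifies (empty tail)
    while lo < hi:
        mid = (lo + hi) // 2
        if tail_pvalue(X, y, ranked, mid, n_boot, seed) > alpha:
            hi = mid  # tail is dispensable: try a shorter signature
        else:
            lo = mid + 1  # tail still carries signal: keep more features
    return ranked[:lo]
```

On simulated two-class data where only the first two of six features differ between classes, `smallest_signature(X, y)` would be expected to keep just those two features. The binary search is what keeps the cost down: it needs on the order of log2(p) permutation tests for p ranked features, instead of one per candidate cutoff.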
id pubmed-4914951
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: Front Mol Biosci (Molecular Biosciences), Frontiers Media S.A., 2016-06-21.
Copyright © 2016 Rinaudo, Boudah, Junot and Thévenot. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
topic Molecular Biosciences