Multiple-input multiple-output causal strategies for gene selection

BACKGROUND: Traditional strategies for selecting variables in high dimensional classification problems aim to find sets of maximally relevant variables able to explain the target variations. Although these techniques may be effective in terms of generalization accuracy, they often do not reveal direct causes. The l...

Bibliographic Details
Main authors: Bontempi, Gianluca, Haibe-Kains, Benjamin, Desmedt, Christine, Sotiriou, Christos, Quackenbush, John
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3323860/
https://www.ncbi.nlm.nih.gov/pubmed/22118187
http://dx.doi.org/10.1186/1471-2105-12-458
_version_ 1782229266028560384
author Bontempi, Gianluca
Haibe-Kains, Benjamin
Desmedt, Christine
Sotiriou, Christos
Quackenbush, John
author_facet Bontempi, Gianluca
Haibe-Kains, Benjamin
Desmedt, Christine
Sotiriou, Christos
Quackenbush, John
author_sort Bontempi, Gianluca
collection PubMed
description BACKGROUND: Traditional strategies for selecting variables in high dimensional classification problems aim to find sets of maximally relevant variables able to explain the target variations. Although these techniques may be effective in terms of generalization accuracy, they often do not reveal direct causes. This is essentially because high correlation (or relevance) does not imply causation. In this study, we show how to efficiently incorporate causal information into gene selection by moving from a single-input single-output to a multiple-input multiple-output setting. RESULTS: We show in a synthetic case study that a better prioritization of causal variables can be obtained by considering a relevance score which incorporates a causal term. In addition we show, in a meta-analysis study of six publicly available breast cancer microarray datasets, that the improvement also occurs in terms of accuracy. The biological interpretation of the results confirms the potential of a causal approach to gene selection. CONCLUSIONS: Integrating causal information into gene selection algorithms is effective both in terms of prediction accuracy and biological interpretation.
format Online
Article
Text
id pubmed-3323860
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-33238602012-04-16 Multiple-input multiple-output causal strategies for gene selection Bontempi, Gianluca Haibe-Kains, Benjamin Desmedt, Christine Sotiriou, Christos Quackenbush, John BMC Bioinformatics Research Article BACKGROUND: Traditional strategies for selecting variables in high dimensional classification problems aim to find sets of maximally relevant variables able to explain the target variations. Although these techniques may be effective in terms of generalization accuracy, they often do not reveal direct causes. This is essentially because high correlation (or relevance) does not imply causation. In this study, we show how to efficiently incorporate causal information into gene selection by moving from a single-input single-output to a multiple-input multiple-output setting. RESULTS: We show in a synthetic case study that a better prioritization of causal variables can be obtained by considering a relevance score which incorporates a causal term. In addition we show, in a meta-analysis study of six publicly available breast cancer microarray datasets, that the improvement also occurs in terms of accuracy. The biological interpretation of the results confirms the potential of a causal approach to gene selection. CONCLUSIONS: Integrating causal information into gene selection algorithms is effective both in terms of prediction accuracy and biological interpretation. BioMed Central 2011-11-25 /pmc/articles/PMC3323860/ /pubmed/22118187 http://dx.doi.org/10.1186/1471-2105-12-458 Text en Copyright ©2011 Bontempi et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Bontempi, Gianluca
Haibe-Kains, Benjamin
Desmedt, Christine
Sotiriou, Christos
Quackenbush, John
Multiple-input multiple-output causal strategies for gene selection
title Multiple-input multiple-output causal strategies for gene selection
title_full Multiple-input multiple-output causal strategies for gene selection
title_fullStr Multiple-input multiple-output causal strategies for gene selection
title_full_unstemmed Multiple-input multiple-output causal strategies for gene selection
title_short Multiple-input multiple-output causal strategies for gene selection
title_sort multiple-input multiple-output causal strategies for gene selection
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3323860/
https://www.ncbi.nlm.nih.gov/pubmed/22118187
http://dx.doi.org/10.1186/1471-2105-12-458
work_keys_str_mv AT bontempigianluca multipleinputmultipleoutputcausalstrategiesforgeneselection
AT haibekainsbenjamin multipleinputmultipleoutputcausalstrategiesforgeneselection
AT desmedtchristine multipleinputmultipleoutputcausalstrategiesforgeneselection
AT sotiriouchristos multipleinputmultipleoutputcausalstrategiesforgeneselection
AT quackenbushjohn multipleinputmultipleoutputcausalstrategiesforgeneselection