
Discovering feature relevancy and dependency by kernel-guided probabilistic model-building evolution

BACKGROUND: Discovering relevant features (biomarkers) that discriminate etiologies of a disease is useful to provide biomedical researchers with candidate targets for further laboratory experimentation while saving costs; dependencies among biomarkers may suggest additional valuable information, fo...


Bibliographic Details
Main authors: Rodriguez, Nestor, Rojas–Galeano, Sergio
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5353680/
https://www.ncbi.nlm.nih.gov/pubmed/28331548
http://dx.doi.org/10.1186/s13040-017-0131-y
author Rodriguez, Nestor
Rojas–Galeano, Sergio
collection PubMed
description BACKGROUND: Discovering relevant features (biomarkers) that discriminate between etiologies of a disease gives biomedical researchers candidate targets for further laboratory experimentation while saving costs; dependencies among biomarkers may suggest additional valuable information, for example, to characterize complex epistatic relationships from genetic data. The use of classifiers to guide the search for biomarkers (the so-called wrapper approach) has been widely studied. However, simultaneously searching for relevancy and dependencies among markers is less explored.
RESULTS: We propose a new wrapper method that builds upon the discrimination power of a weighted kernel classifier to guide the search for a probabilistic model of simultaneous marginal and interacting effects. The feasibility of the method was evaluated in three empirical studies. The first assessed its ability to discover complex epistatic effects on a large-scale testbed of generated human genetic problems; the method succeeded in 4 out of 5 of these problems while providing more accurate and expressive results than a baseline technique that also considers dependencies. The second study evaluated the performance of the method in benchmark classification tasks; on average, the prediction accuracy was comparable to that of two other baseline techniques whilst finding smaller subsets of relevant features. The last study was aimed at discovering relevancy/dependency in a hepatitis dataset; in this regard, evidence recently reported in the medical literature corroborated our findings. As a byproduct, the method was implemented and made freely available as a toolbox of software components deployed within an existing visual data-mining workbench.
CONCLUSIONS: The mining advantages exhibited by the method come at the expense of a higher computational complexity, posing interesting algorithmic challenges regarding its applicability to large-scale datasets. Extending the probabilistic assumptions of the method to continuous distributions and higher-degree interactions is also appealing. As a final remark, we advocate broadening the use of visual graphical software tools, as they enable biodata researchers to focus on experiment design, visualisation and data analysis rather than on refining their scripting and programming skills.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13040-017-0131-y) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5353680
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5353680 2017-03-22 Discovering feature relevancy and dependency by kernel-guided probabilistic model-building evolution Rodriguez, Nestor Rojas–Galeano, Sergio BioData Min Methodology BioMed Central 2017-03-15 /pmc/articles/PMC5353680/ /pubmed/28331548 http://dx.doi.org/10.1186/s13040-017-0131-y Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Discovering feature relevancy and dependency by kernel-guided probabilistic model-building evolution
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5353680/
https://www.ncbi.nlm.nih.gov/pubmed/28331548
http://dx.doi.org/10.1186/s13040-017-0131-y