Discovering feature relevancy and dependency by kernel-guided probabilistic model-building evolution

Bibliographic Details
Main authors:	Rodriguez, Nestor; Rojas–Galeano, Sergio
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2017
Subjects:	Methodology
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5353680/ https://www.ncbi.nlm.nih.gov/pubmed/28331548 http://dx.doi.org/10.1186/s13040-017-0131-y

author	Rodriguez, Nestor; Rojas–Galeano, Sergio
collection	PubMed
description	BACKGROUND: Discovering relevant features (biomarkers) that discriminate etiologies of a disease is useful to provide biomedical researchers with candidate targets for further laboratory experimentation while saving costs; dependencies among biomarkers may suggest additional valuable information, for example, to characterize complex epistatic relationships from genetic data. The use of classifiers to guide the search for biomarkers (the so-called wrapper approach) has been widely studied. However, simultaneously searching for relevancy and dependencies among markers is less explored ground. RESULTS: We propose a new wrapper method that builds upon the discrimination power of a weighted kernel classifier to guide the search for a probabilistic model of simultaneous marginal and interacting effects. The feasibility of the method was evaluated in three empirical studies. The first one assessed its ability to discover complex epistatic effects on a large-scale testbed of generated human genetic problems; the method succeeded in 4 out of 5 of these problems while providing more accurate and expressive results than a baseline technique that also considers dependencies. The second study evaluated the performance of the method in benchmark classification tasks; on average the prediction accuracy was comparable to two other baseline techniques whilst finding smaller subsets of relevant features. The last study was aimed at discovering relevancy/dependency in a hepatitis dataset; in this regard, evidence recently reported in medical literature corroborated our findings. As a byproduct, the method was implemented and made freely available as a toolbox of software components deployed within an existing visual data-mining workbench. CONCLUSIONS: The mining advantages exhibited by the method come at the expense of a higher computational complexity, posing interesting algorithmic challenges regarding its applicability to large-scale datasets. Extending the probabilistic assumptions of the method to continuous distributions and higher-degree interactions is also appealing. As a final remark, we advocate broadening the use of visual graphical software tools as they enable biodata researchers to focus on experiment design, visualisation and data analysis rather than on refining their scripting programming skills. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13040-017-0131-y) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-5353680
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5353680 2017-03-22 Discovering feature relevancy and dependency by kernel-guided probabilistic model-building evolution Rodriguez, Nestor; Rojas–Galeano, Sergio BioData Min Methodology BioMed Central 2017-03-15 /pmc/articles/PMC5353680/ /pubmed/28331548 http://dx.doi.org/10.1186/s13040-017-0131-y Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Discovering feature relevancy and dependency by kernel-guided probabilistic model-building evolution
topic	Methodology
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5353680/ https://www.ncbi.nlm.nih.gov/pubmed/28331548 http://dx.doi.org/10.1186/s13040-017-0131-y