Discovering feature relevancy and dependency by kernel-guided probabilistic model-building evolution
BACKGROUND: Discovering relevant features (biomarkers) that discriminate etiologies of a disease is useful to provide biomedical researchers with candidate targets for further laboratory experimentation while saving costs; dependencies among biomarkers may suggest additional valuable information, fo...
Main authors: | Rodriguez, Nestor, Rojas–Galeano, Sergio |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2017 |
Subjects: | Methodology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5353680/ https://www.ncbi.nlm.nih.gov/pubmed/28331548 http://dx.doi.org/10.1186/s13040-017-0131-y |
_version_ | 1782515167285739520 |
---|---|
author | Rodriguez, Nestor Rojas–Galeano, Sergio |
author_facet | Rodriguez, Nestor Rojas–Galeano, Sergio |
author_sort | Rodriguez, Nestor |
collection | PubMed |
description | BACKGROUND: Discovering relevant features (biomarkers) that discriminate etiologies of a disease is useful to provide biomedical researchers with candidate targets for further laboratory experimentation while saving costs; dependencies among biomarkers may suggest additional valuable information, for example, to characterize complex epistatic relationships from genetic data. The use of classifiers to guide the search for biomarkers (the so–called wrapper approach) has been widely studied. However, simultaneously searching for relevancy and dependencies among markers is a less explored ground. RESULTS: We propose a new wrapper method that builds upon the discrimination power of a weighted kernel classifier to guide the search for a probabilistic model of simultaneous marginal and interacting effects. The feasibility of the method was evaluated in three empirical studies. The first one assessed its ability to discover complex epistatic effects on a large–scale testbed of generated human genetic problems; the method succeeded in 4 out of 5 of these problems while providing more accurate and expressive results than a baseline technique that also considers dependencies. The second study evaluated the performance of the method in benchmark classification tasks; in average the prediction accuracy was comparable to two other baseline techniques whilst finding smaller subsets of relevant features. The last study was aimed at discovering relevancy/dependency in a hepatitis dataset; in this regard, evidence recently reported in medical literature corroborated our findings. As a byproduct, the method was implemented and made freely available as a toolbox of software components deployed within an existing visual data–mining workbench. CONCLUSIONS: The mining advantages exhibited by the method come at the expense of a higher computational complexity, posing interesting algorithmic challenges regarding its applicability to large–scale datasets. Extending the probabilistic assumptions of the method to continuous distributions and higher–degree interactions is also appealing. As a final remark, we advocate broadening the use of visual graphical software tools as they enable biodata researchers to focus on experiment design, visualisation and data analysis rather than on refining their scripting programming skills. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13040-017-0131-y) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5353680 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-53536802017-03-22 Discovering feature relevancy and dependency by kernel-guided probabilistic model-building evolution Rodriguez, Nestor Rojas–Galeano, Sergio BioData Min Methodology BACKGROUND: Discovering relevant features (biomarkers) that discriminate etiologies of a disease is useful to provide biomedical researchers with candidate targets for further laboratory experimentation while saving costs; dependencies among biomarkers may suggest additional valuable information, for example, to characterize complex epistatic relationships from genetic data. The use of classifiers to guide the search for biomarkers (the so–called wrapper approach) has been widely studied. However, simultaneously searching for relevancy and dependencies among markers is a less explored ground. RESULTS: We propose a new wrapper method that builds upon the discrimination power of a weighted kernel classifier to guide the search for a probabilistic model of simultaneous marginal and interacting effects. The feasibility of the method was evaluated in three empirical studies. The first one assessed its ability to discover complex epistatic effects on a large–scale testbed of generated human genetic problems; the method succeeded in 4 out of 5 of these problems while providing more accurate and expressive results than a baseline technique that also considers dependencies. The second study evaluated the performance of the method in benchmark classification tasks; in average the prediction accuracy was comparable to two other baseline techniques whilst finding smaller subsets of relevant features. The last study was aimed at discovering relevancy/dependency in a hepatitis dataset; in this regard, evidence recently reported in medical literature corroborated our findings. As a byproduct, the method was implemented and made freely available as a toolbox of software components deployed within an existing visual data–mining workbench. CONCLUSIONS: The mining advantages exhibited by the method come at the expense of a higher computational complexity, posing interesting algorithmic challenges regarding its applicability to large–scale datasets. Extending the probabilistic assumptions of the method to continuous distributions and higher–degree interactions is also appealing. As a final remark, we advocate broadening the use of visual graphical software tools as they enable biodata researchers to focus on experiment design, visualisation and data analysis rather than on refining their scripting programming skills. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13040-017-0131-y) contains supplementary material, which is available to authorized users. BioMed Central 2017-03-15 /pmc/articles/PMC5353680/ /pubmed/28331548 http://dx.doi.org/10.1186/s13040-017-0131-y Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Rodriguez, Nestor Rojas–Galeano, Sergio Discovering feature relevancy and dependency by kernel-guided probabilistic model-building evolution |
title | Discovering feature relevancy and dependency by kernel-guided probabilistic model-building evolution |
title_full | Discovering feature relevancy and dependency by kernel-guided probabilistic model-building evolution |
title_fullStr | Discovering feature relevancy and dependency by kernel-guided probabilistic model-building evolution |
title_full_unstemmed | Discovering feature relevancy and dependency by kernel-guided probabilistic model-building evolution |
title_short | Discovering feature relevancy and dependency by kernel-guided probabilistic model-building evolution |
title_sort | discovering feature relevancy and dependency by kernel-guided probabilistic model-building evolution |
topic | Methodology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5353680/ https://www.ncbi.nlm.nih.gov/pubmed/28331548 http://dx.doi.org/10.1186/s13040-017-0131-y |
work_keys_str_mv | AT rodrigueznestor discoveringfeaturerelevancyanddependencybykernelguidedprobabilisticmodelbuildingevolution AT rojasgaleanosergio discoveringfeaturerelevancyanddependencybykernelguidedprobabilisticmodelbuildingevolution |