
Data- and expert-driven rule induction and filtering framework for functional interpretation and description of gene sets

BACKGROUND: High-throughput methods in molecular biology have provided researchers with an abundance of experimental data that need to be interpreted in order to understand the experimental results. Manual methods of functional gene/protein group interpretation are expensive and time-consuming; therefore, t...

Bibliographic Details
Main Authors: Gruca, Aleksandra, Sikora, Marek
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5483958/
https://www.ncbi.nlm.nih.gov/pubmed/28651634
http://dx.doi.org/10.1186/s13326-017-0129-x
_version_ 1783245805175439360
author Gruca, Aleksandra
Sikora, Marek
collection PubMed
description BACKGROUND: High-throughput methods in molecular biology have provided researchers with an abundance of experimental data that need to be interpreted in order to understand the experimental results. Manual methods of functional gene/protein group interpretation are expensive and time-consuming; therefore, there is a need to develop new, efficient data mining methods and bioinformatics tools that can support the expert in the functional analysis of experimental results. RESULTS: In this study, we propose a comprehensive framework for the induction of logical rules in the form of combinations of Gene Ontology (GO) terms for functional interpretation of gene sets. Within the framework, we present four approaches: a fully automated method of rule induction without filtering, a rule induction method with filtering, an expert-driven rule filtering method based on additive utility functions, and an expert-driven rule induction method based on so-called seed or expert terms, that is, the GO terms of special interest that should be included in the description. These GO terms usually describe processes or pathways of particular interest that are related to the experiment being performed. During the rule induction and filtering processes, such seed terms are used as a base on which the description is built. CONCLUSION: We compare the descriptions obtained with different algorithms of rule induction and filtering and show that a filtering step is required to reduce the number of rules in the output set so that they can be analyzed by a human expert. However, filtering may remove information from the output rule set that is potentially interesting for the expert. Therefore, in the study, we present two methods that involve interaction with the expert during the process of rule induction.
Both of them are able to reduce the number of rules, but only with the method based on seed terms does each of the created rules include expert terms in combination with other terms. Further analysis of such combinations may provide new knowledge about biological processes and their combination with other pathways related to the genes described by the rules. A suite of Matlab scripts that provides the functionality of the comprehensive framework for rule induction and filtering presented in this study is available free of charge at: http://rulego.polsl.pl/framework. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13326-017-0129-x) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5483958
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5483958 2017-06-26 Data- and expert-driven rule induction and filtering framework for functional interpretation and description of gene sets Gruca, Aleksandra Sikora, Marek J Biomed Semantics Research BioMed Central 2017-06-26 /pmc/articles/PMC5483958/ /pubmed/28651634 http://dx.doi.org/10.1186/s13326-017-0129-x Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Data- and expert-driven rule induction and filtering framework for functional interpretation and description of gene sets
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5483958/
https://www.ncbi.nlm.nih.gov/pubmed/28651634
http://dx.doi.org/10.1186/s13326-017-0129-x