CamurWeb: a classification software and a large knowledge base for gene expression data of cancer

BACKGROUND: The high growth of Next Generation Sequencing data currently demands new knowledge extraction methods. In particular, the RNA sequencing gene expression experimental technique stands out for case-control studies on cancer, which can be addressed with supervised machine learning technique...

Bibliographic Details
Main authors: Weitschek, Emanuel; Lauro, Silvia Di; Cappelli, Eleonora; Bertolazzi, Paola; Felici, Giovanni
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6191971/
https://www.ncbi.nlm.nih.gov/pubmed/30367574
http://dx.doi.org/10.1186/s12859-018-2299-7
author Weitschek, Emanuel
Lauro, Silvia Di
Cappelli, Eleonora
Bertolazzi, Paola
Felici, Giovanni
collection PubMed
description BACKGROUND: The rapid growth of Next Generation Sequencing data currently demands new knowledge extraction methods. In particular, the RNA sequencing gene expression experimental technique stands out for case-control studies on cancer, which can be addressed with supervised machine learning techniques able to extract human-interpretable models composed of genes and their relation to the investigated disease. State-of-the-art rule-based classifiers are designed to extract a single classification model, possibly composed of a few relevant genes. Conversely, we aim to create a large knowledge base composed of many rule-based models, and thus determine which genes could potentially be involved in the analyzed tumor. This comprehensive and open-access knowledge base is required to disseminate novel insights about cancer. RESULTS: We propose CamurWeb, a new method and web-based software that is able to extract multiple and equivalent classification models in the form of logic formulas (“if then” rules) and to create a knowledge base of these rules that can be queried and analyzed. The method is based on an iterative classification procedure and an adaptive feature elimination technique that enables the computation of many rule-based models related to the cancer under study. Additionally, CamurWeb includes a user-friendly interface for running the software, querying the results, and managing the performed experiments. The user can create her profile, upload her gene expression data, run the classification analyses, and interpret the results with predefined queries. In order to validate the software, we apply it to all publicly available RNA sequencing datasets from The Cancer Genome Atlas database, obtaining a large open-access knowledge base about cancer. CamurWeb is available at http://bioinformatics.iasi.cnr.it/camurweb.
CONCLUSIONS: The experiments prove the validity of CamurWeb, obtaining many classification models and thus several genes that are associated with 21 different cancer types. Finally, the comprehensive knowledge base about cancer and the software tool are released online; interested researchers have free access to them for further studies and to design biological experiments in cancer research.
format Online
Article
Text
id pubmed-6191971
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6191971 2018-10-23 CamurWeb: a classification software and a large knowledge base for gene expression data of cancer. Weitschek, Emanuel; Lauro, Silvia Di; Cappelli, Eleonora; Bertolazzi, Paola; Felici, Giovanni. BMC Bioinformatics, Research. BioMed Central 2018-10-15 /pmc/articles/PMC6191971/ /pubmed/30367574 http://dx.doi.org/10.1186/s12859-018-2299-7 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title CamurWeb: a classification software and a large knowledge base for gene expression data of cancer
topic Research