CamurWeb: a classification software and a large knowledge base for gene expression data of cancer
BACKGROUND: The high growth of Next Generation Sequencing data currently demands new knowledge extraction methods. In particular, the RNA sequencing gene expression experimental technique stands out for case-control studies on cancer, which can be addressed with supervised machine learning technique...
Main Authors: | Weitschek, Emanuel; Lauro, Silvia Di; Cappelli, Eleonora; Bertolazzi, Paola; Felici, Giovanni |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6191971/ https://www.ncbi.nlm.nih.gov/pubmed/30367574 http://dx.doi.org/10.1186/s12859-018-2299-7 |
_version_ | 1783363819579375616 |
---|---|
author | Weitschek, Emanuel; Lauro, Silvia Di; Cappelli, Eleonora; Bertolazzi, Paola; Felici, Giovanni |
author_facet | Weitschek, Emanuel; Lauro, Silvia Di; Cappelli, Eleonora; Bertolazzi, Paola; Felici, Giovanni |
author_sort | Weitschek, Emanuel |
collection | PubMed |
description | BACKGROUND: The high growth of Next Generation Sequencing data currently demands new knowledge extraction methods. In particular, the RNA sequencing gene expression experimental technique stands out for case-control studies on cancer, which can be addressed with supervised machine learning techniques able to extract human-interpretable models composed of genes and their relation to the investigated disease. State-of-the-art rule-based classifiers are designed to extract a single classification model, possibly composed of a few relevant genes. Conversely, we aim to create a large knowledge base composed of many rule-based models, and thus determine which genes could potentially be involved in the analyzed tumor. This comprehensive and open access knowledge base is required to disseminate novel insights about cancer. RESULTS: We propose CamurWeb, a new method and web-based software that is able to extract multiple and equivalent classification models in the form of logic formulas (“if then” rules) and to create a knowledge base of these rules that can be queried and analyzed. The method is based on an iterative classification procedure and an adaptive feature elimination technique that enables the computation of many rule-based models related to the cancer under study. Additionally, CamurWeb includes a user-friendly interface for running the software, querying the results, and managing the performed experiments. The user can create her profile, upload her gene expression data, run the classification analyses, and interpret the results with predefined queries. In order to validate the software, we apply it to all publicly available RNA sequencing datasets from The Cancer Genome Atlas database, obtaining a large open access knowledge base about cancer. CamurWeb is available at http://bioinformatics.iasi.cnr.it/camurweb. CONCLUSIONS: The experiments prove the validity of CamurWeb, obtaining many classification models and thus several genes that are associated with 21 different cancer types. Finally, the comprehensive knowledge base about cancer and the software tool are released online; interested researchers have free access to them for further studies and to design biological experiments in cancer research. |
format | Online Article Text |
id | pubmed-6191971 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6191971 2018-10-23 CamurWeb: a classification software and a large knowledge base for gene expression data of cancer Weitschek, Emanuel; Lauro, Silvia Di; Cappelli, Eleonora; Bertolazzi, Paola; Felici, Giovanni BMC Bioinformatics Research BACKGROUND: The high growth of Next Generation Sequencing data currently demands new knowledge extraction methods. In particular, the RNA sequencing gene expression experimental technique stands out for case-control studies on cancer, which can be addressed with supervised machine learning techniques able to extract human-interpretable models composed of genes and their relation to the investigated disease. State-of-the-art rule-based classifiers are designed to extract a single classification model, possibly composed of a few relevant genes. Conversely, we aim to create a large knowledge base composed of many rule-based models, and thus determine which genes could potentially be involved in the analyzed tumor. This comprehensive and open access knowledge base is required to disseminate novel insights about cancer. RESULTS: We propose CamurWeb, a new method and web-based software that is able to extract multiple and equivalent classification models in the form of logic formulas (“if then” rules) and to create a knowledge base of these rules that can be queried and analyzed. The method is based on an iterative classification procedure and an adaptive feature elimination technique that enables the computation of many rule-based models related to the cancer under study. Additionally, CamurWeb includes a user-friendly interface for running the software, querying the results, and managing the performed experiments. The user can create her profile, upload her gene expression data, run the classification analyses, and interpret the results with predefined queries. In order to validate the software, we apply it to all publicly available RNA sequencing datasets from The Cancer Genome Atlas database, obtaining a large open access knowledge base about cancer. CamurWeb is available at http://bioinformatics.iasi.cnr.it/camurweb. CONCLUSIONS: The experiments prove the validity of CamurWeb, obtaining many classification models and thus several genes that are associated with 21 different cancer types. Finally, the comprehensive knowledge base about cancer and the software tool are released online; interested researchers have free access to them for further studies and to design biological experiments in cancer research. BioMed Central 2018-10-15 /pmc/articles/PMC6191971/ /pubmed/30367574 http://dx.doi.org/10.1186/s12859-018-2299-7 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Weitschek, Emanuel; Lauro, Silvia Di; Cappelli, Eleonora; Bertolazzi, Paola; Felici, Giovanni CamurWeb: a classification software and a large knowledge base for gene expression data of cancer |
title | CamurWeb: a classification software and a large knowledge base for gene expression data of cancer |
title_full | CamurWeb: a classification software and a large knowledge base for gene expression data of cancer |
title_fullStr | CamurWeb: a classification software and a large knowledge base for gene expression data of cancer |
title_full_unstemmed | CamurWeb: a classification software and a large knowledge base for gene expression data of cancer |
title_short | CamurWeb: a classification software and a large knowledge base for gene expression data of cancer |
title_sort | camurweb: a classification software and a large knowledge base for gene expression data of cancer |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6191971/ https://www.ncbi.nlm.nih.gov/pubmed/30367574 http://dx.doi.org/10.1186/s12859-018-2299-7 |
work_keys_str_mv | AT weitschekemanuel camurwebaclassificationsoftwareandalargeknowledgebaseforgeneexpressiondataofcancer AT laurosilviadi camurwebaclassificationsoftwareandalargeknowledgebaseforgeneexpressiondataofcancer AT cappellieleonora camurwebaclassificationsoftwareandalargeknowledgebaseforgeneexpressiondataofcancer AT bertolazzipaola camurwebaclassificationsoftwareandalargeknowledgebaseforgeneexpressiondataofcancer AT felicigiovanni camurwebaclassificationsoftwareandalargeknowledgebaseforgeneexpressiondataofcancer |
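
The abstract describes the method as an iterative classification procedure combined with feature elimination, producing many "if then" rules instead of a single model. The following is a minimal sketch of that general idea only, not CamurWeb's implementation. As a stand-in for a real rule-based classifier it uses a single-gene threshold rule of the form "if gene > threshold then tumor"; the function names, the `min_accuracy` cutoff, and the toy gene names are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Sample = Dict[str, float]        # gene name -> expression value
Rule = Tuple[str, float, float]  # (gene, threshold, training accuracy)


def best_single_gene_rule(
    samples: List[Sample], labels: List[bool], genes: List[str]
) -> Optional[Rule]:
    """Return the most accurate rule 'if gene > threshold then tumor'.

    Stand-in for a rule-based classifier (assumption for illustration):
    only over-expression rules on one gene are considered.
    """
    best: Optional[Rule] = None
    for gene in genes:
        values = sorted({s[gene] for s in samples})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            correct = sum(
                (s[gene] > threshold) == is_tumor
                for s, is_tumor in zip(samples, labels)
            )
            accuracy = correct / len(samples)
            if best is None or accuracy > best[2]:
                best = (gene, threshold, accuracy)
    return best


def extract_rule_base(
    samples: List[Sample],
    labels: List[bool],
    min_accuracy: float = 0.9,
    max_iterations: int = 100,
) -> List[Rule]:
    """Iteratively extract rules, eliminating each rule's gene afterwards."""
    remaining = sorted(samples[0])  # sorted gene names, for determinism
    rules: List[Rule] = []
    for _ in range(max_iterations):
        rule = best_single_gene_rule(samples, labels, remaining)
        # Stop when no rule can be built or the rule is no longer reliable.
        if rule is None or rule[2] < min_accuracy:
            break
        rules.append(rule)
        remaining.remove(rule[0])  # feature elimination step
    return rules


if __name__ == "__main__":
    toy_samples = [
        {"GENE_A": 9.0, "GENE_B": 8.0, "GENE_C": 2.0},
        {"GENE_A": 8.5, "GENE_B": 7.5, "GENE_C": 5.0},
        {"GENE_A": 1.5, "GENE_B": 2.0, "GENE_C": 4.0},
        {"GENE_A": 0.5, "GENE_B": 1.5, "GENE_C": 3.0},
    ]
    toy_labels = [True, True, False, False]  # True = tumor, False = normal
    for gene, threshold, accuracy in extract_rule_base(toy_samples, toy_labels):
        print(f"if {gene} > {threshold} then tumor (accuracy {accuracy:.2f})")
```

On the toy data, two alternative rules are collected (one per discriminating gene) before accuracy on the remaining gene falls below the cutoff. This mirrors the stated goal of building a knowledge base of equivalent models rather than keeping only the first one.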