Multi-class computational evolution: development, benchmark evaluation and application to RNA-Seq biomarker discovery
BACKGROUND: A computational evolution system (CES) is a knowledge discovery engine that can identify subtle, synergistic relationships in large datasets. Pareto optimization allows CESs to balance accuracy with model complexity when evolving classifiers. Using Pareto optimization, a CES is able to i...
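The summary above says Pareto optimization lets a CES balance accuracy against model complexity. The record does not give the CES objective functions, so the following is only a minimal sketch of the general idea, assuming two objectives (maximize accuracy, minimize the number of selected features). The `Candidate` type, the candidate names, and all numbers are hypothetical illustrations, not values from the article.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A hypothetical evolved classifier summarized by two objectives."""
    name: str
    accuracy: float   # higher is better
    n_features: int   # lower is better (proxy for model complexity)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.n_features <= b.n_features
    strictly_better = a.accuracy > b.accuracy or a.n_features < b.n_features
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Made-up candidates for illustration only.
candidates = [
    Candidate("A", 0.95, 40),
    Candidate("B", 0.93, 5),
    Candidate("C", 0.90, 5),   # dominated by B (same size, lower accuracy)
    Candidate("D", 0.95, 60),  # dominated by A (same accuracy, more features)
    Candidate("E", 0.80, 2),
]

front = pareto_front(candidates)
print([c.name for c in front])
```

Under these assumptions, the non-dominated set contains the trade-off candidates only; a small, slightly less accurate classifier such as "B" survives alongside the most accurate one, which is the behavior the summary attributes to Pareto optimization.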
Main authors: | Crabtree, Nathaniel M.; Moore, Jason H.; Bowyer, John F.; George, Nysia I. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2017 |
Subjects: | Methodology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5404302/ https://www.ncbi.nlm.nih.gov/pubmed/28450890 http://dx.doi.org/10.1186/s13040-017-0134-8 |
_version_ | 1783231571967344640 |
---|---|
author | Crabtree, Nathaniel M.; Moore, Jason H.; Bowyer, John F.; George, Nysia I. |
author_facet | Crabtree, Nathaniel M.; Moore, Jason H.; Bowyer, John F.; George, Nysia I. |
author_sort | Crabtree, Nathaniel M. |
collection | PubMed |
description | BACKGROUND: A computational evolution system (CES) is a knowledge discovery engine that can identify subtle, synergistic relationships in large datasets. Pareto optimization allows CESs to balance accuracy with model complexity when evolving classifiers. Using Pareto optimization, a CES is able to identify a very small number of features while maintaining high classification accuracy. A CES can be designed for various types of data, and the user can exploit expert knowledge about the classification problem in order to improve discrimination between classes. These characteristics give CES an advantage over other classification and feature selection algorithms, particularly when the goal is to identify a small number of highly relevant, non-redundant biomarkers. Previously, CESs have been developed only for binary class datasets. In this study, we developed a multi-class CES. RESULTS: The multi-class CES was compared to three common feature selection and classification algorithms: support vector machine (SVM), random k-nearest neighbor (RKNN), and random forest (RF). The algorithms were evaluated on three distinct multi-class RNA sequencing datasets. The comparison criteria were run-time, classification accuracy, number of selected features, and stability of the selected feature set (as measured by the Tanimoto distance). The performance of each algorithm was data-dependent. CES performed best on the dataset with the smallest sample size, indicating that CES has a unique advantage, since the accuracy of most classification methods suffers when sample size is small. CONCLUSION: The multi-class extension of CES increases the appeal of its application to complex, multi-class datasets in order to identify important biomarkers and features. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13040-017-0134-8) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5404302 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5404302 2017-04-27 Multi-class computational evolution: development, benchmark evaluation and application to RNA-Seq biomarker discovery Crabtree, Nathaniel M. Moore, Jason H. Bowyer, John F. George, Nysia I. BioData Min Methodology BACKGROUND: A computational evolution system (CES) is a knowledge discovery engine that can identify subtle, synergistic relationships in large datasets. Pareto optimization allows CESs to balance accuracy with model complexity when evolving classifiers. Using Pareto optimization, a CES is able to identify a very small number of features while maintaining high classification accuracy. A CES can be designed for various types of data, and the user can exploit expert knowledge about the classification problem in order to improve discrimination between classes. These characteristics give CES an advantage over other classification and feature selection algorithms, particularly when the goal is to identify a small number of highly relevant, non-redundant biomarkers. Previously, CESs have been developed only for binary class datasets. In this study, we developed a multi-class CES. RESULTS: The multi-class CES was compared to three common feature selection and classification algorithms: support vector machine (SVM), random k-nearest neighbor (RKNN), and random forest (RF). The algorithms were evaluated on three distinct multi-class RNA sequencing datasets. The comparison criteria were run-time, classification accuracy, number of selected features, and stability of the selected feature set (as measured by the Tanimoto distance). The performance of each algorithm was data-dependent. CES performed best on the dataset with the smallest sample size, indicating that CES has a unique advantage, since the accuracy of most classification methods suffers when sample size is small.
CONCLUSION: The multi-class extension of CES increases the appeal of its application to complex, multi-class datasets in order to identify important biomarkers and features. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13040-017-0134-8) contains supplementary material, which is available to authorized users. BioMed Central 2017-04-24 /pmc/articles/PMC5404302/ /pubmed/28450890 http://dx.doi.org/10.1186/s13040-017-0134-8 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Crabtree, Nathaniel M. Moore, Jason H. Bowyer, John F. George, Nysia I. Multi-class computational evolution: development, benchmark evaluation and application to RNA-Seq biomarker discovery |
title | Multi-class computational evolution: development, benchmark evaluation and application to RNA-Seq biomarker discovery |
title_full | Multi-class computational evolution: development, benchmark evaluation and application to RNA-Seq biomarker discovery |
title_fullStr | Multi-class computational evolution: development, benchmark evaluation and application to RNA-Seq biomarker discovery |
title_full_unstemmed | Multi-class computational evolution: development, benchmark evaluation and application to RNA-Seq biomarker discovery |
title_short | Multi-class computational evolution: development, benchmark evaluation and application to RNA-Seq biomarker discovery |
title_sort | multi-class computational evolution: development, benchmark evaluation and application to rna-seq biomarker discovery |
topic | Methodology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5404302/ https://www.ncbi.nlm.nih.gov/pubmed/28450890 http://dx.doi.org/10.1186/s13040-017-0134-8 |
work_keys_str_mv | AT crabtreenathanielm multiclasscomputationalevolutiondevelopmentbenchmarkevaluationandapplicationtornaseqbiomarkerdiscovery AT moorejasonh multiclasscomputationalevolutiondevelopmentbenchmarkevaluationandapplicationtornaseqbiomarkerdiscovery AT bowyerjohnf multiclasscomputationalevolutiondevelopmentbenchmarkevaluationandapplicationtornaseqbiomarkerdiscovery AT georgenysiai multiclasscomputationalevolutiondevelopmentbenchmarkevaluationandapplicationtornaseqbiomarkerdiscovery |
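
The description in this record names the Tanimoto distance as the measure of feature-set stability. The record does not say how the article aggregates it, so the sketch below assumes the common set-based definition, 1 - |A ∩ B| / |A ∪ B|, averaged over all pairs of runs. The function names and the gene identifiers (`g1` to `g4`) are hypothetical placeholders, not data from the study.

```python
from itertools import combinations
from typing import Iterable, List, Set


def tanimoto_distance(a: Iterable[str], b: Iterable[str]) -> float:
    """Set-based Tanimoto distance: 0.0 for identical sets, 1.0 for disjoint sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Two empty feature sets are treated as identical (an assumption).
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


def mean_pairwise_distance(feature_sets: List[Set[str]]) -> float:
    """Average the distance over every pair of runs; lower means more stable."""
    pairs = list(combinations(feature_sets, 2))
    if not pairs:
        raise ValueError("need at least two feature sets to compare")
    return sum(tanimoto_distance(x, y) for x, y in pairs) / len(pairs)


# Hypothetical feature sets selected in three repeated runs of one algorithm.
runs = [
    {"g1", "g2", "g3"},
    {"g1", "g2", "g4"},
    {"g1", "g2", "g3"},
]

instability = mean_pairwise_distance(runs)
print(round(instability, 3))
```

An algorithm that selects the same features on every run would score 0.0 under this definition, so lower values indicate a more stable selected feature set.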