
Multi-class computational evolution: development, benchmark evaluation and application to RNA-Seq biomarker discovery


Bibliographic details
Main authors: Crabtree, Nathaniel M., Moore, Jason H., Bowyer, John F., George, Nysia I.
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5404302/
https://www.ncbi.nlm.nih.gov/pubmed/28450890
http://dx.doi.org/10.1186/s13040-017-0134-8
Collection: PubMed
Description:
BACKGROUND: A computational evolution system (CES) is a knowledge discovery engine that can identify subtle, synergistic relationships in large datasets. Pareto optimization allows CESs to balance accuracy with model complexity when evolving classifiers. Using Pareto optimization, a CES is able to identify a very small number of features while maintaining high classification accuracy. A CES can be designed for various types of data, and the user can exploit expert knowledge about the classification problem to improve discrimination between classes. These characteristics give CES an advantage over other classification and feature selection algorithms, particularly when the goal is to identify a small number of highly relevant, non-redundant biomarkers. Previously, CESs had been developed only for binary-class datasets. In this study, we developed a multi-class CES.
RESULTS: The multi-class CES was compared with three common feature selection and classification algorithms: support vector machine (SVM), random k-nearest neighbor (RKNN), and random forest (RF). The algorithms were evaluated on three distinct multi-class RNA sequencing datasets. The comparison criteria were run-time, classification accuracy, number of selected features, and stability of the selected feature set (as measured by the Tanimoto distance). The performance of each algorithm was data-dependent. CES performed best on the dataset with the smallest sample size, indicating that CES has a unique advantage, since the accuracy of most classification methods suffers when the sample size is small.
CONCLUSION: The multi-class extension of CES increases its appeal for identifying important biomarkers and features in complex, multi-class datasets.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13040-017-0134-8) contains supplementary material, which is available to authorized users.
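Two of the technical ideas in the abstract, Pareto optimization of accuracy against model complexity and feature-set stability measured by the Tanimoto distance, can be illustrated with a minimal Python sketch. This is not the authors' CES implementation: it assumes each candidate classifier is summarized by a hypothetical `(accuracy, number_of_features)` tuple, keeps the non-dominated candidates, and scores stability as the mean pairwise Tanimoto distance between feature sets selected in repeated runs (lower means more stable). The function names, candidate values, and gene names are illustrative placeholders, and the paper's exact stability formula may differ.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Tuple

# Hypothetical summary of a candidate classifier: (accuracy, number of features used).
Candidate = Tuple[float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    acc_a, size_a = a
    acc_b, size_b = b
    no_worse = acc_a >= acc_b and size_a <= size_b
    strictly_better = acc_a > acc_b or size_a < size_b
    return no_worse and strictly_better


def pareto_front(candidates: Sequence[Candidate]) -> List[Candidate]:
    """Keep candidates that no other candidate dominates (maximize accuracy, minimize size)."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


def tanimoto_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """1 - |A & B| / (|A| + |B| - |A & B|); 0.0 means identical feature sets."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return 1.0 - shared / (len(a) + len(b) - shared)


def mean_pairwise_tanimoto(feature_sets: Sequence[FrozenSet[str]]) -> float:
    """Average Tanimoto distance over all pairs of runs; lower values mean a more stable selection."""
    pairs = list(combinations(feature_sets, 2))
    if not pairs:
        return 0.0
    return sum(tanimoto_distance(a, b) for a, b in pairs) / len(pairs)


# Illustrative values only, not results from the study.
models = [(0.95, 12), (0.93, 3), (0.90, 3), (0.80, 1), (0.95, 20)]
print(pareto_front(models))  # [(0.95, 12), (0.93, 3), (0.8, 1)]

runs = [
    frozenset({"GeneA", "GeneB", "GeneC"}),
    frozenset({"GeneA", "GeneB"}),
    frozenset({"GeneA", "GeneD"}),
]
print(round(mean_pairwise_tanimoto(runs), 3))  # 0.583
```

In this toy example, `(0.90, 3)` is dropped because `(0.93, 3)` is more accurate at the same size, which mirrors how Pareto selection favors small, accurate models rather than accuracy alone.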
ID: pubmed-5404302
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BioData Min (Methodology), BioMed Central, published 2017-04-24
License: © The Author(s). 2017. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Topic: Methodology