
Multi-class computational evolution: development, benchmark evaluation and application to RNA-Seq biomarker discovery

BACKGROUND: A computational evolution system (CES) is a knowledge discovery engine that can identify subtle, synergistic relationships in large datasets. Pareto optimization allows CESs to balance accuracy with model complexity when evolving classifiers. Using Pareto optimization, a CES is able to i...


Bibliographic Details
Main Authors:	Crabtree, Nathaniel M.; Moore, Jason H.; Bowyer, John F.; George, Nysia I.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2017
Subjects:	Methodology
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5404302/ https://www.ncbi.nlm.nih.gov/pubmed/28450890 http://dx.doi.org/10.1186/s13040-017-0134-8

author	Crabtree, Nathaniel M.; Moore, Jason H.; Bowyer, John F.; George, Nysia I.
collection	PubMed
description	BACKGROUND: A computational evolution system (CES) is a knowledge discovery engine that can identify subtle, synergistic relationships in large datasets. Pareto optimization allows CESs to balance accuracy with model complexity when evolving classifiers. Using Pareto optimization, a CES is able to identify a very small number of features while maintaining high classification accuracy. A CES can be designed for various types of data, and the user can exploit expert knowledge about the classification problem in order to improve discrimination between classes. These characteristics give CES an advantage over other classification and feature selection algorithms, particularly when the goal is to identify a small number of highly relevant, non-redundant biomarkers. Previously, CESs have been developed only for binary class datasets. In this study, we developed a multi-class CES. RESULTS: The multi-class CES was compared to three common feature selection and classification algorithms: support vector machine (SVM), random k-nearest neighbor (RKNN), and random forest (RF). The algorithms were evaluated on three distinct multi-class RNA sequencing datasets. The comparison criteria were run-time, classification accuracy, number of selected features, and stability of the selected feature set (as measured by the Tanimoto distance). The performance of each algorithm was data-dependent. CES performed best on the dataset with the smallest sample size, indicating that CES has a unique advantage, since the accuracy of most classification methods suffers when sample size is small. CONCLUSION: The multi-class extension of CES increases the appeal of its application to complex, multi-class datasets in order to identify important biomarkers and features. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13040-017-0134-8) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-5404302
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5404302 2017-04-27 Multi-class computational evolution: development, benchmark evaluation and application to RNA-Seq biomarker discovery Crabtree, Nathaniel M. Moore, Jason H. Bowyer, John F. George, Nysia I. BioData Min Methodology BioMed Central 2017-04-24 /pmc/articles/PMC5404302/ /pubmed/28450890 http://dx.doi.org/10.1186/s13040-017-0134-8 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Multi-class computational evolution: development, benchmark evaluation and application to RNA-Seq biomarker discovery
topic	Methodology
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5404302/ https://www.ncbi.nlm.nih.gov/pubmed/28450890 http://dx.doi.org/10.1186/s13040-017-0134-8
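
The abstract in this record names two ideas that a short example may make more concrete: Pareto optimization (trading classification error against the number of selected features) and feature-set stability measured by the Tanimoto distance. The sketch below is illustrative only and is not the authors' implementation. It assumes the common set-based Tanimoto distance, 1 - |A ∩ B| / |A ∪ B|, which may differ from the exact formulation used in the article, and the function names, gene names, and candidate values are all hypothetical.

```python
from itertools import combinations


def pareto_front(candidates):
    """Return the non-dominated (error, n_features) pairs; both are minimized.

    A candidate is dominated if another candidate is no worse on both
    objectives and strictly better on at least one.
    """
    front = []
    for err, size in candidates:
        dominated = any(
            o_err <= err and o_size <= size and (o_err < err or o_size < size)
            for o_err, o_size in candidates
        )
        if not dominated:
            front.append((err, size))
    return front


def tanimoto_distance(a, b):
    """Set-based Tanimoto distance (assumed form): 1 - |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def mean_pairwise_distance(feature_sets):
    """Average Tanimoto distance over all pairs of selected feature sets.

    Lower values mean the selected features are more stable across runs.
    """
    pairs = list(combinations(feature_sets, 2))
    if not pairs:
        return 0.0
    return sum(tanimoto_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical classifiers described as (error rate, number of features).
models = [(0.10, 12), (0.12, 4), (0.12, 9), (0.30, 2), (0.35, 3)]
print(pareto_front(models))

# Hypothetical feature sets selected in three repeated runs.
runs = [
    {"geneA", "geneB", "geneC"},
    {"geneA", "geneB", "geneD"},
    {"geneA", "geneB", "geneC"},
]
print(mean_pairwise_distance(runs))
```

In this toy example the Pareto front keeps only the models that cannot be improved on one objective without giving ground on the other, which is the sense in which accuracy is balanced against model complexity.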
