Multiclass relevance units machine: benchmark evaluation and application to small ncRNA discovery

BACKGROUND: Classification is the problem of assigning each input object to one of a finite number of classes. This problem has been extensively studied in machine learning and statistics, and there are numerous applications to bioinformatics as well as many other fields. Building a multiclass class...

Bibliographic Details
Main authors: Menor, Mark, Baek, Kyungim, Poisson, Guylaine
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3582431/
https://www.ncbi.nlm.nih.gov/pubmed/23445533
http://dx.doi.org/10.1186/1471-2164-14-S2-S6
_version_ 1782260560658694144
author Menor, Mark
Baek, Kyungim
Poisson, Guylaine
author_facet Menor, Mark
Baek, Kyungim
Poisson, Guylaine
author_sort Menor, Mark
collection PubMed
description BACKGROUND: Classification is the problem of assigning each input object to one of a finite number of classes. This problem has been extensively studied in machine learning and statistics, and it has numerous applications in bioinformatics as well as many other fields. Building a multiclass classifier has been a challenge, because the direct approach of altering a binary classification algorithm to accommodate more than two classes can be computationally too expensive. Hence the indirect approach of binary decomposition has been commonly used, in which retrieving the class posterior probabilities from the set of binary posterior probabilities given by the individual binary classifiers has been a major issue. METHODS: In this work, we present an extension of a recently introduced probabilistic kernel-based learning algorithm, the Classification Relevance Units Machine (CRUM), to the multiclass setting to increase its applicability. The extension is achieved under the error correcting output codes framework. The probabilistic outputs of the binary CRUMs are preserved using a proposed linear-time decoding algorithm, an alternative to the generalized Bradley-Terry (GBT) algorithm, whose computational complexity prohibits its application to large-scale prediction settings. The resulting classifier is called the Multiclass Relevance Units Machine (McRUM). RESULTS: The evaluation of McRUM on a variety of real small-scale benchmark datasets shows that our proposed Naïve decoding algorithm is computationally more efficient than the GBT algorithm while maintaining a similar level of predictive accuracy. A set of experiments on a larger-scale dataset for small ncRNA classification was then conducted with Naïve McRUM and compared with the Gaussian and linear SVMs. Although McRUM's predictive performance is slightly lower than that of the Gaussian SVM, the results show that a similar true positive rate can be achieved by slightly sacrificing the false positive rate. Furthermore, McRUM is computationally more efficient than the SVM, which is an important factor for large-scale analysis. CONCLUSIONS: We have proposed McRUM, a multiclass extension of the binary CRUM. McRUM with the Naïve decoding algorithm is computationally efficient at run time, and its predictive performance is comparable to that of the well-known SVM, showing its potential for solving large-scale multiclass problems in bioinformatics and other fields of study.
format Online
Article
Text
id pubmed-3582431
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3582431 2013-03-05 Multiclass relevance units machine: benchmark evaluation and application to small ncRNA discovery Menor, Mark Baek, Kyungim Poisson, Guylaine BMC Genomics Research BioMed Central 2013-02-15 /pmc/articles/PMC3582431/ /pubmed/23445533 http://dx.doi.org/10.1186/1471-2164-14-S2-S6 Text en Copyright ©2013 Menor et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
Menor, Mark
Baek, Kyungim
Poisson, Guylaine
Multiclass relevance units machine: benchmark evaluation and application to small ncRNA discovery
title Multiclass relevance units machine: benchmark evaluation and application to small ncRNA discovery
title_sort multiclass relevance units machine: benchmark evaluation and application to small ncrna discovery
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3582431/
https://www.ncbi.nlm.nih.gov/pubmed/23445533
http://dx.doi.org/10.1186/1471-2164-14-S2-S6
work_keys_str_mv AT menormark multiclassrelevanceunitsmachinebenchmarkevaluationandapplicationtosmallncrnadiscovery
AT baekkyungim multiclassrelevanceunitsmachinebenchmarkevaluationandapplicationtosmallncrnadiscovery
AT poissonguylaine multiclassrelevanceunitsmachinebenchmarkevaluationandapplicationtosmallncrnadiscovery
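
The abstract describes recovering class posterior probabilities from the outputs of binary classifiers under the error correcting output codes framework. The following is a minimal sketch of that general idea only: it uses a generic product-rule decoder that assumes the binary outputs are independent, and it is not necessarily the Naïve decoding algorithm proposed in the article. The function name `decode_ecoc`, the code matrix `ONE_VS_REST`, and the example probabilities are hypothetical and chosen for illustration.

```python
def decode_ecoc(code_matrix, binary_probs):
    """Turn binary posteriors into class posteriors (illustrative only).

    code_matrix[k][j] is +1, -1, or 0: the label that class k receives in
    binary problem j (0 means class k is not used by that problem).
    binary_probs[j] is the j-th binary classifier's estimate of P(+1 | x).
    ASSUMPTION: binary outputs are treated as independent (product rule).
    """
    scores = []
    for row in code_matrix:
        score = 1.0
        for code, p in zip(row, binary_probs):
            if code == 1:
                score *= p          # class k sits on the positive side
            elif code == -1:
                score *= 1.0 - p    # class k sits on the negative side
            # code == 0: this binary problem says nothing about class k
        scores.append(score)
    total = sum(scores)
    if total == 0.0:
        # Every class was ruled out by some classifier; fall back to uniform.
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


# Hypothetical one-vs-rest decomposition for three classes.
ONE_VS_REST = [
    [+1, -1, -1],
    [-1, +1, -1],
    [-1, -1, +1],
]

posteriors = decode_ecoc(ONE_VS_REST, [0.8, 0.3, 0.1])
predicted_class = max(range(len(posteriors)), key=posteriors.__getitem__)
```

The cost of this sketch grows with the number of classes times the number of binary classifiers per input; no claim is made that it matches the complexity or accuracy reported in the article.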