MSclassifier: median-supplement model-based classification tool for automated knowledge discovery
Main authors: | Adabor, Emmanuel S.; Acquaah-Mensah, George K.; Mazandu, Gaston K. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | F1000 Research Limited, 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7788522/ https://www.ncbi.nlm.nih.gov/pubmed/33456763 http://dx.doi.org/10.12688/f1000research.25501.1 |
_version_ | 1783633046533046272 |
---|---|
author | Adabor, Emmanuel S.; Acquaah-Mensah, George K.; Mazandu, Gaston K. |
collection | PubMed |
description | High-throughput technologies have resulted in an exponential growth of publicly available and accessible datasets for biomedical research. Efficient computational models, algorithms and tools are required to exploit these datasets for knowledge discovery to aid medical decisions. Here, we introduce a new tool, MSclassifier, based on median-supplement approaches to machine learning, which enables automated and effective binary classification for optimal decision making. The MSclassifier package estimates the medians of features (attributes) to deduce supplementary data, which is subsequently introduced into the training set to balance it and to build superior classification models. To test the approach, we apply it to determining HER2 receptor expression status phenotypes in breast cancer and to predicting protein subcellular localization (plasma membrane and nucleus). Using independent-sample and cross-validation tests, we evaluate the performance of MSclassifier and compare it with well-established tools that can perform such tasks. In the HER2 receptor expression status phenotype identification tasks, MSclassifier achieved statistically significantly higher classification rates than the best-performing existing tool (90.30% versus 89.83%, p=8.62e-3). In the subcellular localization prediction tasks, MSclassifier and one other existing tool achieved comparably high performance (93.42% versus 93.19%, p=0.06), and both outperformed tools based on Naive Bayes classifiers. Overall, the application and evaluation of MSclassifier reveal its potential to be applied to a variety of binary classification problems. The MSclassifier package provides a portable, user-friendly R application for a broad audience, enabling experienced end-users as well as non-programmers to perform effective classification in biomedical and other fields of study. |
format | Online Article Text |
id | pubmed-7788522 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | F1000 Research Limited |
record_format | MEDLINE/PubMed |
spelling | pubmed-77885222021-01-14 MSclassifier: median-supplement model-based classification tool for automated knowledge discovery Adabor, Emmanuel S. Acquaah-Mensah, George K. Mazandu, Gaston K. F1000Res Software Tool Article High-throughput technologies have resulted in an exponential growth of publicly available and accessible datasets for biomedical research. Efficient computational models, algorithms and tools are required to exploit the datasets for knowledge discovery to aid medical decisions. Here, we introduce a new tool, MSclassifier, based on median-supplement approaches to machine learning to enable an automated and effective binary classification for optimal decision making. The MSclassifier package estimates medians of features (attributes) to deduce supplementary data, which is subsequently introduced into the training set for balancing and building superior models for classification. To test our approach, it is used to determine HER2 receptor expression status phenotypes in breast cancer and also predict protein subcellular localization (plasma membrane and nucleus). Using independent sample and cross-validation tests, the performance of MSclassifier is evaluated and compared with well established tools that could perform such tasks. In the HER2 receptor expression status phenotype identification tasks, MSclassifier achieved statistically significant higher classification rates than the best performing existing tool (90.30% versus 89.83%, p=8.62e-3). In the subcellular localization prediction tasks, MSclassifier and one other existing tool achieved equally high performances (93.42% versus 93.19%, p=0.06) although they both outperformed tools based on Naive Bayes classifiers. Overall, the application and evaluation of MSclassifier reveal its potential to be applied to varieties of binary classification problems. The MSclassifier package provides an R-portable and user-friendly application to a broad audience, enabling experienced end-users as well as non-programmers to perform an effective classification in biomedical and other fields of study. F1000 Research Limited 2020-09-10 /pmc/articles/PMC7788522/ /pubmed/33456763 http://dx.doi.org/10.12688/f1000research.25501.1 Text en Copyright: © 2020 Adabor ES et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | MSclassifier: median-supplement model-based classification tool for automated knowledge discovery |
topic | Software Tool Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7788522/ https://www.ncbi.nlm.nih.gov/pubmed/33456763 http://dx.doi.org/10.12688/f1000research.25501.1 |
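The description states that MSclassifier estimates feature medians to deduce supplementary data that is added to the training set for balancing, but it does not say how the supplementary rows are constructed. The Python sketch below is a hypothetical illustration of that idea only: the function name `median_supplement` is invented, and the rule "append copies of the minority class's per-feature median vector until the two classes are the same size" is an assumption, not a description of the R package's actual procedure.

```python
from statistics import median


def median_supplement(rows, labels):
    """Hypothetical median-supplement balancing for a binary training set.

    Assumption (not taken from MSclassifier): the supplementary rows are
    copies of the minority class's per-feature median vector, appended
    until both classes contain the same number of rows.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("expected exactly two class labels")
    counts = {c: labels.count(c) for c in classes}
    minority = min(classes, key=counts.get)
    majority = max(classes, key=counts.get)
    deficit = counts[majority] - counts[minority]  # 0 if already balanced

    # Column-wise median of each feature within the minority class.
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    feature_medians = [median(col) for col in zip(*minority_rows)]

    # Build new lists so the caller's training data is left untouched.
    new_rows = [list(r) for r in rows] + [list(feature_medians) for _ in range(deficit)]
    new_labels = list(labels) + [minority] * deficit
    return new_rows, new_labels


if __name__ == "__main__":
    X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [8.0, 2.0], [10.0, 6.0]]
    y = [0, 0, 0, 0, 1, 1]
    X_bal, y_bal = median_supplement(X, y)
    print(len(X_bal), y_bal.count(0), y_bal.count(1))  # 8 4 4
    print(X_bal[-1])  # [9.0, 4.0]
```

Repeating a single median vector is the simplest reading of "supplementary data deduced from medians"; it adds no within-class variability, so the package may well use a more elaborate construction. The balanced set would then be passed to whatever binary classifier is being trained.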