Simple-Random-Sampling-Based Multiclass Text Classification Algorithm
Multiclass text classification (MTC) is a challenging problem, and MTC algorithms are used in many applications. In the era of big data, the space-time overhead of these algorithms is a real concern. Through the investigation of the token frequency distribution in a Chinese web doc...
Main authors: | Liu, Wuying; Wang, Lin; Yi, Mianzhu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi Publishing Corporation 2014 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3977423/ https://www.ncbi.nlm.nih.gov/pubmed/24778587 http://dx.doi.org/10.1155/2014/517498 |
_version_ | 1782310415736242176 |
---|---|
author | Liu, Wuying Wang, Lin Yi, Mianzhu |
author_facet | Liu, Wuying Wang, Lin Yi, Mianzhu |
author_sort | Liu, Wuying |
collection | PubMed |
description | Multiclass text classification (MTC) is a challenging problem, and MTC algorithms are used in many applications. In the era of big data, the space-time overhead of these algorithms is a real concern. Through the investigation of the token frequency distribution in a Chinese web document collection, this paper reexamines the power law and proposes a simple-random-sampling-based MTC (SRSMTC) algorithm. Supported by a token-level memory that stores labeled documents, the SRSMTC algorithm uses a text retrieval approach to solve text classification problems. Experimental results on the TanCorp data set show that the SRSMTC algorithm achieves state-of-the-art performance with greatly reduced space-time requirements. |
format | Online Article Text |
id | pubmed-3977423 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Hindawi Publishing Corporation |
record_format | MEDLINE/PubMed |
spelling | pubmed-3977423 2014-04-28 Simple-Random-Sampling-Based Multiclass Text Classification Algorithm Liu, Wuying Wang, Lin Yi, Mianzhu ScientificWorldJournal Research Article Multiclass text classification (MTC) is a challenging problem, and MTC algorithms are used in many applications. In the era of big data, the space-time overhead of these algorithms is a real concern. Through the investigation of the token frequency distribution in a Chinese web document collection, this paper reexamines the power law and proposes a simple-random-sampling-based MTC (SRSMTC) algorithm. Supported by a token-level memory that stores labeled documents, the SRSMTC algorithm uses a text retrieval approach to solve text classification problems. Experimental results on the TanCorp data set show that the SRSMTC algorithm achieves state-of-the-art performance with greatly reduced space-time requirements. Hindawi Publishing Corporation 2014-03-19 /pmc/articles/PMC3977423/ /pubmed/24778587 http://dx.doi.org/10.1155/2014/517498 Text en Copyright © 2014 Wuying Liu et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Liu, Wuying Wang, Lin Yi, Mianzhu Simple-Random-Sampling-Based Multiclass Text Classification Algorithm |
title | Simple-Random-Sampling-Based Multiclass Text Classification Algorithm |
title_full | Simple-Random-Sampling-Based Multiclass Text Classification Algorithm |
title_fullStr | Simple-Random-Sampling-Based Multiclass Text Classification Algorithm |
title_full_unstemmed | Simple-Random-Sampling-Based Multiclass Text Classification Algorithm |
title_short | Simple-Random-Sampling-Based Multiclass Text Classification Algorithm |
title_sort | simple-random-sampling-based multiclass text classification algorithm |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3977423/ https://www.ncbi.nlm.nih.gov/pubmed/24778587 http://dx.doi.org/10.1155/2014/517498 |
work_keys_str_mv | AT liuwuying simplerandomsamplingbasedmulticlasstextclassificationalgorithm AT wanglin simplerandomsamplingbasedmulticlasstextclassificationalgorithm AT yimianzhu simplerandomsamplingbasedmulticlasstextclassificationalgorithm |
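
The description in this record mentions three ingredients: a token-level memory of labeled documents, a retrieval-style lookup for classification, and simple random sampling to cut cost. The sketch below is only a toy illustration of how those ideas could fit together; it is not the SRSMTC algorithm as published. The class name `TokenMemoryClassifier`, the `sample_rate` parameter, the choice to sample the query document's tokens, and the vote-counting rule are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


class TokenMemoryClassifier:
    """Toy retrieval-style classifier backed by a token-level index.

    Hypothetical illustration only; not the published SRSMTC method.
    """

    def __init__(self, sample_rate=0.5, seed=0):
        # sample_rate is an assumed knob: fraction of query tokens to look up.
        self.sample_rate = sample_rate
        self.rng = random.Random(seed)
        # Token-level memory: token -> how many documents of each label contain it.
        self.index = defaultdict(Counter)

    def train(self, tokens, label):
        # Store the labeled document by indexing each distinct token once.
        for token in set(tokens):
            self.index[token][label] += 1

    def predict(self, tokens):
        unique = sorted(set(tokens))
        if not unique:
            return None
        # Simple random sampling without replacement over the query tokens.
        k = max(1, round(len(unique) * self.sample_rate))
        sample = self.rng.sample(unique, k)
        # Retrieval step: each sampled token votes for the labels it was seen with.
        votes = Counter()
        for token in sample:
            votes.update(self.index.get(token, {}))
        return votes.most_common(1)[0][0] if votes else None
```

In this sketch a lower `sample_rate` means fewer index lookups per query document, which is the kind of cost reduction the description attributes to sampling; how the paper actually applies sampling and scores classes should be checked against the full text.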