
Simple-Random-Sampling-Based Multiclass Text Classification Algorithm

Multiclass text classification (MTC) is a challenging problem, and MTC algorithms can be used in many applications. In the era of big data, the space-time overhead of these algorithms is a serious concern. Through the investigation of the token frequency distribution in a Chinese web doc...


Bibliographic Details
Main Authors: Liu, Wuying; Wang, Lin; Yi, Mianzhu
Format: Online Article Text
Language: English
Published: Hindawi Publishing Corporation 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3977423/
https://www.ncbi.nlm.nih.gov/pubmed/24778587
http://dx.doi.org/10.1155/2014/517498
_version_ 1782310415736242176
author Liu, Wuying
Wang, Lin
Yi, Mianzhu
author_facet Liu, Wuying
Wang, Lin
Yi, Mianzhu
author_sort Liu, Wuying
collection PubMed
description Multiclass text classification (MTC) is a challenging problem, and MTC algorithms can be used in many applications. In the era of big data, the space-time overhead of these algorithms is a serious concern. Through an investigation of the token frequency distribution in a Chinese web document collection, this paper reexamines the power law and proposes a simple-random-sampling-based MTC (SRSMTC) algorithm. Supported by a token-level memory that stores labeled documents, the SRSMTC algorithm uses a text retrieval approach to solve text classification problems. The experimental results on the TanCorp data set show that the SRSMTC algorithm can achieve state-of-the-art performance at greatly reduced space-time requirements.
format Online
Article
Text
id pubmed-3977423
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-39774232014-04-28 Simple-Random-Sampling-Based Multiclass Text Classification Algorithm Liu, Wuying Wang, Lin Yi, Mianzhu ScientificWorldJournal Research Article Multiclass text classification (MTC) is a challenging problem, and MTC algorithms can be used in many applications. In the era of big data, the space-time overhead of these algorithms is a serious concern. Through an investigation of the token frequency distribution in a Chinese web document collection, this paper reexamines the power law and proposes a simple-random-sampling-based MTC (SRSMTC) algorithm. Supported by a token-level memory that stores labeled documents, the SRSMTC algorithm uses a text retrieval approach to solve text classification problems. The experimental results on the TanCorp data set show that the SRSMTC algorithm can achieve state-of-the-art performance at greatly reduced space-time requirements. Hindawi Publishing Corporation 2014-03-19 /pmc/articles/PMC3977423/ /pubmed/24778587 http://dx.doi.org/10.1155/2014/517498 Text en Copyright © 2014 Wuying Liu et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Liu, Wuying
Wang, Lin
Yi, Mianzhu
Simple-Random-Sampling-Based Multiclass Text Classification Algorithm
title Simple-Random-Sampling-Based Multiclass Text Classification Algorithm
title_full Simple-Random-Sampling-Based Multiclass Text Classification Algorithm
title_fullStr Simple-Random-Sampling-Based Multiclass Text Classification Algorithm
title_full_unstemmed Simple-Random-Sampling-Based Multiclass Text Classification Algorithm
title_short Simple-Random-Sampling-Based Multiclass Text Classification Algorithm
title_sort simple-random-sampling-based multiclass text classification algorithm
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3977423/
https://www.ncbi.nlm.nih.gov/pubmed/24778587
http://dx.doi.org/10.1155/2014/517498
work_keys_str_mv AT liuwuying simplerandomsamplingbasedmulticlasstextclassificationalgorithm
AT wanglin simplerandomsamplingbasedmulticlasstextclassificationalgorithm
AT yimianzhu simplerandomsamplingbasedmulticlasstextclassificationalgorithm