On Multilabel Classification Methods of Incompletely Labeled Biomedical Text Data

Multilabel classification is often hindered by incompletely labeled training datasets; for some items of such a dataset (or even for all of them), some labels may be omitted. In this case, we cannot know whether any item is labeled fully and correctly. When we train a classifier directly on incompletely lab...

Bibliographic Details
Main Authors:	Kolesov, Anton, Kamyshenkov, Dmitry, Litovchenko, Maria, Smekalova, Elena, Golovizin, Alexey, Zhavoronkov, Alex
Format:	Online Article Text
Language:	English
Published:	Hindawi Publishing Corporation 2014
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3920912/ https://www.ncbi.nlm.nih.gov/pubmed/24587817 http://dx.doi.org/10.1155/2014/781807

author	Kolesov, Anton Kamyshenkov, Dmitry Litovchenko, Maria Smekalova, Elena Golovizin, Alexey Zhavoronkov, Alex
collection	PubMed
description	Multilabel classification is often hindered by incompletely labeled training datasets; for some items of such a dataset (or even for all of them), some labels may be omitted. In this case, we cannot know whether any item is labeled fully and correctly. When we train a classifier directly on an incompletely labeled dataset, it performs ineffectively. To overcome the problem, we added an extra step, training set modification, before training a classifier. In this paper, we try two algorithms for training set modification: weighted k-nearest neighbor (WkNN) and soft supervised learning (SoftSL). Both of these approaches are based on similarity measurements between data vectors. We performed the experiments on AgingPortfolio (a text dataset) and then rechecked them on Yeast (nontext genetic data). We tried SVM and RF classifiers on the original datasets and then on the modified ones. For each dataset, our experiments demonstrated that both classification algorithms performed considerably better when preceded by the training set modification step.
format	Online Article Text
id	pubmed-3920912
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	Hindawi Publishing Corporation
record_format	MEDLINE/PubMed
spelling	pubmed-39209122014-03-02 On Multilabel Classification Methods of Incompletely Labeled Biomedical Text Data Kolesov, Anton Kamyshenkov, Dmitry Litovchenko, Maria Smekalova, Elena Golovizin, Alexey Zhavoronkov, Alex Comput Math Methods Med Research Article Multilabel classification is often hindered by incompletely labeled training datasets; for some items of such dataset (or even for all of them) some labels may be omitted. In this case, we cannot know if any item is labeled fully and correctly. When we train a classifier directly on incompletely labeled dataset, it performs ineffectively. To overcome the problem, we added an extra step, training set modification, before training a classifier. In this paper, we try two algorithms for training set modification: weighted k-nearest neighbor (WkNN) and soft supervised learning (SoftSL). Both of these approaches are based on similarity measurements between data vectors. We performed the experiments on AgingPortfolio (text dataset) and then rechecked on the Yeast (nontext genetic data). We tried SVM and RF classifiers for the original datasets and then for the modified ones. For each dataset, our experiments demonstrated that both classification algorithms performed considerably better when preceded by the training set modification step. Hindawi Publishing Corporation 2014 2014-01-23 /pmc/articles/PMC3920912/ /pubmed/24587817 http://dx.doi.org/10.1155/2014/781807 Text en Copyright © 2014 Anton Kolesov et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	On Multilabel Classification Methods of Incompletely Labeled Biomedical Text Data
topic	Research Article

Similar Items