
An Integrated Approach for Identifying Wrongly Labelled Samples When Performing Classification in Microarray Data

BACKGROUND: Using a hybrid approach for gene selection and classification is common, as the results obtained are generally better than those from performing the two tasks independently. Yet, for some microarray datasets, both the classification accuracy and the stability of the gene sets obtained still have room for improvement...


Bibliographic Details
Main authors:	Leung, Yuk Yee; Chang, Chun Qi; Hung, Yeung Sam
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2012
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3474777/ https://www.ncbi.nlm.nih.gov/pubmed/23082127 http://dx.doi.org/10.1371/journal.pone.0046700

_version_	1782246836405272576
author	Leung, Yuk Yee; Chang, Chun Qi; Hung, Yeung Sam
author_sort	Leung, Yuk Yee
collection	PubMed
description	BACKGROUND: Using a hybrid approach for gene selection and classification is common, as the results obtained are generally better than those from performing the two tasks independently. Yet, for some microarray datasets, both the classification accuracy and the stability of the gene sets obtained still have room for improvement. This may be due to the presence of samples with wrong class labels (i.e., outliers). Outlier detection algorithms proposed so far are either not suitable for microarray data, or only solve the outlier detection problem on their own. RESULTS: We tackle the outlier detection problem based on a previously proposed Multiple-Filter-Multiple-Wrapper (MFMW) model, which was demonstrated to yield promising results when compared to other hybrid approaches (Leung and Hung, 2010). To incorporate outlier detection and overcome limitations of the existing MFMW model, three new features are introduced in our proposed MFMW-outlier approach: 1) an unbiased external Leave-One-Out Cross-Validation framework is developed to replace internal cross-validation in the previous MFMW model; 2) wrongly labeled samples are identified within the MFMW-outlier model; and 3) a stable set of genes is selected using an L1-norm SVM that removes any redundant genes present. Six binary-class microarray datasets were tested. Compared with outlier detection studies on the same datasets, MFMW-outlier could detect all the outliers found in the original paper (for which the data was provided for analysis), and the genes selected after outlier removal were proven to have biological relevance. We also compared MFMW-outlier with PRAPIV (Zhang et al., 2006) on the same synthetic datasets. MFMW-outlier gave better average precision and recall values in three different settings. Lastly, artificially flipped microarray datasets were created by removing our detected outliers and flipping some of the remaining samples' labels. Almost all the ‘wrong’ (artificially flipped) samples were detected, suggesting that MFMW-outlier was sufficiently powerful to detect outliers in high-dimensional microarray datasets.
format	Online Article Text
id	pubmed-3474777
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-3474777 2012-10-18 An Integrated Approach for Identifying Wrongly Labelled Samples When Performing Classification in Microarray Data Leung, Yuk Yee; Chang, Chun Qi; Hung, Yeung Sam PLoS One Research Article Public Library of Science 2012-10-17 /pmc/articles/PMC3474777/ /pubmed/23082127 http://dx.doi.org/10.1371/journal.pone.0046700 Text en © 2012 Leung et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	An Integrated Approach for Identifying Wrongly Labelled Samples When Performing Classification in Microarray Data
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3474777/ https://www.ncbi.nlm.nih.gov/pubmed/23082127 http://dx.doi.org/10.1371/journal.pone.0046700
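The abstract above describes flagging wrongly labelled samples inside an external Leave-One-Out Cross-Validation loop. The following is a minimal sketch of that general idea only, not the authors' MFMW-outlier implementation: the nearest-centroid classifier is a stand-in chosen for brevity, the gene-selection stages (filters, wrappers, L1-norm SVM) are omitted, and the function names and toy data are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Predict the class whose mean profile is closest to x (squared Euclidean)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def flag_suspect_labels(samples, labels):
    """Hold out each sample in turn, train on the rest, and return the
    indices whose held-out prediction disagrees with the recorded label."""
    suspects = []
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_centroid_predict(train_x, train_y, samples[i]) != labels[i]:
            suspects.append(i)
    return suspects


# Hypothetical toy data: 10 samples near 0 and 10 samples near 3, 5 "genes" each.
samples = [[0.1 * ((i + j) % 3) for j in range(5)] for i in range(10)]
samples += [[3.0 + 0.1 * ((i + j) % 3) for j in range(5)] for i in range(10)]
labels = [0] * 10 + [1] * 10
labels[3] = 1  # deliberately flip one label to simulate a mislabelled sample

print(flag_suspect_labels(samples, labels))  # prints [3]
```

Because each sample is excluded from training before it is predicted, a flipped label cannot vouch for itself; that is the property the external cross-validation loop is meant to provide. A single disagreement is only a hint in this sketch, whereas a real analysis would aggregate evidence across several classifiers and gene subsets.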
