Evaluation of Gene Expression Classification Studies: Factors Associated with Classification Performance

Bibliographic Details
Main authors: Novianti, Putri W., Roes, Kit C. B., Eijkemans, Marinus J. C.
Format: Online, Article, Text
Language: English
Published: Public Library of Science, 2014
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4000205/
https://www.ncbi.nlm.nih.gov/pubmed/24770439
http://dx.doi.org/10.1371/journal.pone.0096063
Collection: PubMed
Description: Classification methods used in gene expression microarray studies differ both in how they deal with the underlying complexity of the data and in the technique used to build the classification model. The MAQC-II study on cancer classification problems found that performance was affected by factors such as the classification algorithm, the cross-validation method, the number of genes, and the gene selection method. In this paper, we examine the hypothesis that the disease under study significantly determines which method is optimal, and that sample size, class imbalance, type of medical question (diagnostic, prognostic, or treatment response), and microarray platform are additional potentially influential factors. A systematic literature review was used to extract information from 48 published articles on non-cancer microarray classification studies. The impact of these factors on the reported classification accuracy was analyzed with random-intercept logistic regression. The type of medical question and the cross-validation method accounted for the largest share of the explained variation in accuracy among studies, followed by disease category and microarray platform. Taken together, all the study-specific and problem-specific factors we examined explained 42% of the between-study variation.
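The random-intercept logistic regression mentioned in the description can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes each study's reported accuracy follows logit(accuracy) = beta0 + x·beta + u, with a study-level random intercept u ~ N(0, tau²). It also assumes "explained between-study variation" means the proportional reduction in tau² when the covariates are added to an intercept-only model. The function names and all numeric values below are hypothetical.

```python
import math
from typing import Sequence


def logit(p: float) -> float:
    """Log-odds of a proportion p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inv_logit(eta: float) -> float:
    """Inverse logit: maps a linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def study_accuracy(beta0: float, beta: Sequence[float],
                   x: Sequence[float], u: float) -> float:
    """Hypothetical helper: accuracy of one study given its random intercept u,
    under the assumed model logit(acc) = beta0 + x . beta + u."""
    if len(beta) != len(x):
        raise ValueError("beta and x must have the same length")
    eta = beta0 + sum(b * xi for b, xi in zip(beta, x)) + u
    return inv_logit(eta)


def explained_between_study_variance(tau2_null: float, tau2_full: float) -> float:
    """Assumed definition: proportional reduction in the random-intercept
    variance tau^2 when study/problem covariates are added to a null model."""
    if tau2_null <= 0:
        raise ValueError("tau2_null must be positive")
    return (tau2_null - tau2_full) / tau2_null


if __name__ == "__main__":
    # Hypothetical variance estimates, NOT values reported in the paper.
    print(round(explained_between_study_variance(0.80, 0.60), 2))  # 0.25
    # Hypothetical coefficients for two binary study-level covariates.
    print(round(study_accuracy(1.5, [0.4, -0.3], [1.0, 0.0], 0.0), 3))
```

One caveat with this assumed measure: in a logistic model the within-study variance on the latent logit scale is fixed, so adding covariates can rescale tau². The ratio is therefore an approximation and can even come out negative.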
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PLoS One (Research Article), published 2014-04-25
License: © 2014 Novianti et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.