
Statistically invalid classification of high throughput gene expression data


Bibliographic Details
Main Authors:	Barbash, Shahar; Soreq, Hermona
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group 2013
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3551228/ https://www.ncbi.nlm.nih.gov/pubmed/23346359 http://dx.doi.org/10.1038/srep01102

Description
Summary:	Classification analysis based on high throughput data is a common feature in neuroscience and other fields of science, with a rapidly increasing impact on both basic biology and disease-related studies. The outcome of such classifications often serves to delineate novel biochemical mechanisms in health and disease states, identify new targets for therapeutic interference, and develop innovative diagnostic approaches. Given the importance of this type of study, we screened 111 recently published high-impact manuscripts involving classification analysis of gene expression, and found that 58 of them (53%) based their conclusions on a statistically invalid method which can lead to bias in a statistical sense (lower true classification accuracy than the reported classification accuracy). In this report we characterize the potential methodological error and its scope, investigate how it is influenced by different experimental parameters, and describe statistically valid methods for avoiding such classification mistakes.

