
Statistically invalid classification of high throughput gene expression data

Bibliographic details
Main authors: Barbash, Shahar; Soreq, Hermona
Format: Online, Article, Text
Language: English
Published: Nature Publishing Group, 2013
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3551228/
https://www.ncbi.nlm.nih.gov/pubmed/23346359
http://dx.doi.org/10.1038/srep01102
collection PubMed
description Classification analysis based on high-throughput data is a common feature in neuroscience and other fields of science, with a rapidly increasing impact on both basic biology and disease-related studies. The outcome of such classifications often serves to delineate novel biochemical mechanisms in health and disease states, identify new targets for therapeutic interference, and develop innovative diagnostic approaches. Given the importance of this type of study, we screened 111 recently published high-impact manuscripts involving classification analysis of gene expression, and found that 58 of them (53%) based their conclusions on a statistically invalid method that can introduce statistical bias (a true classification accuracy lower than the reported classification accuracy). In this report we characterize the potential methodological error and its scope, investigate how it is influenced by different experimental parameters, and describe statistically valid methods for avoiding such classification mistakes.
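This record does not name the invalid method. One well-known way to obtain a reported accuracy above the true accuracy, assumed here purely for illustration, is to select genes using all samples and only then cross-validate, so each held-out sample has already influenced which genes are used. The pure-Python sketch below is not the authors' analysis: it uses simulated noise data and hypothetical helpers (`make_noise_data`, `select_top_features`, `nearest_centroid_predict`, `loo_accuracy`) to compare gene selection outside versus inside leave-one-out cross-validation with a simple nearest-centroid classifier.

```python
import random


def make_noise_data(n_samples, n_features, seed):
    """Hypothetical 'expression' matrix of pure Gaussian noise with balanced labels.

    There is no real class signal, so an honest accuracy estimate should sit near 0.5.
    """
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]  # equal numbers of class 0 and class 1
    rng.shuffle(y)
    return X, y


def select_top_features(X, y, idx, k):
    """Return the k features with the largest absolute class-mean difference,
    computed ONLY from the samples listed in idx."""
    scores = []
    for j in range(len(X[0])):
        a = [X[i][j] for i in idx if y[i] == 0]
        b = [X[i][j] for i in idx if y[i] == 1]
        scores.append((abs(sum(a) / len(a) - sum(b) / len(b)), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


def nearest_centroid_predict(X, y, train_idx, features, x_test):
    """Assign x_test to the class whose training centroid is closest (squared Euclidean)."""
    best_class, best_dist = None, float("inf")
    for c in (0, 1):
        rows = [X[i] for i in train_idx if y[i] == c]
        centroid = [sum(r[j] for r in rows) / len(rows) for j in features]
        dist = sum((x_test[j] - m) ** 2 for j, m in zip(features, centroid))
        if dist < best_dist:
            best_class, best_dist = c, dist
    return best_class


def loo_accuracy(X, y, k, select_inside_cv):
    """Leave-one-out accuracy of a top-k-feature nearest-centroid classifier."""
    n = len(X)
    all_idx = list(range(n))
    # Biased variant: features chosen once from ALL samples, including every future test sample.
    fixed = None if select_inside_cv else select_top_features(X, y, all_idx, k)
    correct = 0
    for test in all_idx:
        train = [i for i in all_idx if i != test]
        # Unbiased variant: features re-selected from the training fold only.
        features = select_top_features(X, y, train, k) if select_inside_cv else fixed
        correct += nearest_centroid_predict(X, y, train, features, X[test]) == y[test]
    return correct / n


if __name__ == "__main__":
    X, y = make_noise_data(n_samples=30, n_features=2000, seed=0)
    print(f"selection outside CV: {loo_accuracy(X, y, 20, select_inside_cv=False):.2f}")
    print(f"selection inside CV:  {loo_accuracy(X, y, 20, select_inside_cv=True):.2f}")
```

On noise data like this, the inside-CV estimate should hover around chance, while the outside-CV estimate is typically much higher; exact values depend on the seed and on the assumed parameters (`n_samples`, `n_features`, `k`). In practice, libraries such as scikit-learn allow the selection step to be placed inside a `Pipeline` so that it is refit within each fold.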
id pubmed-3551228
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: Sci Rep (Nature Publishing Group), published 2013-01-22; record pubmed-3551228, 2013-01-23
Rights: Copyright © 2013, Macmillan Publishers Limited. All rights reserved. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/
topic Article