Statistically invalid classification of high throughput gene expression data
Classification analysis based on high throughput data is a common feature in neuroscience and other fields of science, with a rapidly increasing impact on both basic biology and disease-related studies. The outcome of such classifications often serves to delineate novel biochemical mechanisms in hea...
Main authors: | Barbash, Shahar; Soreq, Hermona |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group, 2013 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3551228/ https://www.ncbi.nlm.nih.gov/pubmed/23346359 http://dx.doi.org/10.1038/srep01102 |
_version_ | 1782256538381975552 |
---|---|
author | Barbash, Shahar Soreq, Hermona |
author_facet | Barbash, Shahar Soreq, Hermona |
author_sort | Barbash, Shahar |
collection | PubMed |
description | Classification analysis based on high throughput data is a common feature in neuroscience and other fields of science, with a rapidly increasing impact on both basic biology and disease-related studies. The outcome of such classifications often serves to delineate novel biochemical mechanisms in health and disease states, identify new targets for therapeutic interference, and develop innovative diagnostic approaches. Given the importance of this type of study, we screened 111 recently published high-impact manuscripts involving classification analysis of gene expression, and found that 58 of them (53%) based their conclusions on a statistically invalid method which can lead to bias in a statistical sense (lower true classification accuracy than the reported classification accuracy). In this report we characterize the potential methodological error and its scope, investigate how it is influenced by different experimental parameters, and describe statistically valid methods for avoiding such classification mistakes. |
format | Online Article Text |
id | pubmed-3551228 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | Nature Publishing Group |
record_format | MEDLINE/PubMed |
spelling | pubmed-35512282013-01-23 Statistically invalid classification of high throughput gene expression data Barbash, Shahar Soreq, Hermona Sci Rep Article Classification analysis based on high throughput data is a common feature in neuroscience and other fields of science, with a rapidly increasing impact on both basic biology and disease-related studies. The outcome of such classifications often serves to delineate novel biochemical mechanisms in health and disease states, identify new targets for therapeutic interference, and develop innovative diagnostic approaches. Given the importance of this type of study, we screened 111 recently published high-impact manuscripts involving classification analysis of gene expression, and found that 58 of them (53%) based their conclusions on a statistically invalid method which can lead to bias in a statistical sense (lower true classification accuracy than the reported classification accuracy). In this report we characterize the potential methodological error and its scope, investigate how it is influenced by different experimental parameters, and describe statistically valid methods for avoiding such classification mistakes. Nature Publishing Group 2013-01-22 /pmc/articles/PMC3551228/ /pubmed/23346359 http://dx.doi.org/10.1038/srep01102 Text en Copyright © 2013, Macmillan Publishers Limited. All rights reserved http://creativecommons.org/licenses/by-nc-nd/3.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/ |
spellingShingle | Article Barbash, Shahar Soreq, Hermona Statistically invalid classification of high throughput gene expression data |
title | Statistically invalid classification of high throughput gene expression data |
title_full | Statistically invalid classification of high throughput gene expression data |
title_fullStr | Statistically invalid classification of high throughput gene expression data |
title_full_unstemmed | Statistically invalid classification of high throughput gene expression data |
title_short | Statistically invalid classification of high throughput gene expression data |
title_sort | statistically invalid classification of high throughput gene expression data |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3551228/ https://www.ncbi.nlm.nih.gov/pubmed/23346359 http://dx.doi.org/10.1038/srep01102 |
work_keys_str_mv | AT barbashshahar statisticallyinvalidclassificationofhighthroughputgeneexpressiondata AT soreqhermona statisticallyinvalidclassificationofhighthroughputgeneexpressiondata |