Generalization of DNA microarray dispersion properties: microarray equivalent of t-distribution

BACKGROUND: DNA microarrays are a powerful technology that can provide a wealth of gene expression data for disease studies, drug development, and a wide scope of other investigations. Because of the large volume and inherent variability of DNA microarray data, many new statistical methods have been...

Bibliographic Details
Main authors: Novak, Jaroslav P, Kim, Seon-Young, Xu, Jun, Modlich, Olga, Volsky, David J, Honys, David, Slonczewski, Joan L, Bell, Douglas A, Blattner, Fred R, Blumwald, Eduardo, Boerma, Marjan, Cosio, Manuel, Gatalica, Zoran, Hajduch, Marian, Hidalgo, Juan, McInnes, Roderick R, Miller III, Merrill C, Penkowa, Milena, Rolph, Michael S, Sottosanto, Jordan, St-Arnaud, Rene, Szego, Michael J, Twell, David, Wang, Charles
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1586001/
https://www.ncbi.nlm.nih.gov/pubmed/16959036
http://dx.doi.org/10.1186/1745-6150-1-27
author Novak, Jaroslav P
Kim, Seon-Young
Xu, Jun
Modlich, Olga
Volsky, David J
Honys, David
Slonczewski, Joan L
Bell, Douglas A
Blattner, Fred R
Blumwald, Eduardo
Boerma, Marjan
Cosio, Manuel
Gatalica, Zoran
Hajduch, Marian
Hidalgo, Juan
McInnes, Roderick R
Miller III, Merrill C
Penkowa, Milena
Rolph, Michael S
Sottosanto, Jordan
St-Arnaud, Rene
Szego, Michael J
Twell, David
Wang, Charles
author_sort Novak, Jaroslav P
collection PubMed
description BACKGROUND: DNA microarrays are a powerful technology that can provide a wealth of gene expression data for disease studies, drug development, and a wide scope of other investigations. Because of the large volume and inherent variability of DNA microarray data, many new statistical methods have been developed for evaluating the significance of the observed differences in gene expression. However, until now little attention has been given to the characterization of dispersion of DNA microarray data. RESULTS: Here we examine the expression data obtained from 682 Affymetrix GeneChips® of 22 different types and we demonstrate that the Gaussian (normal) frequency distribution is characteristic of the variability of gene expression values. However, typically 5 to 15% of the samples deviate from normality. Furthermore, it is shown that the frequency distributions of the difference of expression in subsets of ordered, consecutive pairs of genes (consecutive samples) in pair-wise comparisons of replicate experiments are also normal. We describe a consecutive sampling method, which is employed to calculate the characteristic function approximating standard deviation, and show that the standard deviation derived from the consecutive samples is equivalent to the standard deviation obtained from individual genes. Finally, we determine the boundaries of probability intervals and demonstrate that the coefficients defining the intervals are independent of sample characteristics, variability of data, laboratory conditions and type of chips. These coefficients are very closely correlated with Student's t-distribution. CONCLUSION: In this study we ascertained that the non-systematic variations follow a Gaussian distribution, determined the probability intervals and demonstrated that the K_α coefficients defining these intervals are invariant; these coefficients offer a convenient universal measure of dispersion of data. The fact that the K_α distributions are so close to the t-distribution and independent of conditions and type of arrays suggests that the quantitative data provided by Affymetrix technology give a "true" representation of the physical processes involved in the measurement of RNA abundance. REVIEWERS: This article was reviewed by Yoav Gilad (nominated by Doron Lancet), Sach Mukherjee (nominated by Sandrine Dudoit) and Amir Niknejad and Shmuel Friedland (nominated by Neil Smalheiser).
format Text
id pubmed-1586001
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1586001 2006-10-02 Biol Direct Research BioMed Central 2006-09-07 /pmc/articles/PMC1586001/ /pubmed/16959036 http://dx.doi.org/10.1186/1745-6150-1-27 Text en Copyright © 2006 Novak et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Generalization of DNA microarray dispersion properties: microarray equivalent of t-distribution
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1586001/
https://www.ncbi.nlm.nih.gov/pubmed/16959036
http://dx.doi.org/10.1186/1745-6150-1-27