Randomization techniques for assessing the significance of gene periodicity results

BACKGROUND: Modern high-throughput measurement technologies such as DNA microarrays and next generation sequencers produce extensive datasets. With large datasets the emphasis has been moving from traditional statistical tests to new data mining methods that are capable of detecting complex patterns...

Bibliographic Details
Main authors: Kallio, Aleksi, Vuokko, Niko, Ojala, Markus, Haiminen, Niina, Mannila, Heikki
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3199764/
https://www.ncbi.nlm.nih.gov/pubmed/21827656
http://dx.doi.org/10.1186/1471-2105-12-330
_version_ 1782214591200100352
author Kallio, Aleksi
Vuokko, Niko
Ojala, Markus
Haiminen, Niina
Mannila, Heikki
author_facet Kallio, Aleksi
Vuokko, Niko
Ojala, Markus
Haiminen, Niina
Mannila, Heikki
author_sort Kallio, Aleksi
collection PubMed
description BACKGROUND: Modern high-throughput measurement technologies such as DNA microarrays and next generation sequencers produce extensive datasets. With large datasets the emphasis has been moving from traditional statistical tests to new data mining methods that are capable of detecting complex patterns, such as clusters, regulatory networks, or time series periodicity. The study of periodic gene expression is an interesting research question that is also a good example of the challenges involved in the analysis of high-throughput data in general. Unlike for classical statistical tests, the distribution of the test statistic for data mining methods cannot be derived analytically. RESULTS: We describe the randomization-based approach to significance testing, and show how it can be applied to detect periodically expressed genes. We present four randomization methods, three of which have previously been used for gene cycle data. We propose a new method for testing the significance of periodicity in short gene expression time series data, such as from gene cycle and circadian clock studies. We argue that the underlying assumptions behind existing significance testing approaches are problematic and some of them unrealistic. We analyze the theoretical properties of the existing and proposed methods, showing how our method can be robustly used to detect genes with exceptionally high periodicity. We also demonstrate the large differences in the number of significant results depending on the chosen randomization methods and parameters of the testing framework. By reanalyzing gene cycle data from various sources, we show that previous estimates of the number of gene cycle controlled genes are not supported by the data. Our randomization approach combined with the widely adopted Benjamini-Hochberg multiple testing method yields better predictive power and produces more accurate null distributions than previous methods.
CONCLUSIONS: Existing methods for testing the significance of periodic gene expression patterns are simplistic and optimistic. Our testing framework allows strict levels of statistical significance with more realistic underlying assumptions, without losing predictive power. As DNA microarrays have now become mainstream and new high-throughput methods are rapidly being adopted, we argue that not only will there be a need for data mining methods capable of coping with immense datasets, but there will also be a need for solid methods for significance testing.
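The general idea in the abstract (an empirical null distribution obtained by randomization, followed by Benjamini-Hochberg correction) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the periodicity score is a plain Fourier magnitude at a known period, and the null model simply shuffles time points. It is not the method the authors propose, and the abstract does not specify which four randomization methods the paper compares.

```python
import cmath
import math
import random


def periodicity_score(series, period):
    """Magnitude of the Fourier component of a mean-centered series at `period`."""
    n = len(series)
    mean = sum(series) / n
    total = sum((x - mean) * cmath.exp(-2j * math.pi * t / period)
                for t, x in enumerate(series))
    return abs(total) / n


def permutation_pvalue(series, period, n_perm=999, rng=None):
    """Empirical p-value under an assumed null that shuffles the time points."""
    if rng is None:
        rng = random.Random(0)
    observed = periodicity_score(series, period)
    shuffled = list(series)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if periodicity_score(shuffled, period) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return one boolean per p-value: True if rejected by the BH step-up rule."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            cutoff = rank  # largest rank whose p-value passes its threshold
    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected
```

In use, one would compute `permutation_pvalue` once per gene and pass the resulting list to `benjamini_hochberg`. With `n_perm` permutations the smallest attainable p-value is 1 / (n_perm + 1), so strict significance levels across thousands of genes require a correspondingly large number of randomizations.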
format Online
Article
Text
id pubmed-3199764
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-31997642011-10-24 Randomization techniques for assessing the significance of gene periodicity results Kallio, Aleksi Vuokko, Niko Ojala, Markus Haiminen, Niina Mannila, Heikki BMC Bioinformatics Methodology Article BACKGROUND: Modern high-throughput measurement technologies such as DNA microarrays and next generation sequencers produce extensive datasets. With large datasets the emphasis has been moving from traditional statistical tests to new data mining methods that are capable of detecting complex patterns, such as clusters, regulatory networks, or time series periodicity. Study of periodic gene expression is an interesting research question that also is a good example of challenges involved in the analysis of high-throughput data in general. Unlike for classical statistical tests, the distribution of test statistic for data mining methods cannot be derived analytically. RESULTS: We describe the randomization based approach to significance testing, and show how it can be applied to detect periodically expressed genes. We present four randomization methods, three of which have previously been used for gene cycle data. We propose a new method for testing significance of periodicity in gene expression short time series data, such as from gene cycle and circadian clock studies. We argue that the underlying assumptions behind existing significance testing approaches are problematic and some of them unrealistic. We analyze the theoretical properties of the existing and proposed methods, showing how our method can be robustly used to detect genes with exceptionally high periodicity. We also demonstrate the large differences in the number of significant results depending on the chosen randomization methods and parameters of the testing framework. By reanalyzing gene cycle data from various sources, we show how previous estimates on the number of gene cycle controlled genes are not supported by the data. 
Our randomization approach combined with widely adopted Benjamini-Hochberg multiple testing method yields better predictive power and produces more accurate null distributions than previous methods. CONCLUSIONS: Existing methods for testing significance of periodic gene expression patterns are simplistic and optimistic. Our testing framework allows strict levels of statistical significance with more realistic underlying assumptions, without losing predictive power. As DNA microarrays have now become mainstream and new high-throughput methods are rapidly being adopted, we argue that not only there will be need for data mining methods capable of coping with immense datasets, but there will also be need for solid methods for significance testing. BioMed Central 2011-08-09 /pmc/articles/PMC3199764/ /pubmed/21827656 http://dx.doi.org/10.1186/1471-2105-12-330 Text en Copyright ©2011 Kallio et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology Article
Kallio, Aleksi
Vuokko, Niko
Ojala, Markus
Haiminen, Niina
Mannila, Heikki
Randomization techniques for assessing the significance of gene periodicity results
title Randomization techniques for assessing the significance of gene periodicity results
title_full Randomization techniques for assessing the significance of gene periodicity results
title_fullStr Randomization techniques for assessing the significance of gene periodicity results
title_full_unstemmed Randomization techniques for assessing the significance of gene periodicity results
title_short Randomization techniques for assessing the significance of gene periodicity results
title_sort randomization techniques for assessing the significance of gene periodicity results
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3199764/
https://www.ncbi.nlm.nih.gov/pubmed/21827656
http://dx.doi.org/10.1186/1471-2105-12-330
work_keys_str_mv AT kallioaleksi randomizationtechniquesforassessingthesignificanceofgeneperiodicityresults
AT vuokkoniko randomizationtechniquesforassessingthesignificanceofgeneperiodicityresults
AT ojalamarkus randomizationtechniquesforassessingthesignificanceofgeneperiodicityresults
AT haiminenniina randomizationtechniquesforassessingthesignificanceofgeneperiodicityresults
AT mannilaheikki randomizationtechniquesforassessingthesignificanceofgeneperiodicityresults