Correction of unexpected distributions of P values from analysis of whole genome arrays by rectifying violation of statistical assumptions

BACKGROUND: Statistical analysis of genome-wide microarrays can result in many thousands of identical statistical tests being performed as each probe is tested for an association with a phenotype of interest. If there were no association between any of the probes and the phenotype, the distribution...

Bibliographic Details
Main authors: Barton, Sheila J; Crozier, Sarah R; Lillycrop, Karen A; Godfrey, Keith M; Inskip, Hazel M
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3610227/
https://www.ncbi.nlm.nih.gov/pubmed/23496791
http://dx.doi.org/10.1186/1471-2164-14-161
_version_ 1782264423276085248
author Barton, Sheila J
Crozier, Sarah R
Lillycrop, Karen A
Godfrey, Keith M
Inskip, Hazel M
author_sort Barton, Sheila J
collection PubMed
description BACKGROUND: Statistical analysis of genome-wide microarrays can result in many thousands of identical statistical tests being performed as each probe is tested for an association with a phenotype of interest. If there were no association between any of the probes and the phenotype, the distribution of P values obtained from statistical tests would resemble a Uniform distribution. If a selection of probes were significantly associated with the phenotype we would expect to observe P values for these probes of less than the designated significance level, alpha, resulting in more P values of less than alpha than expected by chance. RESULTS: In data from a whole genome methylation promoter array we unexpectedly observed P value distributions where there were fewer P values less than alpha than would be expected by chance. Our data suggest that a possible reason for this is a violation of the statistical assumptions required for these tests arising from heteroskedasticity. A simple but statistically sound remedy (a heteroskedasticity-consistent covariance matrix estimator to calculate standard errors of regression coefficients that are robust to heteroskedasticity) rectified this violation and resulted in meaningful P value distributions. CONCLUSIONS: The statistical analysis of ’omics data requires careful handling, especially in the choice of statistical test. To obtain meaningful results it is essential that the assumptions behind these tests are carefully examined and any violations rectified where possible, or a more appropriate statistical test chosen.
format Online
Article
Text
id pubmed-3610227
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3610227 2013-03-29 Correction of unexpected distributions of P values from analysis of whole genome arrays by rectifying violation of statistical assumptions Barton, Sheila J Crozier, Sarah R Lillycrop, Karen A Godfrey, Keith M Inskip, Hazel M BMC Genomics Methodology Article
BioMed Central 2013-03-11 /pmc/articles/PMC3610227/ /pubmed/23496791 http://dx.doi.org/10.1186/1471-2164-14-161 Text en Copyright ©2013 Barton et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Correction of unexpected distributions of P values from analysis of whole genome arrays by rectifying violation of statistical assumptions
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3610227/
https://www.ncbi.nlm.nih.gov/pubmed/23496791
http://dx.doi.org/10.1186/1471-2164-14-161
work_keys_str_mv AT bartonsheilaj correctionofunexpecteddistributionsofpvaluesfromanalysisofwholegenomearraysbyrectifyingviolationofstatisticalassumptions
AT croziersarahr correctionofunexpecteddistributionsofpvaluesfromanalysisofwholegenomearraysbyrectifyingviolationofstatisticalassumptions
AT lillycropkarena correctionofunexpecteddistributionsofpvaluesfromanalysisofwholegenomearraysbyrectifyingviolationofstatisticalassumptions
AT godfreykeithm correctionofunexpecteddistributionsofpvaluesfromanalysisofwholegenomearraysbyrectifyingviolationofstatisticalassumptions
AT inskiphazelm correctionofunexpecteddistributionsofpvaluesfromanalysisofwholegenomearraysbyrectifyingviolationofstatisticalassumptions
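
Note on the remedy named in the abstract: a heteroskedasticity-consistent covariance estimator replaces the single pooled error variance of ordinary least squares with each observation's own squared residual. The following is a minimal Python sketch for one probe and one predictor, not the authors' code. The function name slope_test, the HC0 and HC3 variants, the normal approximation for P values, and the toy numbers are all assumptions made for illustration; this record does not state which variant or software the study used.

```python
import math


def slope_test(x, y, kind="HC3"):
    """Fit y = a + b*x by least squares and test b = 0 two ways.

    Returns (slope, se_classical, se_robust, p_classical, p_robust).
    P values use a normal approximation, an assumption made here to keep
    the sketch dependency-free; a t reference distribution is more usual.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Classical standard error: assumes one common error variance.
    sigma2 = sum(e * e for e in resid) / (n - 2)
    se_classical = math.sqrt(sigma2 / sxx)

    # Sandwich estimator: each squared residual keeps its own weight.
    if kind == "HC0":
        weights = [e * e for e in resid]
    elif kind == "HC3":
        # Leverage h_i rescales residuals at influential observations.
        leverage = [1.0 / n + d * d / sxx for d in dx]
        weights = [e * e / (1.0 - h) ** 2 for e, h in zip(resid, leverage)]
    else:
        raise ValueError("kind must be 'HC0' or 'HC3'")
    se_robust = math.sqrt(sum(d * d * w for d, w in zip(dx, weights))) / sxx

    def p_value(se):
        # Two-sided P value from a standard normal reference distribution.
        return math.erfc(abs(slope / se) / math.sqrt(2.0))

    return slope, se_classical, se_robust, p_value(se_classical), p_value(se_robust)


# Made-up values for a single hypothetical probe (not data from the study).
phenotype = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
methylation = [0.9, 2.2, 2.7, 4.6, 4.1, 7.3]
print(slope_test(phenotype, methylation))
```

In a genome-wide setting the same call would be repeated once per probe, and the resulting robust P values collected into a histogram to compare against the Uniform shape expected when no probe is associated with the phenotype.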