
Assessing Conformance with Benford’s Law: Goodness-Of-Fit Tests and Simultaneous Confidence Intervals



Bibliographic Details
Main authors: Lesperance, M., Reed, W. J., Stephens, M. A., Tsao, C., Wilton, B.
Format: Online Article Text
Language: English
Published: Public Library of Science 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4809611/
https://www.ncbi.nlm.nih.gov/pubmed/27018999
http://dx.doi.org/10.1371/journal.pone.0151235
Description
Summary: Benford’s Law is a probability distribution for the first significant digits of numbers; for example, the first significant digits of the numbers 871 and 0.22 are 8 and 2, respectively. The law is particularly remarkable because many types of data are considered to be consistent with it, and scientists and investigators have applied it in diverse areas, for example, diagnostic tests for mathematical models in Biology, Genomics, Neuroscience, image analysis and fraud detection. In this article we present and compare statistically sound methods for assessing conformance of data with Benford’s Law, including discrete versions of Cramér-von Mises (CvM) statistical tests and simultaneous confidence intervals. We demonstrate that the common use of many binomial confidence intervals leads to rejection of Benford too often for truly Benford data. Based on our investigation, we recommend that the CvM statistic [Image: see text], Pearson’s chi-square statistic and 100(1 − α)% Goodman’s simultaneous confidence intervals be computed when assessing conformance with Benford’s Law. Visual inspection of the data with simultaneous confidence intervals is useful for understanding departures from Benford and the influence of sample size.
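
As a rough illustration of two of the quantities named in the summary, the following Python sketch extracts first significant digits, computes Pearson’s chi-square statistic against the Benford probabilities log10(1 + 1/d), and builds Goodman-style simultaneous confidence intervals. It is not the authors’ code: the function names are hypothetical, the interval formula is the standard textbook form of Goodman’s (1965) multinomial intervals and is assumed to match what the article uses, and the discrete CvM statistic is not attempted here.

```python
from collections import Counter
from math import log10, sqrt
from statistics import NormalDist

DIGITS = range(1, 10)
# Benford probabilities for first significant digits 1..9.
BENFORD = {d: log10(1 + 1 / d) for d in DIGITS}


def first_digit(x):
    """First significant digit of a nonzero number (871 -> 8, 0.22 -> 2)."""
    if x == 0:
        raise ValueError("zero has no first significant digit")
    # Scientific notation puts the leading digit first; very large
    # integers outside float range would need a different approach.
    return int(f"{abs(x):.15e}"[0])


def digit_counts(values):
    """Observed counts of first digits 1..9, skipping zeros."""
    counts = Counter(first_digit(v) for v in values if v != 0)
    return [counts.get(d, 0) for d in DIGITS]


def pearson_chi_square(counts):
    """Pearson's chi-square statistic versus Benford (8 degrees of freedom)."""
    n = sum(counts)
    return sum(
        (c - n * BENFORD[d]) ** 2 / (n * BENFORD[d])
        for d, c in zip(DIGITS, counts)
    )


def goodman_intervals(counts, alpha=0.05):
    """Simultaneous 100(1 - alpha)% intervals for the nine digit proportions.

    Assumed form: Goodman's intervals with A equal to the upper alpha/k
    quantile of chi-square with 1 degree of freedom, computed here as the
    square of a standard normal quantile.
    """
    n = sum(counts)
    k = len(counts)
    a = NormalDist().inv_cdf(1 - alpha / (2 * k)) ** 2
    intervals = []
    for c in counts:
        half = sqrt(a * (a + 4 * c * (n - c) / n))
        lower = (a + 2 * c - half) / (2 * (n + a))
        upper = (a + 2 * c + half) / (2 * (n + a))
        intervals.append((lower, upper))
    return intervals
```

A typical use would be to call `digit_counts` on the data, compare `pearson_chi_square` with a chi-square critical value on 8 degrees of freedom (15.507 at the 5% level), and check which digits, if any, have a Benford probability falling outside their interval from `goodman_intervals`.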