Factoring a 2 x 2 contingency table

We show that a two-component proportional representation provides the necessary framework to account for the properties of a 2 × 2 contingency table. This corresponds to the factorization of the table as a product of proportion and diagonal row or column sum matrices. The row and column sum invarian...
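The factorization mentioned in the summary can be illustrated with a minimal sketch. It assumes the row-sum form, in which the table equals a diagonal matrix of row sums multiplied by a matrix of row proportions. The function names `factor_by_rows` and `recombine` and the example counts are hypothetical, not taken from the article, and every row sum is assumed to be nonzero.

```python
def factor_by_rows(table):
    """Split a 2 x 2 table of counts into (row_sums, row_proportions).

    Assumes every row sum is nonzero.
    """
    row_sums = [sum(row) for row in table]
    proportions = [
        [cell / total for cell in row]
        for row, total in zip(table, row_sums)
    ]
    return row_sums, proportions


def recombine(row_sums, proportions):
    """Multiply diag(row_sums) by the proportion matrix to rebuild the table."""
    return [
        [total * p for p in row]
        for row, total in zip(proportions, row_sums)
    ]


# Hypothetical example counts (not from the article).
table = [[30, 10], [20, 40]]
row_sums, proportions = factor_by_rows(table)
rebuilt = recombine(row_sums, proportions)
```

Each row of `proportions` sums to one, so it is a point in the standard one-simplex, while `row_sums` carries the marginal information separately. The column-sum form follows by applying the same functions to the transposed table.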

Bibliographic Details
Main author: Luck, Stanley
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6814214/
https://www.ncbi.nlm.nih.gov/pubmed/31652283
http://dx.doi.org/10.1371/journal.pone.0224460
_version_ 1783462972314615808
author Luck, Stanley
author_facet Luck, Stanley
author_sort Luck, Stanley
collection PubMed
description We show that a two-component proportional representation provides the necessary framework to account for the properties of a 2 × 2 contingency table. This corresponds to the factorization of the table as a product of proportion and diagonal row or column sum matrices. The row and column sum invariant measures for proportional variation are obtained. Geometrically, these correspond to displacements of two point vectors in the standard one-simplex, which are reduced to a center-of-mass coordinate representation, (δ, μ). Then, effect size measures, such as the odds ratio and relative risk, correspond to different perspective functions for the mapping of (δ, μ) to [Image: see text]. Furthermore, variations in δ and μ will be associated with different cost-benefit trade-offs for a given application. Therefore, pure mathematics alone does not provide the specification of a general form for the perspective function. This implies that the question of the merits of the odds ratio versus relative risk cannot be resolved in a general way. Expressions are obtained for the marginal sum dependence and the relations between various effect size measures, including the simple matching coefficient, odds ratio, relative risk, Yule’s Q, ϕ, and Goodman and Kruskal’s τ(c|r). We also show that Gini information gain (IG(G)) is equivalent to ϕ² in the classification and regression tree (CART) algorithm. Then, IG(G) can yield misleading results due to the dependence on marginal sums. Monte Carlo methods facilitate the detailed specification of stochastic effects in the data acquisition process and provide a practical way to estimate the confidence interval for an effect size.
format Online
Article
Text
id pubmed-6814214
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6814214 2019-11-03 Factoring a 2 x 2 contingency table Luck, Stanley PLoS One Research Article We show that a two-component proportional representation provides the necessary framework to account for the properties of a 2 × 2 contingency table. This corresponds to the factorization of the table as a product of proportion and diagonal row or column sum matrices. The row and column sum invariant measures for proportional variation are obtained. Geometrically, these correspond to displacements of two point vectors in the standard one-simplex, which are reduced to a center-of-mass coordinate representation, (δ, μ). Then, effect size measures, such as the odds ratio and relative risk, correspond to different perspective functions for the mapping of (δ, μ) to [Image: see text]. Furthermore, variations in δ and μ will be associated with different cost-benefit trade-offs for a given application. Therefore, pure mathematics alone does not provide the specification of a general form for the perspective function. This implies that the question of the merits of the odds ratio versus relative risk cannot be resolved in a general way. Expressions are obtained for the marginal sum dependence and the relations between various effect size measures, including the simple matching coefficient, odds ratio, relative risk, Yule’s Q, ϕ, and Goodman and Kruskal’s τ(c|r). We also show that Gini information gain (IG(G)) is equivalent to ϕ² in the classification and regression tree (CART) algorithm. Then, IG(G) can yield misleading results due to the dependence on marginal sums. Monte Carlo methods facilitate the detailed specification of stochastic effects in the data acquisition process and provide a practical way to estimate the confidence interval for an effect size.
Public Library of Science 2019-10-25 /pmc/articles/PMC6814214/ /pubmed/31652283 http://dx.doi.org/10.1371/journal.pone.0224460 Text en © 2019 Stanley Luck http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Luck, Stanley
Factoring a 2 x 2 contingency table
title Factoring a 2 x 2 contingency table
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6814214/
https://www.ncbi.nlm.nih.gov/pubmed/31652283
http://dx.doi.org/10.1371/journal.pone.0224460
work_keys_str_mv AT luckstanley factoringa2x2contingencytable
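
The relation between Gini information gain and ϕ² stated in the description can be checked numerically with a short sketch. It assumes the common definition of Gini impurity, 1 − Σp², with rows as the two branches of a split and columns as the two classes; the article's own normalization may differ. Under that assumption the gain equals ϕ² multiplied by the parent node's impurity, which depends only on the column sums. The function names and example counts are hypothetical.

```python
def gini(counts):
    """Gini impurity 1 - sum(p^2) for a list of class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def gini_gain(table):
    """Parent impurity minus the size-weighted impurity of the two branches."""
    n = sum(sum(row) for row in table)
    column_sums = [sum(col) for col in zip(*table)]
    children = sum(sum(row) / n * gini(row) for row in table)
    return gini(column_sums) - children


def phi_squared(table):
    """Squared phi coefficient of a 2 x 2 table (all marginal sums nonzero)."""
    (a, b), (c, d) = table
    return (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical example counts (not from the article).
table = [[30, 10], [20, 40]]
parent_impurity = gini([sum(col) for col in zip(*table)])
gain = gini_gain(table)
phi2 = phi_squared(table)
# Under the stated definition, gain == phi2 * parent_impurity.
```

Because the parent impurity is the same for every candidate split of a given node, ranking splits by this gain matches ranking them by ϕ², and both inherit ϕ's dependence on the marginal sums.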