
The index lift in data mining has a close relationship with the association measure relative risk in epidemiological studies

BACKGROUND: Data mining tools have been increasingly used in health research, with the promise of accelerating discoveries. Lift is a standard association metric in the data mining community. However, health researchers struggle with the interpretation of lift. As a result, dissemination of data min...


Bibliographic Details
Main Authors: Vu, Khanh; Clark, Rebecca A.; Bellinger, Colin; Erickson, Graham; Osornio-Vargas, Alvaro; Zaïane, Osmar R.; Yuan, Yan
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6580490/
https://www.ncbi.nlm.nih.gov/pubmed/31208407
http://dx.doi.org/10.1186/s12911-019-0838-4
_version_ 1783428030265294848
author Vu, Khanh
Clark, Rebecca A.
Bellinger, Colin
Erickson, Graham
Osornio-Vargas, Alvaro
Zaïane, Osmar R.
Yuan, Yan
collection PubMed
description BACKGROUND: Data mining tools have been increasingly used in health research, with the promise of accelerating discoveries. Lift is a standard association metric in the data mining community. However, health researchers struggle with the interpretation of lift. As a result, dissemination of data mining results can be met with hesitation. The relative risk and odds ratio are standard association measures in the health domain, due to their straightforward interpretation and comparability across populations. We aimed to investigate the lift-relative risk and the lift-odds ratio relationships, and provide tools to convert lift to the relative risk and odds ratio.
METHODS: We derived equations linking lift-relative risk and lift-odds ratio. We discussed how lift, relative risk, and odds ratio behave numerically with varying association strengths and exposure prevalence levels. The lift-relative risk relationship was further illustrated using a high-dimensional dataset which examines the association of exposure to airborne pollutants and adverse birth outcomes. We conducted spatial association rule mining using the Kingfisher algorithm, which identified association rules using its built-in lift metric. We directly estimated relative risks and odds ratios from 2 by 2 tables for each identified rule. These values were compared to the corresponding lift values, and relative risks and odds ratios were computed using the derived equations.
RESULTS: As the exposure-outcome association strengthens, the odds ratio and relative risk move away from 1 faster numerically than lift, i.e. |log(odds ratio)| ≥ |log(relative risk)| ≥ |log(lift)|. In addition, lift is bounded by the smaller of the inverse probability of outcome or exposure, i.e. lift ≤ min(1/P(O), 1/P(E)). Unlike the relative risk and odds ratio, lift depends on the exposure prevalence for fixed outcomes. For example, when an exposure A and a less prevalent exposure B have the same relative risk for an outcome, exposure A has a lower lift than B.
CONCLUSIONS: Lift, relative risk, and odds ratio are positively correlated and share the same null value. However, lift depends on the exposure prevalence, and thus is not straightforward to interpret or to use to compare association strength. Tools are provided to obtain the relative risk and odds ratio from lift.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12911-019-0838-4) contains supplementary material, which is available to authorized users.
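The relationships summarized in the abstract can be illustrated with a small numerical sketch. It is not the authors' tool: the function names and the 2 by 2 counts are hypothetical, and the conversion formulas are derived here from the standard definitions lift = P(O|E)/P(O) and relative risk = P(O|E)/P(O|not E), combined with P(O) = P(O|E)P(E) + P(O|not E)(1 - P(E)). The published equations may be written in a different but equivalent form.

```python
import math


def measures_from_table(a, b, c, d):
    """Lift, relative risk and odds ratio from a 2 by 2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    Assumes every cell is positive so that no ratio divides by zero.
    """
    n = a + b + c + d
    p_e = (a + b) / n                    # exposure prevalence P(E)
    p_o = (a + c) / n                    # outcome prevalence P(O)
    lift = (a / n) / (p_e * p_o)         # P(E and O) / (P(E) P(O))
    rr = (a / (a + b)) / (c / (c + d))   # P(O|E) / P(O|not E)
    odds_ratio = (a * d) / (b * c)
    return {"p_e": p_e, "p_o": p_o, "lift": lift, "rr": rr, "or": odds_ratio}


def rr_from_lift(lift, p_e):
    """Relative risk implied by a lift value and the exposure prevalence."""
    # Undefined when lift * p_e == 1, i.e. no outcomes among the unexposed.
    return lift * (1 - p_e) / (1 - lift * p_e)


def lift_from_rr(rr, p_e):
    """Inverse of rr_from_lift: lift implied by a relative risk and P(E)."""
    return rr / (rr * p_e + 1 - p_e)


def or_from_lift(lift, p_e, p_o):
    """Odds ratio implied by lift, exposure prevalence and outcome prevalence."""
    risk_exposed = lift * p_o
    risk_unexposed = p_o * (1 - lift * p_e) / (1 - p_e)
    odds_exposed = risk_exposed / (1 - risk_exposed)
    odds_unexposed = risk_unexposed / (1 - risk_unexposed)
    return odds_exposed / odds_unexposed


if __name__ == "__main__":
    # Hypothetical counts, chosen only for illustration.
    m = measures_from_table(30, 70, 20, 880)
    print("lift:", m["lift"], "RR:", m["rr"], "OR:", m["or"])
    print("RR from lift:", rr_from_lift(m["lift"], m["p_e"]))
    print("OR from lift:", or_from_lift(m["lift"], m["p_e"], m["p_o"]))
    # Same relative risk, different exposure prevalence: lift differs.
    print("lift at P(E)=0.5:", lift_from_rr(2.0, 0.5))
    print("lift at P(E)=0.1:", lift_from_rr(2.0, 0.1))
```

For these hypothetical counts the lift is 6 and the relative risk is 13.5, and the converted values agree with the direct 2 by 2 estimates. The last two lines mirror the abstract's example: with the relative risk held at 2, the more prevalent exposure yields the lower lift.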
format Online
Article
Text
id pubmed-6580490
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6580490 2019-06-24 The index lift in data mining has a close relationship with the association measure relative risk in epidemiological studies Vu, Khanh Clark, Rebecca A. Bellinger, Colin Erickson, Graham Osornio-Vargas, Alvaro Zaïane, Osmar R. Yuan, Yan BMC Med Inform Decis Mak Research Article BioMed Central 2019-06-17 /pmc/articles/PMC6580490/ /pubmed/31208407 http://dx.doi.org/10.1186/s12911-019-0838-4 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title The index lift in data mining has a close relationship with the association measure relative risk in epidemiological studies
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6580490/
https://www.ncbi.nlm.nih.gov/pubmed/31208407
http://dx.doi.org/10.1186/s12911-019-0838-4