
Multi-cohort modeling strategies for scalable globally accessible prostate cancer risk tools

BACKGROUND: Online clinical risk prediction tools built on data from multiple cohorts are increasingly being utilized for contemporary doctor-patient decision-making and validation. This report outlines a comprehensive data science strategy for building such tools with application to the Prostate Bi...


Bibliographic Details
Main authors: Tolksdorf, Johanna, Kattan, Michael W., Boorjian, Stephen A., Freedland, Stephen J., Saba, Karim, Poyet, Cedric, Guerrios, Lourdes, De Hoedt, Amanda, Liss, Michael A., Leach, Robin J., Hernandez, Javier, Vertosick, Emily, Vickers, Andrew J., Ankerst, Donna P.
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6792191/
https://www.ncbi.nlm.nih.gov/pubmed/31615451
http://dx.doi.org/10.1186/s12874-019-0839-0
_version_ 1783459095634771968
author Tolksdorf, Johanna
Kattan, Michael W.
Boorjian, Stephen A.
Freedland, Stephen J.
Saba, Karim
Poyet, Cedric
Guerrios, Lourdes
De Hoedt, Amanda
Liss, Michael A.
Leach, Robin J.
Hernandez, Javier
Vertosick, Emily
Vickers, Andrew J.
Ankerst, Donna P.
author_sort Tolksdorf, Johanna
collection PubMed
description BACKGROUND: Online clinical risk prediction tools built on data from multiple cohorts are increasingly being utilized for contemporary doctor-patient decision-making and validation. This report outlines a comprehensive data science strategy for building such tools with application to the Prostate Biopsy Collaborative Group prostate cancer risk prediction tool. METHODS: We created models for high-grade prostate cancer risk using six established risk factors. The data comprised 8492 prostate biopsies collected from ten institutions, 2 in Europe and 8 across North America. We calculated area under the receiver operating characteristic curve (AUC) for discrimination, the Hosmer-Lemeshow test statistic (HLS) for calibration and the clinical net benefit at risk threshold 15%. We implemented several internal cross-validation schemes to assess the influence of modeling method and individual cohort on validation performance. RESULTS: High-grade disease prevalence ranged from 18% in Zurich (1863 biopsies) to 39% in UT Health San Antonio (899 biopsies). Visualization revealed outliers in terms of risk factors, including San Juan VA (51% abnormal digital rectal exam), Durham VA (63% African American), and Zurich (2.8% family history). Exclusion of any cohort did not significantly affect the AUC or HLS, nor did the choice of prediction model (pooled, random-effects, meta-analysis). Excluding the lowest-prevalence Zurich cohort from training sets did not statistically significantly change the validation metrics for any of the individual cohorts, except for Sunnybrook, where the effect on the AUC was minimal. Therefore the final multivariable logistic model was built by pooling the data from all cohorts using logistic regression. Higher prostate-specific antigen and age, abnormal digital rectal exam, African ancestry and a family history of prostate cancer increased risk of high-grade prostate cancer, while a history of a prior negative prostate biopsy decreased risk (all p-values < 0.004). CONCLUSIONS: We have outlined a multi-cohort model-building internal validation strategy for developing globally accessible and scalable risk prediction tools.
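The three validation metrics named in the abstract (AUC, the Hosmer-Lemeshow statistic, and net benefit at a 15% risk threshold) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the equal-size grouping used for the Hosmer-Lemeshow statistic, and the toy risks and outcomes are all assumptions made for illustration.

```python
from typing import Sequence


def auc(risks: Sequence[float], outcomes: Sequence[int]) -> float:
    """Chance that a random case has a higher predicted risk than a random non-case."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    controls = [r for r, y in zip(risks, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one non-case")
    # Count concordant case/non-case pairs; tied risks contribute one half.
    score = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return score / (len(cases) * len(controls))


def net_benefit(risks: Sequence[float], outcomes: Sequence[int],
                threshold: float = 0.15) -> float:
    """Net benefit of acting on everyone whose predicted risk is >= threshold."""
    n = len(risks)
    tp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 1)
    fp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 0)
    # False positives are weighted by the odds of the risk threshold.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def hosmer_lemeshow(risks: Sequence[float], outcomes: Sequence[int],
                    groups: int = 10) -> float:
    """Hosmer-Lemeshow statistic over equal-size groups of sorted predicted risk.

    The grouping rule is an assumption; other conventions exist.
    """
    pairs = sorted(zip(risks, outcomes))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(r for r, _ in chunk)   # sum of predicted risks
        observed = sum(y for _, y in chunk)   # number of observed events
        mean_risk = expected / len(chunk)
        denom = len(chunk) * mean_risk * (1.0 - mean_risk)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat


# Hypothetical predicted risks and biopsy outcomes (1 = high-grade cancer).
toy_risks = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.70, 0.90]
toy_outcomes = [0, 0, 0, 1, 0, 1, 1, 1]

print("AUC:", auc(toy_risks, toy_outcomes))
print("Net benefit at 15%:", net_benefit(toy_risks, toy_outcomes, 0.15))
print("HLS (2 groups):", hosmer_lemeshow(toy_risks, toy_outcomes, groups=2))
```

In a leave-one-cohort-out scheme of the kind the abstract describes, functions like these would be evaluated on each held-out cohort after fitting the model on the remaining cohorts.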
format Online
Article
Text
id pubmed-6792191
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6792191 2019-10-21 Multi-cohort modeling strategies for scalable globally accessible prostate cancer risk tools BMC Med Res Methodol Research Article BioMed Central 2019-10-15 /pmc/articles/PMC6792191/ /pubmed/31615451 http://dx.doi.org/10.1186/s12874-019-0839-0 Text en © The Author(s). 2019 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Multi-cohort modeling strategies for scalable globally accessible prostate cancer risk tools
topic Research Article