
Optimizing predictive performance of criminal recidivism models using registration data with binary and survival outcomes

Bibliographic Details
Main Authors: Tollenaar, Nikolaj; van der Heijden, Peter G. M.
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6407787/
https://www.ncbi.nlm.nih.gov/pubmed/30849094
http://dx.doi.org/10.1371/journal.pone.0213245
author Tollenaar, Nikolaj
van der Heijden, Peter G. M.
author_facet Tollenaar, Nikolaj
van der Heijden, Peter G. M.
author_sort Tollenaar, Nikolaj
collection PubMed
description In a recidivism prediction context, there is no consensus on which modeling strategy should be followed to obtain an optimal prediction model. In previous papers, a range of statistical and machine learning techniques were benchmarked on recidivism data with a binary outcome. However, two important tree ensemble methods, namely gradient boosting and random forests, were not extensively evaluated. In this paper, we further explore the modeling potential of these techniques in the binary outcome criminal prediction context. Additionally, we explore the predictive potential of classical statistical and machine learning methods for censored time-to-event data. A range of manually specified statistical and (semi-)automatic machine learning models is fitted on Dutch recidivism data, both for the binary outcome case and the censored outcome case. To enhance generalizability of results, the same models are applied to two historical American data sets, the North Carolina prison data. For all datasets, (semi-)automatic modeling in the binary case seems to provide no improvement over an appropriately manually specified traditional statistical model. There is, however, evidence of slightly improved performance of gradient boosting in survival data. Results on the reconviction data from two sources suggest that both statistical and machine learning methods should be tried out to obtain an optimal model. Even if a flexible black-box model does not improve upon the predictions of a manually specified model, it can serve as a test of whether important interactions are missing or other misspecifications of the model are present, and can thus provide more security in the modeling process.
format Online
Article
Text
id pubmed-6407787
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6407787 2019-03-17 Optimizing predictive performance of criminal recidivism models using registration data with binary and survival outcomes Tollenaar, Nikolaj van der Heijden, Peter G. M. PLoS One Research Article In a recidivism prediction context, there is no consensus on which modeling strategy should be followed to obtain an optimal prediction model. In previous papers, a range of statistical and machine learning techniques were benchmarked on recidivism data with a binary outcome. However, two important tree ensemble methods, namely gradient boosting and random forests, were not extensively evaluated. In this paper, we further explore the modeling potential of these techniques in the binary outcome criminal prediction context. Additionally, we explore the predictive potential of classical statistical and machine learning methods for censored time-to-event data. A range of manually specified statistical and (semi-)automatic machine learning models is fitted on Dutch recidivism data, both for the binary outcome case and the censored outcome case. To enhance generalizability of results, the same models are applied to two historical American data sets, the North Carolina prison data. For all datasets, (semi-)automatic modeling in the binary case seems to provide no improvement over an appropriately manually specified traditional statistical model. There is, however, evidence of slightly improved performance of gradient boosting in survival data. Results on the reconviction data from two sources suggest that both statistical and machine learning methods should be tried out to obtain an optimal model. Even if a flexible black-box model does not improve upon the predictions of a manually specified model, it can serve as a test of whether important interactions are missing or other misspecifications of the model are present, and can thus provide more security in the modeling process.
Public Library of Science 2019-03-08 /pmc/articles/PMC6407787/ /pubmed/30849094 http://dx.doi.org/10.1371/journal.pone.0213245 Text en © 2019 Tollenaar, van der Heijden http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Tollenaar, Nikolaj
van der Heijden, Peter G. M.
Optimizing predictive performance of criminal recidivism models using registration data with binary and survival outcomes
title Optimizing predictive performance of criminal recidivism models using registration data with binary and survival outcomes
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6407787/
https://www.ncbi.nlm.nih.gov/pubmed/30849094
http://dx.doi.org/10.1371/journal.pone.0213245