
Imbalanced target prediction with pattern discovery on clinical data repositories

BACKGROUND: Clinical data repositories (CDR) have great potential to improve outcome prediction and risk modeling. However, most clinical studies require careful study design, dedicated data collection efforts, and sophisticated modeling techniques before a hypothesis can be tested. We aim to bridge...


Bibliographic Details
Main authors:	Chan, Tak-Ming, Li, Yuxi, Chiau, Choo-Chiap, Zhu, Jane, Jiang, Jie, Huo, Yong
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2017
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5399417/ https://www.ncbi.nlm.nih.gov/pubmed/28427384 http://dx.doi.org/10.1186/s12911-017-0443-3

author	Chan, Tak-Ming Li, Yuxi Chiau, Choo-Chiap Zhu, Jane Jiang, Jie Huo, Yong
author_facet	Chan, Tak-Ming Li, Yuxi Chiau, Choo-Chiap Zhu, Jane Jiang, Jie Huo, Yong
author_sort	Chan, Tak-Ming
collection	PubMed
description	BACKGROUND: Clinical data repositories (CDR) have great potential to improve outcome prediction and risk modeling. However, most clinical studies require careful study design, dedicated data collection efforts, and sophisticated modeling techniques before a hypothesis can be tested. We aim to bridge this gap, so that clinical domain users can perform first-hand prediction on existing repository data without complicated handling, and obtain insightful patterns of imbalanced targets for a formal study before it is conducted. We specifically target for interpretability for domain users where the model can be conveniently explained and applied in clinical practice. METHODS: We propose an interpretable pattern model which is noise (missing) tolerant for practice data. To address the challenge of imbalanced targets of interest in clinical research, e.g., deaths less than a few percent, the geometric mean of sensitivity and specificity (G-mean) optimization criterion is employed, with which a simple but effective heuristic algorithm is developed. RESULTS: We compared pattern discovery to clinically interpretable methods on two retrospective clinical datasets. They contain 14.9% deaths in 1 year in the thoracic dataset and 9.1% deaths in the cardiac dataset, respectively. In spite of the imbalance challenge shown on other methods, pattern discovery consistently shows competitive cross-validated prediction performance. Compared to logistic regression, Naïve Bayes, and decision tree, pattern discovery achieves statistically significant (p-values < 0.01, Wilcoxon signed rank test) favorable averaged testing G-means and F1-scores (harmonic mean of precision and sensitivity). Without requiring sophisticated technical processing of data and tweaking, the prediction performance of pattern discovery is consistently comparable to the best achievable performance. 
CONCLUSIONS: Pattern discovery has demonstrated to be robust and valuable for target prediction on existing clinical data repositories with imbalance and noise. The prediction results and interpretable patterns can provide insights in an agile and inexpensive way for the potential formal studies. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12911-017-0443-3) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-5399417
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-53994172017-04-24 Imbalanced target prediction with pattern discovery on clinical data repositories Chan, Tak-Ming Li, Yuxi Chiau, Choo-Chiap Zhu, Jane Jiang, Jie Huo, Yong BMC Med Inform Decis Mak Research Article BACKGROUND: Clinical data repositories (CDR) have great potential to improve outcome prediction and risk modeling. However, most clinical studies require careful study design, dedicated data collection efforts, and sophisticated modeling techniques before a hypothesis can be tested. We aim to bridge this gap, so that clinical domain users can perform first-hand prediction on existing repository data without complicated handling, and obtain insightful patterns of imbalanced targets for a formal study before it is conducted. We specifically target for interpretability for domain users where the model can be conveniently explained and applied in clinical practice. METHODS: We propose an interpretable pattern model which is noise (missing) tolerant for practice data. To address the challenge of imbalanced targets of interest in clinical research, e.g., deaths less than a few percent, the geometric mean of sensitivity and specificity (G-mean) optimization criterion is employed, with which a simple but effective heuristic algorithm is developed. RESULTS: We compared pattern discovery to clinically interpretable methods on two retrospective clinical datasets. They contain 14.9% deaths in 1 year in the thoracic dataset and 9.1% deaths in the cardiac dataset, respectively. In spite of the imbalance challenge shown on other methods, pattern discovery consistently shows competitive cross-validated prediction performance. Compared to logistic regression, Naïve Bayes, and decision tree, pattern discovery achieves statistically significant (p-values < 0.01, Wilcoxon signed rank test) favorable averaged testing G-means and F1-scores (harmonic mean of precision and sensitivity). 
Without requiring sophisticated technical processing of data and tweaking, the prediction performance of pattern discovery is consistently comparable to the best achievable performance. CONCLUSIONS: Pattern discovery has demonstrated to be robust and valuable for target prediction on existing clinical data repositories with imbalance and noise. The prediction results and interpretable patterns can provide insights in an agile and inexpensive way for the potential formal studies. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12911-017-0443-3) contains supplementary material, which is available to authorized users. BioMed Central 2017-04-20 /pmc/articles/PMC5399417/ /pubmed/28427384 http://dx.doi.org/10.1186/s12911-017-0443-3 Text en © The Author(s). 2017 Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Research Article Chan, Tak-Ming Li, Yuxi Chiau, Choo-Chiap Zhu, Jane Jiang, Jie Huo, Yong Imbalanced target prediction with pattern discovery on clinical data repositories
title	Imbalanced target prediction with pattern discovery on clinical data repositories
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5399417/ https://www.ncbi.nlm.nih.gov/pubmed/28427384 http://dx.doi.org/10.1186/s12911-017-0443-3
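
The abstract names two evaluation measures for imbalanced targets: the G-mean (geometric mean of sensitivity and specificity) and the F1-score (harmonic mean of precision and sensitivity). The sketch below only illustrates those two standard definitions; it is not the authors' pattern discovery algorithm, and the function names and patient counts are hypothetical, chosen for illustration.

```python
import math


def g_mean(tp, fp, tn, fn):
    """Geometric mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def f1_score(tp, fp, tn, fn):
    """Harmonic mean of precision and sensitivity, written as 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def accuracy(tp, fp, tn, fn):
    """Fraction of all cases classified correctly."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


# Hypothetical cohort: 100 patients, 10 deaths (the rare target).
# Classifier A predicts "survives" for everyone.
always_negative = dict(tp=0, fp=0, tn=90, fn=10)
# Classifier B catches 8 of 10 deaths at the cost of 18 false alarms.
balanced = dict(tp=8, fp=18, tn=72, fn=2)

for name, counts in [("always negative", always_negative), ("balanced", balanced)]:
    print(
        name,
        "accuracy=%.2f" % accuracy(**counts),
        "g_mean=%.2f" % g_mean(**counts),
        "f1=%.2f" % f1_score(**counts),
    )
```

In this made-up cohort, the classifier that never predicts a death has the higher accuracy but a G-mean and F1-score of zero, which is why criteria of this kind suit rare targets.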

