
An evaluation of time series summary statistics as features for clinical prediction tasks

BACKGROUND: Clinical prediction tasks such as patient mortality, length of hospital stay, and disease diagnosis are highly important in critical care research. The existing studies for clinical prediction mainly used simple summary statistics to summarize information from physiological time series....


Bibliographic Details
Main Authors: Guo, Chonghui, Lu, Menglin, Chen, Jingfeng
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7059727/
https://www.ncbi.nlm.nih.gov/pubmed/32138733
http://dx.doi.org/10.1186/s12911-020-1063-x
author Guo, Chonghui
Lu, Menglin
Chen, Jingfeng
collection PubMed
description BACKGROUND: Clinical prediction tasks such as patient mortality, length of hospital stay, and disease diagnosis are highly important in critical care research. The existing studies for clinical prediction mainly used simple summary statistics to summarize information from physiological time series. However, this lack of statistics leads to a lack of information. In addition, using only maximum and minimum statistics to indicate patient features fails to provide an adequate explanation. Few studies have evaluated which summary statistics best represent physiological time series.
METHODS: In this paper, we summarize 14 statistics describing the characteristics of physiological time series, including the central tendency, dispersion tendency, and distribution shape. Then, we evaluate the use of summary statistics of physiological time series as features for three clinical prediction tasks. To find the combinations of statistics that yield the best performances under different tasks, we use a cross-validation-based genetic algorithm to approximate the optimal statistical combination.
RESULTS: By experiments using the EHRs of 6,927 patients, we obtained prediction results based on both single statistics and commonly used combinations of statistics under three clinical prediction tasks. Based on the results of an embedded cross-validation genetic algorithm, we obtained 25 optimal sets of statistical combinations and then tested their prediction results. By comparing the performances of prediction with single statistics and commonly used combinations of statistics with quantitative analyses of the optimal statistical combinations, we found that some statistics play central roles in patient representation and different prediction tasks have certain commonalities.
CONCLUSION: Through an in-depth analysis of the results, we found many practical reference points that can provide guidance for subsequent related research. Statistics that indicate dispersion tendency, such as min, max, and range, are more suitable for length of stay prediction tasks, and they also provide information for short-term mortality prediction. Mean and quantiles that reflect the central tendency of physiological time series are more suitable for mortality and disease prediction. Skewness and kurtosis perform poorly when used separately for prediction but can be used as supplementary statistics to improve the overall prediction effect.
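The abstract describes turning each physiological time series into summary statistics covering central tendency, dispersion, and distribution shape, then using a chosen subset of them as features. The following is a minimal sketch of that idea, not the authors' implementation. The abstract does not enumerate all 14 statistics, so the ten computed here are an assumption built around the ones it names (min, max, range, mean, quantiles, skewness, kurtosis). The function names `summarize_series` and `build_features`, the use of population moments, and the quartile interpolation method are likewise illustrative choices.

```python
import math
import statistics


def summarize_series(values):
    """Summarize one time series with central-tendency, dispersion, and shape statistics.

    Hypothetical helper: the statistic set and formulas are assumptions,
    not taken from the paper.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    n = len(values)
    mean = statistics.fmean(values)
    # Population central moments (divide by n); a bias-corrected variant
    # would be an equally reasonable choice.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    q25, median, q75 = statistics.quantiles(values, n=4, method="inclusive")
    lowest, highest = min(values), max(values)
    return {
        # Central tendency
        "mean": mean,
        "median": median,
        "q25": q25,
        "q75": q75,
        # Dispersion
        "min": lowest,
        "max": highest,
        "range": highest - lowest,
        "std": math.sqrt(m2),
        # Distribution shape (defined as 0.0 for a constant series)
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 - 3.0 if m2 > 0 else 0.0,
    }


def build_features(series_by_variable, selected):
    """Flatten the selected statistics of each variable into one feature dict."""
    features = {}
    for name, values in series_by_variable.items():
        stats = summarize_series(values)
        for stat in selected:
            features[f"{name}_{stat}"] = stats[stat]
    return features


if __name__ == "__main__":
    # Made-up readings for illustration only.
    patient = {"heart_rate": [82, 90, 75, 110, 95], "sbp": [118, 121, 109, 130]}
    print(build_features(patient, ["min", "max", "range", "mean"]))
```

A selection procedure such as the cross-validation-based genetic algorithm mentioned in the abstract would then search over the `selected` list, scoring each candidate subset by cross-validated model performance.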
format Online
Article
Text
id pubmed-7059727
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7059727 2020-03-12 An evaluation of time series summary statistics as features for clinical prediction tasks Guo, Chonghui Lu, Menglin Chen, Jingfeng BMC Med Inform Decis Mak Research Article BioMed Central 2020-03-05 /pmc/articles/PMC7059727/ /pubmed/32138733 http://dx.doi.org/10.1186/s12911-020-1063-x Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title An evaluation of time series summary statistics as features for clinical prediction tasks
topic Research Article