An evaluation of time series summary statistics as features for clinical prediction tasks

Bibliographic Details
Main Authors:	Guo, Chonghui; Lu, Menglin; Chen, Jingfeng
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2020
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7059727/ https://www.ncbi.nlm.nih.gov/pubmed/32138733 http://dx.doi.org/10.1186/s12911-020-1063-x

author	Guo, Chonghui; Lu, Menglin; Chen, Jingfeng
collection	PubMed
description	BACKGROUND: Clinical prediction tasks such as patient mortality, length of hospital stay, and disease diagnosis are highly important in critical care research. Existing studies of clinical prediction have mainly used simple summary statistics to summarize information from physiological time series. However, relying on so few statistics loses information. In addition, using only the maximum and minimum to represent patient features fails to provide an adequate explanation. Few studies have evaluated which summary statistics best represent physiological time series.
METHODS: In this paper, we summarize 14 statistics describing the characteristics of physiological time series, covering central tendency, dispersion tendency, and distribution shape. We then evaluate summary statistics of physiological time series as features for three clinical prediction tasks. To find the combinations of statistics that yield the best performance for each task, we use a cross-validation-based genetic algorithm to approximate the optimal statistical combination.
RESULTS: In experiments using the electronic health records (EHRs) of 6,927 patients, we obtained prediction results for three clinical prediction tasks based on both single statistics and commonly used combinations of statistics. Using an embedded cross-validation genetic algorithm, we obtained 25 optimal sets of statistical combinations and then tested their prediction results. By comparing the prediction performance of single statistics and commonly used combinations with quantitative analyses of the optimal statistical combinations, we found that some statistics play central roles in patient representation and that different prediction tasks share certain commonalities.
CONCLUSION: An in-depth analysis of the results yielded many practical reference points that can guide subsequent related research. Statistics that indicate dispersion tendency, such as min, max, and range, are more suitable for length-of-stay prediction, and they also provide information for short-term mortality prediction. The mean and quantiles, which reflect the central tendency of physiological time series, are more suitable for mortality and disease prediction. Skewness and kurtosis perform poorly when used on their own for prediction but can serve as supplementary statistics that improve overall prediction performance.
format	Online Article Text
id	pubmed-7059727
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-7059727 2020-03-12 BMC Med Inform Decis Mak, Research Article. BioMed Central 2020-03-05 /pmc/articles/PMC7059727/ /pubmed/32138733 http://dx.doi.org/10.1186/s12911-020-1063-x Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	An evaluation of time series summary statistics as features for clinical prediction tasks
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7059727/ https://www.ncbi.nlm.nih.gov/pubmed/32138733 http://dx.doi.org/10.1186/s12911-020-1063-x
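The abstract groups its 14 statistics into central tendency, dispersion tendency, and distribution shape, but this record does not list all 14. As a minimal illustrative sketch (not the authors' code), the Python function below summarizes one physiological time series with a subset of common statistics, including those the abstract names (min, max, range, mean, quantiles, skewness, kurtosis). The function name `summary_features`, treating NaN readings as missing values, using population (non-bias-corrected) moments, reporting excess kurtosis, and using the "inclusive" quantile method are all assumptions made for illustration.

```python
import math
import statistics
from typing import Dict, Sequence


def summary_features(series: Sequence[float]) -> Dict[str, float]:
    """Summarize one physiological time series with order-free statistics.

    Hypothetical helper: an illustrative subset, not the paper's exact 14 statistics.
    """
    values = [float(v) for v in series]
    values = [v for v in values if not math.isnan(v)]  # assumption: NaN marks a missing reading
    n = len(values)
    if n == 0:
        raise ValueError("series has no observed values")

    mean = statistics.fmean(values)
    # Population standard deviation, so a single reading yields 0 instead of an error.
    std = statistics.pstdev(values)
    lo, hi = min(values), max(values)

    # Quartiles with the 'inclusive' method (min is the 0th and max the 100th percentile).
    # statistics.quantiles needs at least two points, so a single reading is its own quartile.
    if n >= 2:
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    else:
        q1 = median = q3 = values[0]

    # Moment-based skewness and excess kurtosis; undefined for a constant series,
    # which this sketch reports as 0.0 by convention.
    if std > 0:
        skewness = sum((v - mean) ** 3 for v in values) / (n * std ** 3)
        kurtosis = sum((v - mean) ** 4 for v in values) / (n * std ** 4) - 3.0
    else:
        skewness = kurtosis = 0.0

    return {
        # central tendency
        "mean": mean, "median": median, "q1": q1, "q3": q3,
        # dispersion tendency
        "min": lo, "max": hi, "range": hi - lo, "std": std,
        # distribution shape
        "skewness": skewness, "kurtosis": kurtosis,
    }


if __name__ == "__main__":
    # Hypothetical hourly heart-rate readings with one missing value.
    heart_rate = [88, 92, float("nan"), 95, 110, 101]
    for name, value in summary_features(heart_rate).items():
        print(f"{name}: {value:.2f}")
```

Computing this dictionary for each vital sign and concatenating the selected keys gives one fixed-length feature vector per patient. Deciding which keys to keep is the combination problem that, according to the abstract, the authors approximate with a cross-validation-based genetic algorithm.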

