An evaluation of time series summary statistics as features for clinical prediction tasks
BACKGROUND: Clinical prediction tasks such as patient mortality, length of hospital stay, and disease diagnosis are highly important in critical care research. The existing studies for clinical prediction mainly used simple summary statistics to summarize information from physiological time series....
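As a rough illustration of what "summary statistics as features" can mean in practice, the sketch below collapses one physiological time series into a small feature dictionary. It is not the authors' implementation: the function names `summary_features` and `_quantile`, the choice of quartiles, the population standard deviation, and the NaN handling are all assumptions made for this example, and only statistics named in the abstract (min, max, range, mean, quantiles, skewness, kurtosis) plus the standard deviation are covered, not the full set of 14.

```python
import math


def _quantile(sorted_x, q):
    """Linear-interpolation quantile of an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_x) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_x[lo] * (1 - frac) + sorted_x[hi] * frac


def summary_features(series):
    """Summarize one time series (e.g. hourly heart rate) as a feature dict.

    Missing readings (None or NaN) are dropped before computing anything;
    that policy is an assumption of this sketch.
    """
    x = sorted(v for v in series if v is not None and not math.isnan(v))
    if not x:
        raise ValueError("series has no observed values")
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # population variance
    if var > 0:
        skew = sum((v - mean) ** 3 for v in x) / n / var ** 1.5
        kurt = sum((v - mean) ** 4 for v in x) / n / var ** 2 - 3.0  # excess
    else:
        skew = kurt = 0.0  # constant series: shape statistics are undefined
    return {
        # central tendency
        "mean": mean,
        "median": _quantile(x, 0.5),
        "q25": _quantile(x, 0.25),
        "q75": _quantile(x, 0.75),
        # dispersion
        "min": x[0],
        "max": x[-1],
        "range": x[-1] - x[0],
        "std": math.sqrt(var),
        # distribution shape
        "skewness": skew,
        "kurtosis": kurt,
    }
```

Calling `summary_features` once per variable per patient and concatenating the resulting values would give a fixed-length feature vector regardless of how many readings each series contains.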
Main authors: | Guo, Chonghui; Lu, Menglin; Chen, Jingfeng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7059727/ https://www.ncbi.nlm.nih.gov/pubmed/32138733 http://dx.doi.org/10.1186/s12911-020-1063-x |
_version_ | 1783504108844482560 |
---|---|
author | Guo, Chonghui Lu, Menglin Chen, Jingfeng |
author_sort | Guo, Chonghui |
collection | PubMed |
description | BACKGROUND: Clinical prediction tasks such as patient mortality, length of hospital stay, and disease diagnosis are highly important in critical care research. The existing studies for clinical prediction mainly used simple summary statistics to summarize information from physiological time series. However, this lack of statistics leads to a lack of information. In addition, using only maximum and minimum statistics to indicate patient features fails to provide an adequate explanation. Few studies have evaluated which summary statistics best represent physiological time series. METHODS: In this paper, we summarize 14 statistics describing the characteristics of physiological time series, including the central tendency, dispersion tendency, and distribution shape. Then, we evaluate the use of summary statistics of physiological time series as features for three clinical prediction tasks. To find the combinations of statistics that yield the best performances under different tasks, we use a cross-validation-based genetic algorithm to approximate the optimal statistical combination. RESULTS: By experiments using the EHRs of 6,927 patients, we obtained prediction results based on both single statistics and commonly used combinations of statistics under three clinical prediction tasks. Based on the results of an embedded cross-validation genetic algorithm, we obtained 25 optimal sets of statistical combinations and then tested their prediction results. By comparing the performances of prediction with single statistics and commonly used combinations of statistics with quantitative analyses of the optimal statistical combinations, we found that some statistics play central roles in patient representation and different prediction tasks have certain commonalities. CONCLUSION: Through an in-depth analysis of the results, we found many practical reference points that can provide guidance for subsequent related research. Statistics that indicate dispersion tendency, such as min, max, and range, are more suitable for length of stay prediction tasks, and they also provide information for short-term mortality prediction. Mean and quantiles that reflect the central tendency of physiological time series are more suitable for mortality and disease prediction. Skewness and kurtosis perform poorly when used separately for prediction but can be used as supplementary statistics to improve the overall prediction effect. |
format | Online Article Text |
id | pubmed-7059727 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-7059727 2020-03-12 An evaluation of time series summary statistics as features for clinical prediction tasks Guo, Chonghui Lu, Menglin Chen, Jingfeng BMC Med Inform Decis Mak Research Article BioMed Central 2020-03-05 /pmc/articles/PMC7059727/ /pubmed/32138733 http://dx.doi.org/10.1186/s12911-020-1063-x Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Article Guo, Chonghui Lu, Menglin Chen, Jingfeng An evaluation of time series summary statistics as features for clinical prediction tasks |
title | An evaluation of time series summary statistics as features for clinical prediction tasks |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7059727/ https://www.ncbi.nlm.nih.gov/pubmed/32138733 http://dx.doi.org/10.1186/s12911-020-1063-x |
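
The abstract mentions a cross-validation-based genetic algorithm for approximating the best combination of statistics. The sketch below shows one way such a search could be organized, under loudly stated assumptions: each candidate combination is a 0/1 mask over the feature columns, fitness is the k-fold accuracy of a toy nearest-centroid classifier, and the genetic operators (truncation selection, one-point crossover, bit-flip mutation, elitism) and all parameter values are placeholders. This record does not state the classifier, fitness measure, or genetic-algorithm settings the authors actually used, and the names `kfold_accuracy` and `select_statistics` are hypothetical.

```python
import random


def kfold_accuracy(X, y, cols, k=3):
    """Mean k-fold accuracy of a nearest-centroid classifier using only `cols`.

    The classifier is a stand-in chosen to keep the sketch dependency-free.
    """
    n = len(X)
    correct = 0
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        sums, counts = {}, {}
        for i in train_idx:  # accumulate per-class sums on the training folds
            s = sums.setdefault(y[i], [0.0] * len(cols))
            for j, c in enumerate(cols):
                s[j] += X[i][c]
            counts[y[i]] = counts.get(y[i], 0) + 1
        centroids = {lab: [v / counts[lab] for v in s] for lab, s in sums.items()}
        for i in test_idx:  # predict the class with the closest centroid
            pred = min(
                centroids,
                key=lambda lab: sum(
                    (X[i][c] - centroids[lab][j]) ** 2 for j, c in enumerate(cols)
                ),
            )
            correct += pred == y[i]
    return correct / n


def select_statistics(X, y, n_stats, pop_size=12, generations=15, p_mut=0.1, seed=0):
    """Search 0/1 masks over `n_stats` feature columns (requires n_stats >= 2)."""
    rng = random.Random(seed)

    def fitness(mask):
        cols = [j for j, bit in enumerate(mask) if bit]
        return kfold_accuracy(X, y, cols) if cols else 0.0  # empty set scores 0

    pop = [[rng.randint(0, 1) for _ in range(n_stats)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # Fitness is recomputed on every call; cache it for real workloads.
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]          # truncation selection
        children = [ranked[0][:]]                  # elitism: keep the top mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_stats)        # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - bit if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children
        best = max(pop + [best], key=fitness)
    return best, fitness(best)
```

Here `X` would be a list of per-patient feature rows (one column per summary statistic) and `y` the task labels; the returned mask marks which statistics the search kept. Because the result depends on the seed and on the stand-in classifier, it should be read as one candidate combination rather than a definitive optimum.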