
Comparing Methods for Measurement Error Detection in Serial 24-h Hormonal Data

Measurement errors commonly occur in 24-h hormonal data and may affect the outcomes of such studies. Measurement errors often appear as outliers in such data sets; however, no well-established method is available for their automatic detection. In this study, we aimed to compare performances of diffe...


Bibliographic Details
Main Authors:	van der Spoel, Evie; Choi, Jungyeon; Roelfsema, Ferdinand; le Cessie, Saskia; van Heemst, Diana; Dekkers, Olaf M.
Format:	Online Article Text
Language:	English
Published:	SAGE Publications 2019
Subjects:	JBR Perspectives on Data Analysis
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6637814/ https://www.ncbi.nlm.nih.gov/pubmed/31187683 http://dx.doi.org/10.1177/0748730419850917

collection	PubMed
description	Measurement errors commonly occur in 24-h hormonal data and may affect the outcomes of such studies. Measurement errors often appear as outliers in such data sets; however, no well-established method is available for their automatic detection. In this study, we aimed to compare performances of different methods for outlier detection in hormonal serial data. Hormones (glucose, insulin, thyroid-stimulating hormone, cortisol, and growth hormone) were measured in blood sampled every 10 min for 24 h in 38 participants of the Leiden Longevity Study. Four methods for detecting outliers were compared: (1) eyeballing, (2) Tukey’s fences, (3) stepwise approach, and (4) the expectation-maximization (EM) algorithm. Eyeballing detects outliers based on experts’ knowledge, and the stepwise approach incorporates physiological knowledge with a statistical algorithm. Tukey’s fences and the EM algorithm are data-driven methods, using interquartile range and a mathematical algorithm to identify the underlying distribution, respectively. The performance of the methods was evaluated based on the number of outliers detected and the change in statistical outcomes after removing detected outliers. Eyeballing resulted in the lowest number of outliers detected (1.0% of all data points), followed by Tukey’s fences (2.3%), the stepwise approach (2.7%), and the EM algorithm (11.0%). In all methods, the mean hormone levels did not change materially after removing outliers. However, their minima were affected by outlier removal. Although removing outliers affected the correlation between glucose and insulin on the individual level, when averaged over all participants, none of the 4 methods influenced the correlation. Based on our results, the EM algorithm is not recommended given the high number of outliers detected, even where data points are physiologically plausible. 
Since Tukey’s fences is not suitable for all types of data and eyeballing is time-consuming, we recommend the stepwise approach for outlier detection, which combines physiological knowledge and an automated process.
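The abstract describes Tukey’s fences as a data-driven method based on the interquartile range. A minimal sketch of that idea follows, assuming the conventional multiplier k = 1.5 and Python's "inclusive" quartile definition; the abstract states neither, and the function names and the sample series are hypothetical, not taken from the study.

```python
from statistics import quantiles


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR.

    k = 1.5 is the conventional choice, assumed here; the abstract
    does not state which multiplier the authors used.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Return the indices of values lying outside the Tukey fences."""
    lower, upper = tukey_fences(values, k)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical 10-point series (not study data) with one implausible spike.
series = [5.1, 5.3, 5.0, 5.2, 5.4, 12.9, 5.3, 5.1, 5.2, 5.0]
print(flag_outliers(series))  # [5]
```

Because the fences are computed from the whole series, a hormone with large physiological pulses could have genuine peaks flagged, which may be one reason the abstract notes that Tukey’s fences is not suitable for all types of data.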
format	Online Article Text
id	pubmed-6637814
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-6637814 2019-08-22 J Biol Rhythms JBR Perspectives on Data Analysis SAGE Publications 2019-06-12 2019-08 /pmc/articles/PMC6637814/ /pubmed/31187683 http://dx.doi.org/10.1177/0748730419850917 Text en © 2019 The Author(s) http://creativecommons.org/licenses/by/4.0/ This article is distributed under the terms of the Creative Commons Attribution 4.0 License (http://www.creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).
title	Comparing Methods for Measurement Error Detection in Serial 24-h Hormonal Data
