
Estimation of Bedtimes of Reddit Users: Integrated Analysis of Time Stamps and Surveys

Bibliographic Details
Main authors: Meyerson, William U, Fineberg, Sarah K, Song, Ye Kyung, Faber, Adam, Ash, Garrett, Andrade, Fernanda C, Corlett, Philip, Gerstein, Mark B, Hoyle, Rick H
Format: Online Article Text
Language: English
Published: JMIR Publications 2023
Subjects: Original Paper
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9890352/
https://www.ncbi.nlm.nih.gov/pubmed/36649054
http://dx.doi.org/10.2196/38112
author Meyerson, William U
Fineberg, Sarah K
Song, Ye Kyung
Faber, Adam
Ash, Garrett
Andrade, Fernanda C
Corlett, Philip
Gerstein, Mark B
Hoyle, Rick H
collection PubMed
description BACKGROUND: Individuals with later bedtimes have an increased risk of difficulties with mood and substance use. To investigate the causes and consequences of late bedtimes and other sleep patterns, researchers are exploring social media as a data source. Pioneering studies inferred sleep patterns directly from social media data. While innovative, these efforts are variously unscalable, context dependent, confined to specific sleep parameters, or reliant on untested assumptions, and none of the reviewed studies applies to the popular Reddit platform or releases software to the research community. OBJECTIVE: This study builds on this prior work. We estimate the bedtimes of Reddit users from the time stamps of their posts, test inference validity against survey data, and release our model as an R package (The R Foundation). METHODS: We included 159 sufficiently active Reddit users with known time zones and known, nonanomalous bedtimes, together with the time stamps of their 2.1 million posts. The model's form was chosen by visualizing the aggregate distribution of the timing of users' posts relative to their reported bedtimes. The chosen model represents a user's frequency of Reddit posting by time of day, with a flat portion before bedtime and a quadratic depletion that begins near the user's bedtime, with parameters fitted to the data. This model estimates the bedtimes of individual Reddit users from the time stamps of their posts. Model performance is assessed through k-fold cross-validation. We then apply the model to estimate the bedtimes of 51,372 sufficiently active, nonbot Reddit users with known time zones from the time stamps of their 140 million posts. RESULTS: The Pearson correlation between expected and observed Reddit posting frequencies in our model was 0.997 on aggregate data.
On average, posting starts declining 45 minutes before bedtime, reaches a nadir 4.75 hours after bedtime that is 87% lower than the daytime rate, and returns to baseline 10.25 hours after bedtime. The Pearson correlation between inferred and reported bedtimes for individual users was 0.61 (P<.001). In 90 of 159 cases (56.6%), our estimate was within 1 hour of the reported bedtime; 128 cases (80.5%) were within 2 hours. Accuracy was equivalent in the hold-out and training sets of k-fold cross-validation, arguing against overfitting. The model was more accurate than a random forest approach. CONCLUSIONS: We uncovered a simple, reproducible relationship between Reddit users' reported bedtimes and the time of day when high daytime posting rates transition to low nighttime posting rates. We captured this relationship in a model that estimates users' bedtimes from the time stamps of their posts. Limitations include applicability only to users who post frequently, the requirement for time zone data, and limited generalizability. Nonetheless, the model is a step forward for inferring the sleep parameters of social media users passively and at scale. Our model and precomputed estimated bedtimes of more than 50,000 Reddit users are freely available.
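The METHODS and RESULTS describe a posting-rate curve that is flat during the day and dips quadratically around bedtime. As a minimal sketch, and not the authors' R package, the Python below encodes that shape using only the aggregate averages quoted in the abstract: decline from 45 minutes before bedtime, a nadir 87% below baseline at 4.75 hours after, and recovery by 10.25 hours after. Those three times are 5.5 hours apart on either side of the nadir, so this illustration assumes a single symmetric parabola. Scoring post times by likelihood and picking the bedtime by grid search are also assumptions here, and the names `relative_rate`, `estimate_bedtime`, `simulate_posts`, and `fraction_within` are hypothetical.

```python
import math
import random

# Aggregate shape parameters read off the abstract's averages (hours relative to bedtime).
# ASSUMPTION: the dip is one symmetric parabola; the authors' fitted functional
# form and parameterization in their R package may differ.
DIP_START = -0.75   # posting starts declining 45 min before bedtime
NADIR = 4.75        # lowest posting rate 4.75 h after bedtime
DIP_END = 10.25     # back to baseline 10.25 h after bedtime
DEPTH = 0.87        # nadir is 87% below the daytime rate
HALF_WIDTH = (DIP_END - DIP_START) / 2  # 5.5 h on each side of the nadir


def circular_diff(a: float, b: float) -> float:
    """Signed difference a - b on a 24-hour clock, in [-12, 12)."""
    return (a - b + 12.0) % 24.0 - 12.0


def relative_rate(hour: float, bedtime: float) -> float:
    """Posting rate at local clock `hour`, relative to a flat daytime rate of 1."""
    u = circular_diff(hour, bedtime + NADIR)
    if abs(u) >= HALF_WIDTH:
        return 1.0  # flat portion outside the nighttime dip
    # Quadratic depletion: 1 - DEPTH at the nadir, rising back to 1 at the dip edges.
    return 1.0 - DEPTH * (1.0 - (u / HALF_WIDTH) ** 2)


# Integral of relative_rate over one 24-hour day; it does not depend on bedtime.
DAILY_AREA = 24.0 - DEPTH * (4.0 / 3.0) * HALF_WIDTH


def log_likelihood(post_hours: list[float], bedtime: float) -> float:
    """Log-likelihood of post times under a density proportional to relative_rate."""
    return sum(math.log(relative_rate(h, bedtime) / DAILY_AREA) for h in post_hours)


def estimate_bedtime(post_hours: list[float], step: float = 0.25) -> float:
    """Hypothetical estimator: grid search for the maximum-likelihood bedtime."""
    candidates = [i * step for i in range(round(24.0 / step))]
    return max(candidates, key=lambda b: log_likelihood(post_hours, b))


def simulate_posts(bedtime: float, n: int, rng: random.Random) -> list[float]:
    """Draw n post times (local clock hours) from the model by rejection sampling."""
    posts = []
    while len(posts) < n:
        hour = rng.uniform(0.0, 24.0)
        if rng.random() < relative_rate(hour, bedtime):
            posts.append(hour)
    return posts


def fraction_within(estimates: list[float], reported: list[float], tol: float) -> float:
    """Share of users whose estimate is within `tol` hours of the reported bedtime."""
    hits = sum(abs(circular_diff(e, r)) <= tol for e, r in zip(estimates, reported))
    return hits / len(reported)


if __name__ == "__main__":
    rng = random.Random(0)
    posts = simulate_posts(23.5, 3000, rng)  # synthetic user, bedtime 23:30
    estimate = estimate_bedtime(posts)
    print(estimate, abs(circular_diff(estimate, 23.5)))
```

Errors are measured on a circular 24-hour clock so that, for example, an estimate of 23:00 against a reported 00:30 counts as 1.5 hours off rather than 22.5. The abstract's accuracy figures are simple proportions of this kind: 90/159 ≈ 0.566 and 128/159 ≈ 0.805, matching the reported 56.6% and 80.5%.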
format Online
Article
Text
id pubmed-9890352
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher JMIR Publications
record_format MEDLINE/PubMed
journal JMIR Form Res (JMIR Formative Research), Original Paper, published 2023-01-17
rights ©William U Meyerson, Sarah K Fineberg, Ye Kyung Song, Adam Faber, Garrett Ash, Fernanda C Andrade, Philip Corlett, Mark B Gerstein, Rick H Hoyle. Originally published in JMIR Formative Research (https://formative.jmir.org), 17.01.2023.
license https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication on https://formative.jmir.org, as well as this copyright and license information must be included.
title Estimation of Bedtimes of Reddit Users: Integrated Analysis of Time Stamps and Surveys
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9890352/
https://www.ncbi.nlm.nih.gov/pubmed/36649054
http://dx.doi.org/10.2196/38112