Estimation of Bedtimes of Reddit Users: Integrated Analysis of Time Stamps and Surveys
BACKGROUND: Individuals with later bedtimes have an increased risk of difficulties with mood and substances. To investigate the causes and consequences of late bedtimes and other sleep patterns, researchers are exploring social media as a data source. Pioneering studies inferred sleep patterns direc...
Main Authors: | Meyerson, William U; Fineberg, Sarah K; Song, Ye Kyung; Faber, Adam; Ash, Garrett; Andrade, Fernanda C; Corlett, Philip; Gerstein, Mark B; Hoyle, Rick H |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | JMIR Publications 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9890352/ https://www.ncbi.nlm.nih.gov/pubmed/36649054 http://dx.doi.org/10.2196/38112 |
_version_ | 1784880931020996608 |
---|---|
author | Meyerson, William U Fineberg, Sarah K Song, Ye Kyung Faber, Adam Ash, Garrett Andrade, Fernanda C Corlett, Philip Gerstein, Mark B Hoyle, Rick H |
author_facet | Meyerson, William U Fineberg, Sarah K Song, Ye Kyung Faber, Adam Ash, Garrett Andrade, Fernanda C Corlett, Philip Gerstein, Mark B Hoyle, Rick H |
author_sort | Meyerson, William U |
collection | PubMed |
description | BACKGROUND: Individuals with later bedtimes have an increased risk of difficulties with mood and substances. To investigate the causes and consequences of late bedtimes and other sleep patterns, researchers are exploring social media as a data source. Pioneering studies inferred sleep patterns directly from social media data. While innovative, these efforts are variously unscalable, context dependent, confined to specific sleep parameters, or rest on untested assumptions, and none of the reviewed studies apply to the popular Reddit platform or release software to the research community. OBJECTIVE: This study builds on this prior work. We estimate the bedtimes of Reddit users from the time stamps of their posts, test inference validity against survey data, and release our model as an R package (The R Foundation). METHODS: We included 159 sufficiently active Reddit users with known time zones and known, nonanomalous bedtimes, together with the time stamps of their 2.1 million posts. The model’s form was chosen by visualizing the aggregate distribution of the timing of users’ posts relative to their reported bedtimes. The chosen model represents a user’s frequency of Reddit posting by time of day, with a flat portion before bedtime and a quadratic depletion that begins near the user’s bedtime, with parameters fitted to the data. This model estimates the bedtimes of individual Reddit users from the time stamps of their posts. Model performance is assessed through k-fold cross-validation. We then apply the model to estimate the bedtimes of 51,372 sufficiently active, nonbot Reddit users with known time zones from the time stamps of their 140 million posts. RESULTS: The Pearson correlation between expected and observed Reddit posting frequencies in our model was 0.997 on aggregate data. On average, posting starts declining 45 minutes before bedtime, reaches a nadir 4.75 hours after bedtime that is 87% lower than the daytime rate, and returns to baseline 10.25 hours after bedtime. The Pearson correlation between inferred and reported bedtimes for individual users was 0.61 (P<.001). In 90 of 159 cases (56.6%), our estimate was within 1 hour of the reported bedtime; 128 cases (80.5%) were within 2 hours. There was equivalent accuracy in hold-out sets versus training sets of k-fold cross-validation, arguing against overfitting. The model was more accurate than a random forest approach. CONCLUSIONS: We uncovered a simple, reproducible relationship between Reddit users’ reported bedtimes and the time of day when high daytime posting rates transition to low nighttime posting rates. We captured this relationship in a model that estimates users’ bedtimes from the time stamps of their posts. Limitations include applicability only to users who post frequently, the requirement for time zone data, and limits on generalizability. Nonetheless, it is a step forward for inferring the sleep parameters of social media users passively at scale. Our model and precomputed estimated bedtimes of 50,000 Reddit users are freely available. |
format | Online Article Text |
id | pubmed-9890352 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | JMIR Publications |
record_format | MEDLINE/PubMed |
spelling | pubmed-9890352 2023-02-02 Estimation of Bedtimes of Reddit Users: Integrated Analysis of Time Stamps and Surveys Meyerson, William U Fineberg, Sarah K Song, Ye Kyung Faber, Adam Ash, Garrett Andrade, Fernanda C Corlett, Philip Gerstein, Mark B Hoyle, Rick H JMIR Form Res Original Paper BACKGROUND: Individuals with later bedtimes have an increased risk of difficulties with mood and substances. To investigate the causes and consequences of late bedtimes and other sleep patterns, researchers are exploring social media as a data source. Pioneering studies inferred sleep patterns directly from social media data. While innovative, these efforts are variously unscalable, context dependent, confined to specific sleep parameters, or rest on untested assumptions, and none of the reviewed studies apply to the popular Reddit platform or release software to the research community. OBJECTIVE: This study builds on this prior work. We estimate the bedtimes of Reddit users from the time stamps of their posts, test inference validity against survey data, and release our model as an R package (The R Foundation). METHODS: We included 159 sufficiently active Reddit users with known time zones and known, nonanomalous bedtimes, together with the time stamps of their 2.1 million posts. The model’s form was chosen by visualizing the aggregate distribution of the timing of users’ posts relative to their reported bedtimes. The chosen model represents a user’s frequency of Reddit posting by time of day, with a flat portion before bedtime and a quadratic depletion that begins near the user’s bedtime, with parameters fitted to the data. This model estimates the bedtimes of individual Reddit users from the time stamps of their posts. Model performance is assessed through k-fold cross-validation. We then apply the model to estimate the bedtimes of 51,372 sufficiently active, nonbot Reddit users with known time zones from the time stamps of their 140 million posts. 
RESULTS: The Pearson correlation between expected and observed Reddit posting frequencies in our model was 0.997 on aggregate data. On average, posting starts declining 45 minutes before bedtime, reaches a nadir 4.75 hours after bedtime that is 87% lower than the daytime rate, and returns to baseline 10.25 hours after bedtime. The Pearson correlation between inferred and reported bedtimes for individual users was 0.61 (P<.001). In 90 of 159 cases (56.6%), our estimate was within 1 hour of the reported bedtime; 128 cases (80.5%) were within 2 hours. There was equivalent accuracy in hold-out sets versus training sets of k-fold cross-validation, arguing against overfitting. The model was more accurate than a random forest approach. CONCLUSIONS: We uncovered a simple, reproducible relationship between Reddit users’ reported bedtimes and the time of day when high daytime posting rates transition to low nighttime posting rates. We captured this relationship in a model that estimates users’ bedtimes from the time stamps of their posts. Limitations include applicability only to users who post frequently, the requirement for time zone data, and limits on generalizability. Nonetheless, it is a step forward for inferring the sleep parameters of social media users passively at scale. Our model and precomputed estimated bedtimes of 50,000 Reddit users are freely available. JMIR Publications 2023-01-17 /pmc/articles/PMC9890352/ /pubmed/36649054 http://dx.doi.org/10.2196/38112 Text en ©William U Meyerson, Sarah K Fineberg, Ye Kyung Song, Adam Faber, Garrett Ash, Fernanda C Andrade, Philip Corlett, Mark B Gerstein, Rick H Hoyle. Originally published in JMIR Formative Research (https://formative.jmir.org), 17.01.2023. 
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication on https://formative.jmir.org, as well as this copyright and license information must be included. |
spellingShingle | Original Paper Meyerson, William U Fineberg, Sarah K Song, Ye Kyung Faber, Adam Ash, Garrett Andrade, Fernanda C Corlett, Philip Gerstein, Mark B Hoyle, Rick H Estimation of Bedtimes of Reddit Users: Integrated Analysis of Time Stamps and Surveys |
title | Estimation of Bedtimes of Reddit Users: Integrated Analysis of Time Stamps and Surveys |
title_full | Estimation of Bedtimes of Reddit Users: Integrated Analysis of Time Stamps and Surveys |
title_fullStr | Estimation of Bedtimes of Reddit Users: Integrated Analysis of Time Stamps and Surveys |
title_full_unstemmed | Estimation of Bedtimes of Reddit Users: Integrated Analysis of Time Stamps and Surveys |
title_short | Estimation of Bedtimes of Reddit Users: Integrated Analysis of Time Stamps and Surveys |
title_sort | estimation of bedtimes of reddit users: integrated analysis of time stamps and surveys |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9890352/ https://www.ncbi.nlm.nih.gov/pubmed/36649054 http://dx.doi.org/10.2196/38112 |
work_keys_str_mv | AT meyersonwilliamu estimationofbedtimesofredditusersintegratedanalysisoftimestampsandsurveys AT finebergsarahk estimationofbedtimesofredditusersintegratedanalysisoftimestampsandsurveys AT songyekyung estimationofbedtimesofredditusersintegratedanalysisoftimestampsandsurveys AT faberadam estimationofbedtimesofredditusersintegratedanalysisoftimestampsandsurveys AT ashgarrett estimationofbedtimesofredditusersintegratedanalysisoftimestampsandsurveys AT andradefernandac estimationofbedtimesofredditusersintegratedanalysisoftimestampsandsurveys AT corlettphilip estimationofbedtimesofredditusersintegratedanalysisoftimestampsandsurveys AT gersteinmarkb estimationofbedtimesofredditusersintegratedanalysisoftimestampsandsurveys AT hoylerickh estimationofbedtimesofredditusersintegratedanalysisoftimestampsandsurveys |
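
The abstract in this record describes the model's shape only in words: a flat daytime posting rate and a quadratic dip that starts 45 minutes before bedtime, bottoms out 4.75 hours after bedtime at 87% below the daytime rate, and is back at baseline 10.25 hours after bedtime. The Python sketch below is one possible reading of that description, not the authors' R package. The function names are hypothetical, the shape constants are the population averages quoted in the abstract (the paper fits its parameters to data), and the grid-search maximum-likelihood step is an assumption, since the abstract does not say how bedtimes are fitted.

```python
import math
import random

# Shape constants taken from the averages reported in the abstract.
DIP_START = -0.75  # hours relative to bedtime at which posting starts to decline
DIP_END = 10.25    # hours after bedtime at which posting is back at baseline
DIP_DEPTH = 0.87   # fractional drop below the daytime rate at the nadir


def relative_rate(hours_since_bedtime):
    """Relative posting rate, with the flat daytime baseline set to 1.0."""
    # Wrap the offset onto one 24-hour cycle that starts at DIP_START.
    t = (hours_since_bedtime - DIP_START) % 24.0 + DIP_START
    if t >= DIP_END:
        return 1.0  # flat daytime portion
    centre = (DIP_START + DIP_END) / 2.0      # 4.75 h after bedtime
    half_width = (DIP_END - DIP_START) / 2.0  # 5.5 h
    u = (t - centre) / half_width
    return 1.0 - DIP_DEPTH * (1.0 - u * u)    # symmetric quadratic dip


def estimate_bedtime(post_hours, step=0.25):
    """Assumed fitting step: grid search for the bedtime (local clock hour)
    that maximises the log-likelihood of the observed posting hours."""
    best_bedtime, best_ll = 0.0, -math.inf
    for i in range(int(round(24.0 / step))):
        bedtime = i * step
        # The area under the curve over 24 h is the same for every shift,
        # so the normalising constant can be left out of the comparison.
        ll = sum(math.log(relative_rate(h - bedtime)) for h in post_hours)
        if ll > best_ll:
            best_bedtime, best_ll = bedtime, ll
    return best_bedtime


def circular_diff(a, b):
    """Absolute difference between two clock hours on a 24-hour circle."""
    d = abs(a - b) % 24.0
    return min(d, 24.0 - d)


def simulate_posts(bedtime, n, rng):
    """Hypothetical helper: rejection-sample n posting hours from the curve."""
    posts = []
    while len(posts) < n:
        h = rng.uniform(0.0, 24.0)
        if rng.random() < relative_rate(h - bedtime):
            posts.append(h)
    return posts
```

To try the sketch, draw synthetic posts with `simulate_posts(23.0, 3000, random.Random(0))`, pass them to `estimate_bedtime`, and compare the result with the true bedtime using `circular_diff`. Real time stamps would first need converting to local clock hours, which is why the study requires users with known time zones.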