Database Credentialed Access

# GLOBEM Dataset: Multi-Year Datasets for Longitudinal Human Behavior Modeling Generalization

Xu, X., Zhang, H., Sefidgar, Y., Ren, Y., Liu, X., Seo, W., Brown, J., Kuehn, K., Merrill, M., Nurius, P., Patel, S., Althoff, T., Morris, M., Riskin, E., Mankoff, J., & Dey, A. (2022). GLOBEM Dataset: Multi-Year Datasets for Longitudinal Human Behavior Modeling Generalization (version 1.0). PhysioNet. https://doi.org/10.13026/jvtb-2d81.

Xu X., Zhang H., Sefidgar Y., Ren Y., Liu X., Seo W., Brown J., Kuehn K., Merrill M., Nurius P., Patel S., Althoff T., Morris M., Riskin E., Mankoff J., Dey A. (2022) GLOBEM Dataset: Multi-Year Datasets for Longitudinal Human Behavior Modeling Generalization. 36th Conference on Neural Information Processing Systems Datasets and Benchmarks Track.

Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

## Abstract

We present the first multi-year mobile sensing datasets. Our data collection studies span four years (10 weeks each year, from 2018 to 2021). The four datasets contain data collected from 705 person-years (497 unique participants) with diverse racial, ability, and immigrant backgrounds. Each year, participants installed a mobile app on their phones and wore a fitness tracker. The app and wearable device passively tracked multiple sensor streams in the background 24×7, including location, phone usage, calls, Bluetooth, physical activity, and sleep behavior. In addition, participants completed weekly short surveys and two comprehensive surveys on health behaviors and symptoms, social well-being, emotional states, mental health, and other metrics. Our dataset analysis indicates that the datasets capture a wide range of daily human routines and reveal relationships between daily behaviors and important well-being metrics (e.g., depression status). We envision that our multi-year datasets can support the ML community in developing generalizable longitudinal behavior modeling algorithms.

## Background

Among various longitudinal sensor streams, data from smartphones and wearables is arguably among the most widely available [1]. Advances in mobile technology provide an unprecedented opportunity to capture multiple aspects of daily human behavior by collecting continuous sensor streams from these devices [2,3], together with metrics about health and well-being, obtained through self-report or clinical diagnosis, as modeling targets. Such longitudinal data poses unique challenges compared to traditional time-series classification tasks. First, the data covers a much longer time period, usually spanning multiple months or years. Second, the nature of longitudinal collection often results in a high missing-data rate. Third, the prediction target labels are sparse, especially for mental well-being metrics.

Longitudinal human behavior modeling is an important multidisciplinary area spanning machine learning, psychology, human-computer interaction, and ubiquitous computing. Researchers have demonstrated the potential of using longitudinal mobile sensing data for behavior modeling in many applications, e.g., detecting physical health issues [4], monitoring mental health status [3], measuring job performance [5], and tracing education outcomes [6]. Most existing research employed off-the-shelf ML algorithms and evaluated them on their private datasets. However, testing a model with new contexts and users is imperative to ensure its practical deployability. To the best of our knowledge, there has been no investigation of the cross-dataset generalizability of these longitudinal behavior models, nor an open testbed to evaluate and compare various modeling algorithms. To address this gap, we present the first multi-year passive mobile sensing datasets to help the ML community explore generalizable longitudinal behavior models.

## Methods

Our data collection studies were conducted at a Carnegie-classified R-1 university in the United States with IRB review and approval. We recruited undergraduates via email from 2018 to 2021. After the first year, previous-year participants were invited to join again. The study was conducted during the Spring quarter for 10 weeks each year, so the impact of seasonal effects was controlled. Based on their compliance, participants received up to \$245 in compensation each quarter.

The four datasets (DS1 to DS4) have 155, 218, 137, and 195 participants (705 person-years overall, and 497 unique people). Our datasets have a high representation of females (58.9%), immigrants (24.2%), first-generation students (38.2%), and people with disabilities (9.1%), and cover a wide range of races, with Asian (53.9%) and White (31.9%) being the largest groups (others include Hispanic/Latino 7.4% and Black/African American 3.3%).

### Part 1: Survey Data

We collected survey data at multiple stages of the study. We delivered extensive surveys before the start and at the end of the study (pre/post surveys) and short weekly Ecological Momentary Assessment (EMA) surveys during the study to collect in-the-moment self-report data. All surveys consist of well-established and validated questionnaires to ensure data quality.

Our pre/post surveys include a number of questionnaires to cover various aspects of life, including 1) personality (BFI-10, The Big-Five Inventory-10), 2) physical health (CHIPS, Cohen-Hoberman Inventory of Physical Symptoms), 3) mental well-being (e.g., BDI-II, Beck Depression Inventory-II; ERQ, Emotion Regulation Questionnaire), and 4) social well-being (e.g., Sense of Social and Academic Fit Scale; EDS, Everyday Discrimination Scale). Our EMA surveys focus on capturing participants’ recent sense of their mental health, including PHQ-4, Patient Health Questionnaire 4; PSS-4, Perceived Stress Scale 4; and PANAS, Positive and Negative Affect Schedule.

We use the depression detection task as a starting point for behavior modeling. We employ BDI-II (post) and PHQ-4 (EMA) as the ground truth. Both are screening tools intended to prompt further inquiry toward a clinical depression diagnosis. We focus on a binary classification problem: distinguishing whether participants' scores indicate at least mild depressive symptoms on these scales (i.e., PHQ-4 > 2, BDI-II > 13). The average number of depression labels is 11.6 ± 2.6 per person. The percentage of participants with at least mild depression is 39.8 ± 2.7% for BDI-II and 46.2 ± 2.5% for PHQ-4.
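As an illustration of how these cutoffs map raw scores to binary labels, here is a minimal sketch (the function and its handling of scores are ours, not part of the dataset's tooling):

```python
import pandas as pd

# Cutoffs from the text: scores above the threshold indicate
# at least mild depressive symptoms.
CUTOFFS = {"PHQ-4": 2, "BDI-II": 13}

def binarize_depression(scores: pd.Series, scale: str) -> pd.Series:
    """Map raw questionnaire scores to binary depression labels."""
    return (scores > CUTOFFS[scale]).astype(int)

print(binarize_depression(pd.Series([0, 2, 3, 8]), "PHQ-4").tolist())  # [0, 0, 1, 1]
```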

Due to design iteration, we did not include PHQ-4 in DS1, only PANAS. Although PANAS contains questions related to depressive symptoms (e.g., “distressed”), it does not have a comparable theoretical foundation for depression detection like PHQ-4 or BDI-II. Therefore, to maximize the compatibility of the datasets, we trained a small ML model on DS2, which has both PANAS and PHQ-4 scores, to generate reliable ground truth labels. Specifically, we used a decision tree (depth = 2) that takes the PANAS scores on two affect questions (“depressed” and “nervous”) as input and predicts the PHQ-4 score-based binary depression label. Our model achieved 74.5% accuracy and a 76.3% F1-score under 5-fold cross-validation on DS2. The rule from the decision tree is simple: a user is labeled as having no depression when the “depressed” score is less than 2 and the “nervous” score is less than 3 (on a 1-5 Likert scale). We then applied this rule to DS1 to generate depression labels.
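The learned rule is simple enough to write out directly. A minimal sketch, assuming the two items are scored on the 1-5 Likert scale described above:

```python
def panas_depression_label(depressed: int, nervous: int) -> int:
    """Depth-2 decision-tree rule distilled from DS2 (PANAS -> PHQ-4 label).

    A participant is labeled as having no depression (0) only when the
    "depressed" score is below 2 and the "nervous" score is below 3,
    both on a 1-5 Likert scale; otherwise the label is depressed (1).
    """
    return 0 if (depressed < 2 and nervous < 3) else 1

print(panas_depression_label(1, 2))  # 0
print(panas_depression_label(1, 4))  # 1
```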

### Part 2: Sensor Data

We developed a mobile app using the AWARE Framework [7] that continuously collects location, phone usage (screen status), Bluetooth scans, and call logs. The app is compatible with both the iOS and Android platforms. Participants installed the app on smartphones and left it running in the background. In addition, we provided wearable Fitbits to collect their physical activities and sleep behaviors. The mobile app and wearable passively collected sensor data 24×7 during the study. The average number of days per person per year is 77.5 ± 8.9 among the four datasets.

We strictly follow our IRB's rules for anonymizing participants' data. Specifically, we employ a PID as the only identifier of a participant. No personal information is included in the dataset. Since some sensitive sensor data (e.g., location) can disclose identities, we only release feature-level data under credentialing to protect against privacy leakage.

Moreover, the data collection dates are randomly shifted by a whole number of weeks, so the temporal order of events within the same subject and the day of the week are preserved after date-shifting.
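Week-granularity shifting is what preserves the day of the week. A minimal sketch of the idea (the offset range, seed, and function are illustrative assumptions, not the actual anonymization code):

```python
import datetime
import random

def shift_dates(dates, seed=0):
    """Shift a participant's dates by one random whole number of weeks.

    Because the offset is a multiple of 7 days and shared by all of the
    participant's records, within-subject ordering, gaps between events,
    and the day of the week are all preserved.
    """
    rng = random.Random(seed)  # illustrative; the real offsets are not published
    offset = datetime.timedelta(weeks=rng.randint(-52, 52))
    return [d + offset for d in dates]

days = [datetime.date(2020, 4, 6), datetime.date(2020, 4, 13)]  # both Mondays
shifted = shift_dates(days)
assert all(d.weekday() == s.weekday() for d, s in zip(days, shifted))
```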

## Data Description

We release four datasets, named INS-W_1, INS-W_2, INS-W_3, and INS-W_4. Each dataset has three folders. We provide an overview below. Please refer to our GLOBEM home page [8] and GitHub README [9] for more details.

• SurveyData: a list of files containing participants' survey responses, including pre/post long surveys and weekly short EMA surveys.
• FeatureData: behavior feature vectors from all data types, using RAPIDS [10] as the feature extraction tool.
• ParticipantInfoData: some additional information about participants, e.g., device platform (iOS or Android).

Specifically, the folder structure of each dataset is as follows:

• SurveyData
  • dep_weekly.csv
  • dep_endterm.csv
  • pre.csv
  • post.csv
  • ema.csv
• FeatureData
  • rapids.csv
  • location.csv
  • screen.csv
  • call.csv
  • bluetooth.csv
  • steps.csv
  • sleep.csv
  • wifi.csv
• ParticipantInfoData
  • platform.csv
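Given this folder layout and the pid/date indexing described below, a dataset can be loaded with pandas. A minimal sketch (the helper is ours, and it assumes the CSV files carry literal pid and date columns):

```python
import pandas as pd

def load_dataset(root: str) -> pd.DataFrame:
    """Join the weekly depression labels onto the daily feature matrix.

    Assumes both CSV files are indexed by pid and date, per the data
    description; an inner join keeps only the days that carry a label.
    """
    labels = pd.read_csv(f"{root}/SurveyData/dep_weekly.csv",
                         index_col=["pid", "date"])
    features = pd.read_csv(f"{root}/FeatureData/rapids.csv",
                           index_col=["pid", "date"])
    return features.join(labels["dep"], how="inner")

# df = load_dataset("INS-W_2")  # any of the four dataset folders
```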

### Survey Data

The SurveyData folder contains five files, all indexed by pid and date:

• dep_weekly.csv: Depression labels (column "dep") combining post and EMA surveys. This file and dep_endterm.csv are provided because depression is the benchmark task; we envision future work extending to other modeling targets as well.
• dep_endterm.csv: Depression labels (column "dep") from post surveys only. Some prior depression detection work focuses on end-of-term depression prediction.
• pre.csv: All questionnaires that participants filled in right before the start of the data collection study (thus pre-study).
• post.csv: All questionnaires that participants filled in right after the end of the data collection study (thus post-study).
• ema.csv: All EMA surveys that participants filled in during the study. Some EMAs were delivered on Wednesdays and others on Sundays.

#### Survey List

| Survey | Name | Short Description | Score Range | Datasets | Category |
|---|---|---|---|---|---|
| UCLA | Short-form UCLA Loneliness Scale | A 10-item scale measuring one's subjective feelings of loneliness and social isolation. Items 2, 6, 10, 11, 13, 14, 16, 18, 19, and 20 of the original scale are included in the short form. Higher values indicate more subjective loneliness. | 10 - 40 | 1,2,3,4 | pre, post |
| SocialFit | Sense of Social and Academic Fit Scale | A 17-item scale measuring students' sense of social and academic fit at the institution where this study was conducted. Higher values indicate stronger feelings of belonging. | 17 - 119 | 1,2,3,4 | pre, post |
| 2-Way SSS | 2-Way Social Support Scale | A 21-item scale measuring social support from four aspects: (a) giving emotional support, (b) giving instrumental support, (c) receiving emotional support, and (d) receiving instrumental support. Higher values indicate more social support. | (a) 0 - 25, (b) 0 - 25, (c) 0 - 35, (d) 0 - 20 | 1,2,3,4 | pre, post |
| PSS | Perceived Stress Scale | A 14-item scale assessing stress levels during the last month. Note that Year 1 used the 10-item version. Higher values indicate more perceived stress. | 0 - 56 (Years 2,3,4); 0 - 40 (Year 1) | 1,2,3,4 | pre, post |
| ERQ | Emotion Regulation Questionnaire | A 10-item scale assessing individual differences in the habitual use of two emotion regulation strategies: (a) cognitive reappraisal and (b) expressive suppression. Higher scores indicate more habitual use of reappraisal/suppression. | (a) 1 - 7, (b) 1 - 7 | 1,2,3,4 | pre, post |
| BRS | Brief Resilience Scale | A 6-item scale assessing the ability to bounce back or recover from stress. Higher scores indicate more resilience to stress. | 1 - 5 | 1,2,3,4 | pre, post |
| CHIPS | Cohen-Hoberman Inventory of Physical Symptoms | A 33-item scale measuring the perceived burden of physical symptoms, and the resulting psychological effects, during the past 2 weeks. Higher values indicate more perceived burden from physical symptoms. | 0 - 132 | 1,2,3,4 | pre, post |
| STAI | State-Trait Anxiety Inventory for Adults | A 20-item scale measuring state-trait anxiety. Year 1 used the State version, while other years used the Trait version. Higher values indicate higher anxiety. | 20 - 80 | 1,2,3,4 | pre, post |
| CES-D | Center for Epidemiologic Studies Depression Scale (Cole version) | A 10-item scale measuring current depressive symptomatology, with emphasis on the affective component, depressed mood. Year 2 used the 9-item version. Higher scores indicate more depressive symptoms. | 0 - 30 (Years 1,3,4); 0 - 27 (Year 2) | 1,2,3,4 | pre, post |
| BDI2 | Beck Depression Inventory-II | A 21-item scale detecting depressive symptoms. Higher values indicate more depressive symptoms: 0-13 minimal to none, 14-19 mild, 20-28 moderate, 29-63 severe. | 0 - 63 | 1,2,3,4 | pre, post |
| MAAS | Mindful Attention Awareness Scale | A 15-item scale assessing a core characteristic of mindfulness. Year 1 used a 7-item version, while other years used the full version. Higher values indicate higher mindfulness. | 1 - 6 | 1,2,3,4 | pre, post |
| BFI10 | The Big-Five Inventory-10 | A 10-item scale measuring the Big Five personality traits: Extraversion, Agreeableness, Conscientiousness, Emotional Stability, and Openness. Higher scores indicate a greater tendency toward the corresponding trait. | 1 - 5 | 1,2,3,4 | pre |
| Brief-COPE | Brief Coping Orientation to Problems Experienced | A 28-item scale measuring (a) adaptive and (b) maladaptive ways of coping with a stressful life event. Higher values indicate more frequent use of the corresponding coping style. | (a) 0 - 3, (b) 0 - 3 | 2,3,4 | pre, post |
| GQ | Gratitude Questionnaire | A 6-item scale assessing individual differences in the proneness to experience gratitude in daily life. Higher scores indicate a greater tendency to experience gratitude. | 6 - 42 | 2,3,4 | pre, post |
| FSPWB | Flourishing Scale Psychological Well-Being Scale | An 8-item scale measuring psychological well-being. Higher scores indicate a person with more psychological resources and mental strengths. | 8 - 56 | 2,3,4 | pre, post |
| EDS | Everyday Discrimination Scale | A 9-item scale assessing everyday discrimination. Higher values indicate more frequent experiences of discrimination. | 0 - 45 | 2,3,4 | pre, post |
| CEDH | Chronic Work Discrimination and Harassment | A 12-item scale assessing experiences of discrimination and harassment, adapted to educational settings. Higher values indicate more frequent experiences of discrimination. | 0 - 60 | 2,3,4 | pre, post |
| B-YAACQ | The Brief Young Adult Alcohol Consequences Questionnaire (optional) | A 24-item scale measuring the alcohol problem severity continuum in college students. Higher values indicate more severe alcohol problems. | 0 - 24 | 2,3,4 | pre, post |
| PHQ-4 | Patient Health Questionnaire 4 | A 4-item scale assessing (a) overall mental health, (b) anxiety, and (c) depression. Higher values indicate higher risk of mental health issues, anxiety, and depression. | (a) 0 - 12, (b) 0 - 6, (c) 0 - 6 | 2,3,4 | Weekly EMA |
| PSS-4 | Perceived Stress Scale 4 | A 4-item scale assessing stress levels during the last month. Higher values indicate more perceived stress. | 0 - 16 | 2,3,4 | Weekly EMA |
| PANAS | Positive and Negative Affect Schedule | A 10-item scale measuring the level of (a) positive and (b) negative affect. Higher values indicate a larger extent. | (a) 0 - 20, (b) 0 - 20 | 2,3,4 | Weekly EMA |

Note: Due to design iterations, some questionnaires are not available in all studies. Moreover, some questionnaires have different versions across years. We clarify these cases using column names. For example, INS-W_2 only has "CESD_9items_POST", while the other datasets have "CESD_10items_POST". "CESD_9items_POST" is also calculated in the other datasets to make the modeling target comparable across datasets.
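One way to handle these version differences when pooling datasets is to prefer the column version that exists everywhere. A minimal sketch (the helper and its fallback order are ours):

```python
import pandas as pd

def comparable_cesd(df: pd.DataFrame) -> pd.Series:
    """Return the CES-D version that is comparable across all four datasets.

    Prefers the 9-item score, which (per the note above) is computed in
    every dataset; falls back to the 10-item score when only it exists.
    """
    for col in ("CESD_9items_POST", "CESD_10items_POST"):
        if col in df.columns:
            return df[col]
    raise KeyError("no CES-D column found")
```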

### Feature Data

The FeatureData folder contains eight files, all indexed by pid and date.

• rapids.csv: The complete feature file that contains all features.
• location.csv: The feature file that contains all location features.
• screen.csv: The feature file that contains all phone usage features.
• call.csv: The feature file that contains all call features.
• bluetooth.csv: The feature file that contains all Bluetooth features.
• steps.csv: The feature file that contains all physical activity features.
• sleep.csv: The feature file that contains all sleep features.
• wifi.csv: The feature file that contains all WiFi features. Note that this feature type is not used by any existing algorithms and often has a high missing-data rate.

Please note that all features are extracted over multiple time segments:

• morning (6 am - 12 pm, calculated daily)
• afternoon (12 pm - 6 pm, calculated daily)
• evening (6 pm - 12 am, calculated daily)
• night (12 am - 6 am, calculated daily)
• allday (24 hrs from 12 am to 11:59 pm, calculated daily)
• 7-day history (calculated daily)
• 14-day history (calculated daily)
• weekdays (calculated once per week on Friday)
• weekend (calculated once per week on Sunday)

For all features with numeric values, we also provide two more versions:

• normalized: subtracted by each participant's median and divided by the 5-95 quantile range
• discretized: low/medium/high split by 33/66 quantile of each participant's feature value
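Both versions can be reproduced from the raw feature values. A minimal sketch of the two transforms as described (the function names are ours; apply them per participant, e.g., via a groupby on pid):

```python
import pandas as pd

def normalize(x: pd.Series) -> pd.Series:
    """Subtract the participant's median and divide by the 5-95 quantile range."""
    return (x - x.median()) / (x.quantile(0.95) - x.quantile(0.05))

def discretize(x: pd.Series) -> pd.Series:
    """Split into low/medium/high at the participant's 33/66 quantiles."""
    bins = [-float("inf"), x.quantile(0.33), x.quantile(0.66), float("inf")]
    return pd.cut(x, bins=bins, labels=["low", "medium", "high"])

# Per-participant application, assuming a pid column:
# df["feat_norm"] = df.groupby("pid")["feat"].transform(normalize)
```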

#### Naming Format

All features follow a consistent naming format:

[feature_type]:[feature_name][version]:[time_segment]

• feature_type: It corresponds to the six data types.
• location - f_loc
• screen - f_screen
• call - f_call
• bluetooth - f_blue
• steps - f_steps
• sleep - f_slp
• feature_name: The name of the feature provided by RAPIDS, i.e., the feature names listed in the detail tables below, plus some additional information. A typical format is [SensorType]_[CodeProvider]_[featurename]. Please refer to RAPIDS's naming format [9] for more details.
• version: It has three versions:
• 1) nothing, just empty "";
• 2) normalized, _norm;
• 3) discretized, _dis.
• time_segment: It corresponds to the specific time segment.
• morning - morning
• afternoon - afternoon
• evening - evening
• night - night
• allday - allday
• 7-day history - 7dhist
• 14-day history - 14dhist
• weekday - weekday
• weekend - weekend

For example, a participant's normalized "sumdurationunlock" feature in the morning segment is "f_screen:phone_screen_rapids_sumdurationunlock_norm:morning".
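The naming format can be parsed programmatically. A minimal sketch (the helper is ours, not part of the dataset tooling):

```python
def parse_feature(column: str):
    """Split a feature column name into (type, name, version, segment).

    The version suffix (_norm / _dis) is optional; its absence means the
    raw feature value.
    """
    feature_type, name, segment = column.split(":")
    version = "raw"
    for suffix in ("_norm", "_dis"):
        if name.endswith(suffix):
            name, version = name[: -len(suffix)], suffix[1:]
    return feature_type, name, version, segment

print(parse_feature("f_screen:phone_screen_rapids_sumdurationunlock_norm:morning"))
# ('f_screen', 'phone_screen_rapids_sumdurationunlock', 'norm', 'morning')
```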

#### Location Details

| Feature Name | Unit | Description |
|---|---|---|
| hometime | minutes | Time spent at home. Home is the most visited significant location between 8 pm and 8 am, including any pauses within a 200-meter radius. |
| disttravelled | meters | Total distance traveled over a day (flights). |
| rog | meters | Radius of gyration: a measure of the area covered by a person over a day. A centroid is calculated for all the places (pauses) visited during a day, and a weighted distance between all the places and that centroid is computed. The weights are proportional to the time spent in each place. |
| maxdiam | meters | The maximum diameter, i.e., the largest distance between any two pauses. |
| maxhomedist | meters | The maximum distance from home. |
| siglocsvisited | locations | The number of significant locations visited during the day. Significant locations are computed using k-means clustering over pauses found in the whole monitoring period. The number of clusters is found by iterating k from 1 to 200, stopping when the centroids of two significant locations are within 400 meters of one another. |
| avgflightlen | meters | Mean length of all flights. |
| stdflightlen | meters | Standard deviation of the length of all flights. |
| avgflightdur | seconds | Mean duration of all flights. |
| stdflightdur | seconds | Standard deviation of the duration of all flights. |
| probpause | - | The fraction of a day spent in a pause (as opposed to a flight). |
| siglocentropy | nats | Shannon entropy based on the proportion of time spent at each significant location visited during a day. |
| circdnrtn | - | A continuous metric between 0 and 1 quantifying a person's circadian routine, where 0 represents a daily routine completely different from any other sensed day and 1 a routine identical to every other sensed day. |
| wkenddayrtn | - | Same as circdnrtn but computed separately for weekends and weekdays. |
| locationvariance | meters² | The sum of the variances of the latitude and longitude columns. |
| loglocationvariance | - | Log of the sum of the variances of the latitude and longitude columns. |
| totaldistance | meters | Total distance traveled in a time segment using the haversine formula. |
| avgspeed | km/hr | Average speed in a time segment, considering only the instances labeled as Moving. This feature is 0 when the participant is stationary during a time segment. |
| varspeed | km/hr | Speed variance in a time segment, considering only the instances labeled as Moving. This feature is 0 when the participant is stationary during a time segment. |
| numberofsignificantplaces | places | Number of significant locations visited, calculated using the DBSCAN/OPTICS clustering algorithm, which takes EPS and MIN_SAMPLES as parameters to identify clusters. Each cluster is a significant place. |
| numberlocationtransitions | transitions | Number of movements between any two clusters in a time segment. |
| radiusgyration | meters | Quantifies the area covered by a participant. |
| timeattop1location | minutes | Time spent at the most significant location. |
| timeattop2location | minutes | Time spent at the 2nd most significant location. |
| timeattop3location | minutes | Time spent at the 3rd most significant location. |
| movingtostaticratio | - | Ratio between stationary time and total location-sensed time. A lat/long coordinate pair is labeled as stationary if its speed (distance/time) to the next coordinate pair is less than 1 km/hr. A higher value represents a more stationary routine. |
| outlierstimepercent | - | Ratio between the time spent in non-significant clusters and the time spent in all clusters (only stationary samples are clustered). A higher value represents more time spent in non-significant clusters. |
| maxlengthstayatclusters | minutes | Maximum time spent in a cluster (significant location). |
| minlengthstayatclusters | minutes | Minimum time spent in a cluster (significant location). |
| avglengthstayatclusters | minutes | Average time spent in a cluster (significant location). |
| stdlengthstayatclusters | minutes | Standard deviation of time spent in a cluster (significant location). |
| locationentropy | nats | Shannon entropy computed over the row count of each cluster (significant location); it is higher the more rows belong to a cluster (i.e., the more time a participant spent at a significant location). |
| normalizedlocationentropy | nats | Shannon entropy computed over the row count of each cluster (significant location) divided by the number of clusters; it is higher the more rows belong to a cluster. |
| timeathome | minutes | Time spent at home. |
| timeat[PLACE] | minutes | Time spent at [PLACE], which can be living, exercise, study, or greens. |

#### Phone Usage Details

| Feature Name | Unit | Description |
|---|---|---|
| sumduration | minutes | Total duration of all unlock episodes. |
| maxduration | minutes | Longest duration of any unlock episode. |
| minduration | minutes | Shortest duration of any unlock episode. |
| avgduration | minutes | Average duration of all unlock episodes. |
| stdduration | minutes | Standard deviation of the duration of all unlock episodes. |
| countepisode | episodes | Number of all unlock episodes. |
| firstuseafter | minutes | Minutes until the first unlock episode. |
| sumduration[PLACE] | minutes | Total duration of all unlock episodes at [PLACE]. [PLACE] can be living, exercise, study, or greens; same below. |
| maxduration[PLACE] | minutes | Longest duration of any unlock episode at [PLACE]. |
| minduration[PLACE] | minutes | Shortest duration of any unlock episode at [PLACE]. |
| avgduration[PLACE] | minutes | Average duration of all unlock episodes at [PLACE]. |
| stdduration[PLACE] | minutes | Standard deviation of the duration of all unlock episodes at [PLACE]. |
| countepisode[PLACE] | episodes | Number of all unlock episodes at [PLACE]. |
| firstuseafter[PLACE] | minutes | Minutes until the first unlock episode at [PLACE]. |

#### Call Details

| Feature Name | Unit | Description |
|---|---|---|
| count | calls | Number of calls of a particular call_type (incoming/outgoing) that occurred during a particular time_segment. |
| distinctcontacts | contacts | Number of distinct contacts associated with a particular call_type for a particular time_segment. |
| meanduration | seconds | Mean duration of all calls of a particular call_type during a particular time_segment. |
| sumduration | seconds | Sum of the duration of all calls of a particular call_type during a particular time_segment. |
| minduration | seconds | Duration of the shortest call of a particular call_type during a particular time_segment. |
| maxduration | seconds | Duration of the longest call of a particular call_type during a particular time_segment. |
| stdduration | seconds | Standard deviation of the duration of all calls of a particular call_type during a particular time_segment. |
| modeduration | seconds | Mode of the duration of all calls of a particular call_type during a particular time_segment. |
| entropyduration | nats | Estimate of the Shannon entropy of the duration of all calls of a particular call_type during a particular time_segment. |
| timefirstcall | minutes | Time in minutes between 12:00 am (midnight) and the first call of a call_type. |
| timelastcall | minutes | Time in minutes between 12:00 am (midnight) and the last call of a call_type. |
| countmostfrequentcontact | calls | Number of calls of a particular call_type during a particular time_segment with the most frequent contact throughout the monitored period. |

#### Bluetooth Details

| Feature Name | Unit | Description |
|---|---|---|
| countscans | scans | Number of scans (rows) from the devices sensed during a time segment instance. The more scans a Bluetooth device has, the longer it remained within range of the participant's phone. |
| uniquedevices | devices | Number of unique Bluetooth devices sensed during a time segment instance, as identified by their hardware addresses. |
| meanscans | scans | Mean of the scans of every sensed device within each time segment instance. |
| stdscans | scans | Standard deviation of the scans of every sensed device within each time segment instance. |
| countscansmostfrequentdevicewithinsegments | scans | Number of scans of the most sensed device within each time segment instance. |
| countscansleastfrequentdevicewithinsegments | scans | Number of scans of the least sensed device within each time segment instance. |
| countscansmostfrequentdeviceacrosssegments | scans | Number of scans of the most sensed device across time segment instances of the same type. |
| countscansleastfrequentdeviceacrosssegments | scans | Number of scans of the least sensed device across time segment instances of the same type, per device. |
| countscansmostfrequentdeviceacrossdataset | scans | Number of scans of the most sensed device across the entire dataset of every participant. |
| countscansleastfrequentdeviceacrossdataset | scans | Number of scans of the least sensed device across the entire dataset of every participant. |

#### WiFi Details

| Feature Name | Unit | Description |
|---|---|---|
| countscans | scans | Number of WiFi access points detected during a time_segment. An access point can be detected multiple times over time, and these appearances are counted separately. |
| uniquedevices | devices | Number of unique access points detected during a time_segment, as identified by their hardware addresses. |
| countscansmostuniquedevice | scans | Number of scans of the most scanned access point during a time_segment across the whole monitoring period. |

#### Physical Activity Details

| Feature Name | Unit | Description |
|---|---|---|
| maxsumsteps | steps | Maximum daily step count during a time segment. |
| minsumsteps | steps | Minimum daily step count during a time segment. |
| avgsumsteps | steps | Average daily step count during a time segment. |
| mediansumsteps | steps | Median daily step count during a time segment. |
| stdsumsteps | steps | Standard deviation of the daily step count during a time segment. |
| sumsteps | steps | Total step count during a time segment. |
| maxsteps | steps | Maximum step count during a time segment. |
| minsteps | steps | Minimum step count during a time segment. |
| avgsteps | steps | Average step count during a time segment. |
| stdsteps | steps | Standard deviation of the step count during a time segment. |
| countepisodesedentarybout | bouts | Number of sedentary bouts during a time segment. |
| sumdurationsedentarybout | minutes | Total duration of all sedentary bouts during a time segment. |
| maxdurationsedentarybout | minutes | Maximum duration of any sedentary bout during a time segment. |
| mindurationsedentarybout | minutes | Minimum duration of any sedentary bout during a time segment. |
| avgdurationsedentarybout | minutes | Average duration of sedentary bouts during a time segment. |
| stddurationsedentarybout | minutes | Standard deviation of the duration of sedentary bouts during a time segment. |
| countepisodeactivebout | bouts | Number of active bouts during a time segment. |
| sumdurationactivebout | minutes | Total duration of all active bouts during a time segment. |
| maxdurationactivebout | minutes | Maximum duration of any active bout during a time segment. |
| mindurationactivebout | minutes | Minimum duration of any active bout during a time segment. |
| avgdurationactivebout | minutes | Average duration of active bouts during a time segment. |
| stddurationactivebout | minutes | Standard deviation of the duration of active bouts during a time segment. |

#### Sleep Details

| Feature Name | Unit | Description |
|---|---|---|
| countepisode[LEVEL][TYPE] | episodes | Number of [LEVEL][TYPE] sleep episodes. [LEVEL] is one of awake and asleep; [TYPE] is one of main, nap, and all. Same below. |
| sumduration[LEVEL][TYPE] | minutes | Total duration of all [LEVEL][TYPE] sleep episodes. |
| maxduration[LEVEL][TYPE] | minutes | Longest duration of any [LEVEL][TYPE] sleep episode. |
| minduration[LEVEL][TYPE] | minutes | Shortest duration of any [LEVEL][TYPE] sleep episode. |
| avgduration[LEVEL][TYPE] | minutes | Average duration of all [LEVEL][TYPE] sleep episodes. |
| medianduration[LEVEL][TYPE] | minutes | Median duration of all [LEVEL][TYPE] sleep episodes. |
| stdduration[LEVEL][TYPE] | minutes | Standard deviation of the duration of all [LEVEL][TYPE] sleep episodes. |
| firstwaketime[TYPE] | minutes | First wake time for a certain sleep type during a time segment. Wake time is the number of minutes after midnight of a sleep episode's end time. |
| lastwaketime[TYPE] | minutes | Last wake time for a certain sleep type during a time segment. |
| firstbedtime[TYPE] | minutes | First bedtime for a certain sleep type during a time segment. Bedtime is the number of minutes after midnight of a sleep episode's start time. |
| lastbedtime[TYPE] | minutes | Last bedtime for a certain sleep type during a time segment. |
| countepisode[TYPE] | episodes | Number of sleep episodes for a certain sleep type during a time segment. |
| avgefficiency[TYPE] | scores | Average sleep efficiency for a certain sleep type during a time segment. |
| sumdurationafterwakeup[TYPE] | minutes | Total duration the user stayed in bed after waking up, for a certain sleep type during a time segment. |
| sumdurationasleep[TYPE] | minutes | Total sleep duration for a certain sleep type during a time segment. |
| sumdurationawake[TYPE] | minutes | Total duration the user stayed awake but still in bed, for a certain sleep type during a time segment. |
| sumdurationtofallasleep[TYPE] | minutes | Total duration the user took to fall asleep, for a certain sleep type during a time segment. |
| sumdurationinbed[TYPE] | minutes | Total duration the user stayed in bed (sumdurationtofallasleep + sumdurationawake + sumdurationasleep + sumdurationafterwakeup), for a certain sleep type during a time segment. |
| avgdurationafterwakeup[TYPE] | minutes | Average duration the user stayed in bed after waking up, for a certain sleep type during a time segment. |
| avgdurationasleep[TYPE] | minutes | Average sleep duration for a certain sleep type during a time segment. |
| avgdurationawake[TYPE] | minutes | Average duration the user stayed awake but still in bed, for a certain sleep type during a time segment. |
| avgdurationtofallasleep[TYPE] | minutes | Average duration the user took to fall asleep, for a certain sleep type during a time segment. |
| avgdurationinbed[TYPE] | minutes | Average duration the user stayed in bed (sumdurationtofallasleep + sumdurationawake + sumdurationasleep + sumdurationafterwakeup), for a certain sleep type during a time segment. |

### Participant Info Data

The ParticipantInfoData folder contains files with additional information.

• platform.csv: The file contains each participant's primary smartphone platform (iOS or Android), indexed by pid.
• demographics.csv: Due to privacy concerns, demographic data are available only by special request. Please contact us directly with a clear research plan that describes how the demographic data will be used.
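Since platform.csv is indexed by pid, it can be joined onto any feature file that carries a pid column. The stdlib sketch below shows one way to do this; the inline CSV contents, pid values, and column names are illustrative stand-ins, not the exact GLOBEM schema.

```python
import csv
import io

# Hedged sketch: attach each participant's platform (from platform.csv) to
# rows of a daily feature file. All values below are made-up placeholders.
platform_csv = "pid,platform\nPID001,iOS\nPID002,Android\n"
features_csv = ("pid,date,sumdurationasleepmain\n"
                "PID001,2018-04-02,412\n"
                "PID002,2018-04-02,388\n")

# Build a pid -> platform lookup, then left-join it onto the feature rows.
platform = {row["pid"]: row["platform"]
            for row in csv.DictReader(io.StringIO(platform_csv))}

merged = []
for row in csv.DictReader(io.StringIO(features_csv)):
    row["platform"] = platform.get(row["pid"])  # None if pid is missing
    merged.append(row)

print(merged[0]["platform"])  # iOS
```

In practice one would read the real files with `csv` or pandas instead of inline strings; the join logic is the same.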

## Usage Notes

We provide GLOBEM, a behavior modeling benchmark platform [8,9]. The platform is designed to support researchers in using, developing, and evaluating different longitudinal behavior modeling methods.
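A common way to evaluate cross-year generalization with multi-year datasets like these is leave-one-dataset-out: train on all but one yearly dataset and test on the held-out year. The sketch below illustrates only the evaluation loop, with made-up labels and a trivial majority-class baseline in place of a real model; it is not the GLOBEM platform's implementation.

```python
# Hedged sketch of leave-one-dataset-out evaluation across yearly datasets.
# Dataset names and binary labels are illustrative placeholders.
datasets = {
    "year1": [0, 0, 1, 0],
    "year2": [1, 0, 1, 1],
    "year3": [0, 1, 0, 0],
    "year4": [1, 1, 0, 1],
}

def majority_class(labels):
    """Most frequent label -- a stand-in for a real trained model."""
    return max(set(labels), key=labels.count)

results = {}
for held_out, test_labels in datasets.items():
    # "Train" on the pooled labels of the other three years.
    train_labels = [y for name, ys in datasets.items()
                    if name != held_out for y in ys]
    pred = majority_class(train_labels)
    # Evaluate on the held-out year.
    results[held_out] = sum(y == pred for y in test_labels) / len(test_labels)
    print(f"held out {held_out}: baseline accuracy {results[held_out]:.2f}")
```

Replacing the baseline with an actual behavior model while keeping the outer loop unchanged yields the generalization protocol the benchmark platform supports.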

Researchers who use the datasets must agree to the following terms.

Privacy
Although the database has been anonymized, we cannot eliminate all potential risks of privacy leakage. The PI of any research group with access to the dataset is responsible for continuing to safeguard the database, taking whatever steps are appropriate to protect participants' privacy and data confidentiality. The specific actions required to safeguard the data may change over time.

Misuse
If, at any point, the administrators of the datasets at the University of Washington have concerns or reasonable suspicion that a researcher has violated these usage notes, the researcher will be notified. Concerns about misuse may be shared with PhysioNet and other related entities.

Our datasets have led to multiple publications. Please find them in the reference list [11-18].

## Release Notes

v1.0 - Release of our GLOBEM Dataset

## Ethics

Our datasets aim to aid research in developing, testing, and evaluating machine learning algorithms that better understand college students' (and potentially the general population's) daily behaviors, health, and well-being from continuous sensor streams and self-reports. These findings may serve the public interest in improving student experiences and drive policy around adverse events that students and others may experience.

Privacy is the major ethical concern of our data collection studies. Our study obtained IRB approval from the University of Washington (IRB number STUDY00003244). Participants signed a consent form before joining the study. We strictly follow the IRB rules to anonymize participants' data: no one outside our core data collection group can access direct individually identifiable information. We also removed the data of participants who withdrew at any point during the study. Since some sensitive sensor data (e.g., location) can disclose identities, we release only feature-level data, under credentialed access, to protect against privacy leakage.

## Acknowledgements

Our multi-year data collection study closely followed a sister study at Carnegie Mellon University (CMU). We acknowledge the CMU study team's efforts in providing important starting and reference materials. Moreover, our studies were greatly inspired by the StudentLife researchers at Dartmouth College.

Our studies were supported by the University of Washington (including the Paul G. Allen School of Computer Science and Engineering; Department of Electrical and Computer Engineering; Population Health; Addictions, Drug and Alcohol Institute; and the Center for Research and Education on Accessible Technology and Experiences); the National Science Foundation (EDA-2009977, CHS-2016365, CHS-1941537, IIS-1816687, and IIS-7974751); the National Institute on Disability, Independent Living and Rehabilitation Research (90DPGE0003-01); Samsung Research America; and Google.

## Conflicts of Interest

The authors have no conflicts of interest to declare.

## References

1. N. D. Lane, E. Miluzzo, H. Lu, D. Peebles, T. Choudhury, and A. T. Campbell. A survey of mobile phone sensing. IEEE Communications Magazine, 48(9), 2010.
2. M. E. Morris, Q. Kathawala, T. K. Leen, E. E. Gorenstein, F. Guilak, W. DeLeeuw, and M. Labhard. Mobile therapy: case study evaluations of a cell phone application for emotional self-awareness. Journal of medical Internet research, 12(2):e10, 2010.
3. R. Wang, F. Chen, Z. Chen, T. Li, G. Harari, S. Tignor, X. Zhou, D. Ben-Zeev, and A. T. Campbell. Studentlife: Assessing mental health, academic performance and behavioral trends of college students using smartphones. In Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing, pages 3–14. ACM, 2014.
4. J.-K. Min, A. Doryab, J. Wiese, S. Amini, J. Zimmerman, and J. I. Hong. Toss “n” turn: Smartphone as sleep and sleep quality detector. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’14, page 477–486, New York, NY, USA, 2014. Association for Computing Machinery.
5. S. M. Mattingly, J. M. Gregg, P. Audia, A. E. Bayraktaroglu, A. T. Campbell, N. V. Chawla, V. Das Swain, M. De Choudhury, S. K. D’Mello, A. K. Dey, et al. The tesserae project: Large-scale, longitudinal, in-situ, multimodal sensing of information workers. In Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, pages 1–8, 2019.
6. R. Wang, G. Harari, P. Hao, X. Zhou, and A. T. Campbell. Smartgpa: how smartphones can assess and predict academic performance of college students. In Proceedings of the 2015 ACM international joint conference on pervasive and ubiquitous computing, pages 295–306, 2015.
7. D. Ferreira, V. Kostakos, and A. K. Dey. Aware: Mobile context instrumentation framework. Frontiers in ICT, 2:6, 2015.
9. Benchmark Platform GLOBEM. https://github.com/UW-EXP/GLOBEM/
10. Rapids documentation. https://www.rapids.science/1.6/
11. M. E. Morris, K. S. Kuehn, J. Brown, P. S. Nurius, H. Zhang, Y. S. Sefidgar, X. Xu, E. A. Riskin, A. K. Dey, S. Consolvo, and J. C. Mankoff. College from home during COVID-19: A mixed-methods study of heterogeneous experiences. PLOS ONE, 16(6):e0251580, June 2021.
12. P. S. Nurius, Y. S. Sefidgar, K. S. Kuehn, J. Jung, H. Zhang, O. Figueira, E. A. Riskin, A. K. Dey, and J. C. Mankoff. Distress among undergraduates: Marginality, stressors and resilience resources. Journal of American College Health, pages 1–9, July 2021.
13. Y. S. Sefidgar, W. Seo, K. S. Kuehn, T. Althoff, A. Browning, E. Riskin, P. S. Nurius, A. K. Dey, and J. Mankoff. Passively-sensed Behavioral Correlates of Discrimination Events in College Students. Proceedings of the ACM on Human-Computer Interaction, 3(CSCW):1–29, Nov. 2019.
14. X. Xu, P. Chikersal, A. Doryab, D. K. Villalba, J. M. Dutcher, M. J. Tumminia, T. Althoff, S. Cohen, K. G. Creswell, J. D. Creswell, J. Mankoff, and A. K. Dey. Leveraging Routine Behavior and Contextually- Filtered Features for Depression Detection among College Students. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 3(3):1–33, Sept. 2019.
15. X. Xu, P. Chikersal, J. M. Dutcher, Y. S. Sefidgar, W. Seo, M. J. Tumminia, D. K. Villalba, S. Cohen, K. G. Creswell, J. D. Creswell, A. Doryab, P. S. Nurius, E. Riskin, A. K. Dey, and J. Mankoff. Leveraging Collaborative-Filtering for Personalized Behavior Modeling: A Case Study of Depression Detection among College Students. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 5(1):1–27, Mar. 2021.
16. X. Xu, J. Mankoff, and A. K. Dey. Understanding practices and needs of researchers in human state modeling by passive mobile sensing. CCF Transactions on Pervasive Computing and Interaction, July 2021.
17. H. Zhang, M. E. Morris, P. S. Nurius, K. Mack, J. Brown, K. S. Kuehn, Y. S. Sefidgar, X. Xu, E. A. Riskin, A. K. Dey, and J. Mankoff. Impact of Online Learning in the Context of COVID-19 on Undergraduates with Disabilities and Mental Health Concerns. ACM Transactions on Accessible Computing, page 3538514, July 2022.
18. H. Zhang, P. Nurius, Y. Sefidgar, M. Morris, S. Balasubramanian, J. Brown, A. K. Dey, K. Kuehn, E. Riskin, X. Xu, and J. Mankoff. How does COVID-19 impact students with disabilities/health concerns? arXiv preprint arXiv:2005.05438 [cs], May 2020.

##### Access

Access Policy:
Only credentialed users who sign the DUA can access the files.

PhysioNet Credentialed Health Data License 1.5.0

Data Use Agreement:
PhysioNet Credentialed Health Data Use Agreement 1.5.0

Required training:
CITI Data or Specimens Only Research

##### Versions
• 1.0 - Nov. 4, 2022
• 1.1 - March 14, 2023