Name: GLOBEM Dataset: Multi-Year Datasets for Longitudinal Human Behavior Modeling Generalization
Published: March 14, 2023
License: https://github.com/MIT-LCP/license-and-dua/tree/master/drafts

Database Credentialed Access

Xuhai Xu , Han Zhang , Yasaman Sefidgar , Yiyi Ren , Xin Liu , Woosuk Seo , Jennifer Brown , Kevin Kuehn , Mike Merrill , Paula Nurius , Shwetak Patel , Tim Althoff , Margaret Morris , Eve Riskin , Jennifer Mankoff , Anind Dey

Published: March 14, 2023. Version: 1.1

When using this resource, please cite:

MLA	Xu, Xuhai, et al. "GLOBEM Dataset: Multi-Year Datasets for Longitudinal Human Behavior Modeling Generalization" (version 1.1). PhysioNet (2023), https://doi.org/10.13026/r9s1-s711.
APA	Xu, X., Zhang, H., Sefidgar, Y., Ren, Y., Liu, X., Seo, W., Brown, J., Kuehn, K., Merrill, M., Nurius, P., Patel, S., Althoff, T., Morris, M., Riskin, E., Mankoff, J., & Dey, A. (2023). GLOBEM Dataset: Multi-Year Datasets for Longitudinal Human Behavior Modeling Generalization (version 1.1). PhysioNet. https://doi.org/10.13026/r9s1-s711.
Chicago	Xu, Xuhai, Zhang, Han, Sefidgar, Yasaman, Ren, Yiyi, Liu, Xin, Seo, Woosuk, Brown, Jennifer, Kuehn, Kevin, Merrill, Mike, Nurius, Paula, Patel, Shwetak, Althoff, Tim, Morris, Margaret, Riskin, Eve, Mankoff, Jennifer, and Anind Dey. "GLOBEM Dataset: Multi-Year Datasets for Longitudinal Human Behavior Modeling Generalization" (version 1.1). PhysioNet (2023). https://doi.org/10.13026/r9s1-s711.
Harvard	Xu, X., Zhang, H., Sefidgar, Y., Ren, Y., Liu, X., Seo, W., Brown, J., Kuehn, K., Merrill, M., Nurius, P., Patel, S., Althoff, T., Morris, M., Riskin, E., Mankoff, J., and Dey, A. (2023) 'GLOBEM Dataset: Multi-Year Datasets for Longitudinal Human Behavior Modeling Generalization' (version 1.1), PhysioNet. Available at: https://doi.org/10.13026/r9s1-s711.
Vancouver	Xu X, Zhang H, Sefidgar Y, Ren Y, Liu X, Seo W, Brown J, Kuehn K, Merrill M, Nurius P, Patel S, Althoff T, Morris M, Riskin E, Mankoff J, Dey A. GLOBEM Dataset: Multi-Year Datasets for Longitudinal Human Behavior Modeling Generalization (version 1.1). PhysioNet. 2023. Available from: https://doi.org/10.13026/r9s1-s711.

Additionally, please cite the original publication:

Xu X., Zhang H., Sefidgar Y., Ren Y., Liu X., Seo W., Brown J., Kuehn K., Merrill M., Nurius P., Patel S., Althoff T., Morris M., Riskin E., Mankoff J., Dey A. (2022) GLOBEM Dataset: Multi-Year Datasets for Longitudinal Human Behavior Modeling Generalization. 36th Conference on Neural Information Processing Systems Datasets and Benchmarks Track.

Please include the standard citation for PhysioNet:

APA	Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
MLA	Goldberger, A., et al. "PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220." (2000).
CHICAGO	Goldberger, A., L. Amaral, L. Glass, J. Hausdorff, P. C. Ivanov, R. Mark, J. E. Mietus, G. B. Moody, C. K. Peng, and H. E. Stanley. "PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220." (2000).
HARVARD	Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P.C., Mark, R., Mietus, J.E., Moody, G.B., Peng, C.K. and Stanley, H.E., 2000. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
VANCOUVER	Goldberger A, Amaral L, Glass L, Hausdorff J, Ivanov PC, Mark R, Mietus JE, Moody GB, Peng CK, Stanley HE. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

We present the first multi-year mobile sensing datasets. Our multi-year data collection studies span four years (10 weeks each year, from 2018 to 2021). The four datasets contain data collected from 705 person-years (497 unique participants) with diverse racial, ability, and immigrant backgrounds. Each year, participants would install a mobile app on their phones and wear a fitness tracker. The app and wearable device passively track multiple sensor streams in the background 24×7, including location, phone usage, calls, Bluetooth, physical activity, and sleep behavior. In addition, participants completed weekly short surveys and two comprehensive surveys on health behaviors and symptoms, social well-being, emotional states, mental health, and other metrics. Our dataset analysis indicates that our datasets capture a wide range of daily human routines and reveal insights into the relationship between daily behaviors and important well-being metrics (e.g., depression status). We envision our multi-year datasets can support the ML community in developing generalizable longitudinal behavior modeling algorithms.

Background

Among various longitudinal sensor streams, smartphones and wearables are arguably among the most widely available data sources [1]. The advances in mobile technology provide an unprecedented opportunity to capture multiple aspects of daily human behaviors, by collecting continuous sensor streams from these devices [2,3], together with metrics about health and well-being through self-report or clinical diagnosis as modeling targets. Modeling behavior from such data poses unique challenges compared to traditional time-series classification tasks. First, the data covers a much longer time period, usually across multiple months or years. Second, the nature of longitudinal collection often results in a high data missing rate. Third, the prediction target label is sparse, especially for mental well-being metrics.

Longitudinal human behavior modeling is an important multidisciplinary area spanning machine learning, psychology, human-computer interaction, and ubiquitous computing. Researchers have demonstrated the potential of using longitudinal mobile sensing data for behavior modeling in many applications, e.g., detecting physical health issues [4], monitoring mental health status [3], measuring job performance [5], and tracing education outcomes [6]. Most existing research employed off-the-shelf ML algorithms and evaluated them on their private datasets. However, testing a model with new contexts and users is imperative to ensure its practical deployability. To the best of our knowledge, there has been no investigation of the cross-dataset generalizability of these longitudinal behavior models, nor an open testbed to evaluate and compare various modeling algorithms. To address this gap, we present the first multi-year passive mobile sensing datasets to help the ML community explore generalizable longitudinal behavior models.

Methods

Our data collection studies were conducted at a Carnegie-classified R-1 university in the United States with IRB review and approval. We recruited undergraduates via emails from 2018 to 2021. After the first year, previous-year participants were invited to join again. The study was conducted during Spring quarter for 10 weeks each year, so the impact of seasonal effects was controlled. Based on their compliance, participants received up to $245 in compensation every quarter.

The four datasets (DS1 to DS4) have 155, 218, 137, and 195 participants (705 person-years overall, and 497 unique people). Our datasets have a high representation of females (58.9%), immigrants (24.2%), first-generation students (38.2%), and participants with disabilities (9.1%), and cover a wide range of races, with Asian (53.9%) and White (31.9%) participants being dominant (other groups include, e.g., Hispanic/Latino 7.4% and Black/African American 3.3%).

Part 1: Survey Data

We collected survey data at multiple stages of the study. We delivered extensive surveys before the start and at the end of the study (pre/post surveys) and short weekly Ecological Momentary Assessment (EMA) surveys during the study to collect in-the-moment self-report data. All surveys consist of well-established and validated questionnaires to ensure data quality.

Our pre/post surveys include a number of questionnaires to cover various aspects of life, including 1) personality (BFI-10, The Big-Five Inventory-10), 2) physical health (CHIPS, Cohen-Hoberman Inventory of Physical Symptoms), 3) mental well-being (e.g., BDI-II, Beck Depression Inventory-II; ERQ, Emotion Regulation Questionnaire), and 4) social well-being (e.g., Sense of Social and Academic Fit Scale; EDS, Everyday Discrimination Scale). Our EMA surveys focus on capturing participants’ recent sense of their mental health, including PHQ-4, Patient Health Questionnaire 4; PSS-4, Perceived Stress Scale 4; and PANAS, Positive and Negative Affect Schedule.

We use the depression detection task as a starting point for behavior modeling. We employ BDI-II (post) and PHQ-4 (EMA) as the ground truth. Both are screening tools for further inquiry of clinical depression diagnosis. We focus on a binary classification problem to distinguish whether participants’ scores indicate at least mild depressive symptoms through the scales (i.e., PHQ-4 > 2, BDI-II > 13). The average number of depression labels is 11.6 ± 2.6 per person. The percentage of participants with at least mild depression is 39.8 ± 2.7% for BDI-II and 46.2 ± 2.5% for PHQ-4.
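
As a minimal sketch of the binarization described above, the two thresholds can be written as small helper functions. The function names are illustrative assumptions and are not part of the dataset or its tooling.

```python
def phq4_indicates_depression(phq4_score: int) -> bool:
    """True when the PHQ-4 score is above 2 (at least mild symptoms)."""
    return phq4_score > 2


def bdi2_indicates_depression(bdi2_score: int) -> bool:
    """True when the BDI-II score is above 13 (at least mild symptoms)."""
    return bdi2_score > 13


# Scores exactly at the cut-off fall on the "no depression" side.
print(phq4_indicates_depression(2), phq4_indicates_depression(3))
print(bdi2_indicates_depression(13), bdi2_indicates_depression(14))
```

Both comparisons are strict, so the boundary scores (2 for PHQ-4 and 13 for BDI-II) are labeled as not depressed.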

Due to some design iteration, we did not include PHQ-4 in DS1, but only PANAS. Although PANAS contains questions related to depressive symptoms (e.g., “distressed”), it does not have a comparable theoretical foundation for depression detection like PHQ-4 or BDI-II. Therefore, to maximize the compatibility of the datasets, we trained a small ML model on DS2 that has both PANAS and PHQ-4 scores to generate reliable ground truth labels. Specifically, we used a decision tree (depth=2) to take PANAS scores on two affect questions (“depressed” and “nervous”) as the input and predict PHQ-4 score-based depression binary label. Our model achieved 74.5% and 76.3% for accuracy and F1-score on a 5-fold cross-validation on DS2. The rule from the decision tree is simple: the user would be labeled as having no depression when the distress score is less than 2, and the nervous score is less than 3 (on a 1-5 Likert Scale). We then applied this rule to DS1 to generate depression labels.
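
The rule stated above can be sketched as a single function. This is only an illustration of the described rule, not the released labeling code; the function and parameter names are assumptions, and the first input follows the rule's own wording ("distress score"), so which PANAS item it maps to should be checked against the data.

```python
def ds1_depression_label(distress_score: int, nervous_score: int) -> bool:
    """Apply the stated rule to two PANAS item scores on a 1-5 Likert scale.

    Returns True when the participant would be labeled as depressed.
    """
    for score in (distress_score, nervous_score):
        if not 1 <= score <= 5:
            raise ValueError("PANAS item scores are expected on a 1-5 scale")
    # No depression only when both conditions hold.
    no_depression = distress_score < 2 and nervous_score < 3
    return not no_depression


print(ds1_depression_label(1, 2))  # both below their thresholds
print(ds1_depression_label(2, 2))  # distress score reaches 2
```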

Part 2: Sensor Data

We developed a mobile app using the AWARE Framework [7] that continuously collects location, phone usage (screen status), Bluetooth scans, and call logs. The app is compatible with both the iOS and Android platforms. Participants installed the app on smartphones and left it running in the background. In addition, we provided wearable Fitbits to collect their physical activities and sleep behaviors. The mobile app and wearable passively collected sensor data 24×7 during the study. The average number of days per person per year is 77.5 ± 8.9 among the four datasets.

We strictly follow our IRB's rules for anonymizing participants' data. Specifically, we employed a PID as the only indicator of a participant. No personal information is included in the dataset. Since some sensitive sensor data (e.g., location) can disclose identities, we only release feature-level data under credentialing to protect against privacy leakage.

Moreover, the data collection dates are randomly shifted by a whole number of weeks. As a result, the temporal order of events within the same subject and the day of the week are preserved after date-shifting.
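
To illustrate why shifting by weeks keeps both the weekday and the ordering of events, here is a small sketch. The offset range, the use of a single offset for all of one participant's dates, and the example dates are assumptions made for illustration; the actual shifting procedure is not published here.

```python
import random
from datetime import date, timedelta


def shift_by_weeks(day: date, n_weeks: int) -> date:
    """Shift a date by a whole number of weeks (positive or negative)."""
    return day + timedelta(weeks=n_weeks)


rng = random.Random(0)
offset_weeks = rng.randint(-52, 52)  # hypothetical range, one offset per participant

# Made-up event dates for one participant.
events = [date(2019, 4, 1), date(2019, 4, 3), date(2019, 4, 7)]
shifted = [shift_by_weeks(day, offset_weeks) for day in events]

for original, moved in zip(events, shifted):
    # A multiple of 7 days never changes the day of the week.
    print(original.strftime("%a"), moved.strftime("%a"))
```

Because every date of a participant moves by the same number of days, the gaps between events are unchanged as well.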

Data Description

We release four datasets, named INS-W_1, INS-W_2, INS-W_3, and INS-W_4. Each dataset has three folders. We provide an overview below. Please refer to our GLOBEM home page [8] and GitHub README page [9] for more details.

SurveyData: a list of files containing participants' survey responses, including pre/post long surveys and weekly short EMA surveys.
FeatureData: behavior feature vectors from all data types, using RAPIDS [10] as the feature extraction tool.
ParticipantsInfoData: some additional information about participants, e.g., device platform (iOS or Android).

Specifically, the folder structure of a dataset folder is shown as follows:

SurveyData
- dep_weekly.csv
- dep_endterm.csv
- pre.csv
- post.csv
- ema.csv
FeatureData
- rapids.csv
- location.csv
- screen.csv
- call.csv
- bluetooth.csv
- steps.csv
- sleep.csv
- wifi.csv
ParticipantsInfoData
- platform.csv

Survey Data

The SurveyData folder contains five files, all indexed by pid and date:

dep_weekly.csv: The specific file for depression labels (column "dep") combining post and EMA surveys.
dep_endterm.csv: The specific file for depression labels (column "dep") only in post surveys. Some prior depression detection tasks focus on end-of-term depression prediction.
These two files are created for depression as it is the benchmark task. We envision future work can be extended to other modeling targets as well.
pre.csv: The file contains all questionnaires that participants filled in right before the start of the data collection study (thus pre-study).
post.csv: The file contains all questionnaires that participants filled in right after the end of the data collection study (thus post-study).
ema.csv: The file contains all EMA surveys that participants filled in during the study. Some EMAs were delivered on Wednesdays, while some were delivered on Sundays.

Survey List

Survey Name	Short Description	Score Range	Dataset	Category
UCLA Short-form UCLA Loneliness Scale	A 10-item scale measuring one's subjective feelings of loneliness as well as social isolation. Items 2, 6, 10, 11, 13, 14, 16, 18, 19, and 20 of the original scale are included in the short form. Higher values indicate more subjective loneliness.	10 - 40	1,2,3,4	pre, post
SocialFit Sense of Social and Academic Fit Scale	A 17-item scale measuring the sense of social and academic fit of students at the institution where this study was conducted. Higher values indicate higher feelings of belonging.	17 - 119	1,2,3,4	pre, post
2-Way SSS 2-Way Social Support Scale	A 21-item scale measuring social supports from four aspects (a) giving emotional support, (b) giving instrumental support, (c) receiving emotional support, and (d) receiving instrumental support. Higher values indicate more social support.	(a) 0 - 25 (b) 0 - 25 (c) 0 - 35 (d) 0 - 20	1,2,3,4	pre, post
PSS Perceived Stress Scale	A 14-item scale used to assess stress levels during the last month. Note that Year 1 used the 10-item version. Higher values indicate more perceived stress.	0 - 56 (Year 2,3,4) 0 - 40 (Year 1)	1,2,3,4	pre, post
ERQ Emotion Regulation Questionnaire	A 10-item scale assessing individual differences in the habitual use of two emotion regulation strategies: (a) cognitive reappraisal and (b) expressive suppression. Higher scores indicate more habitual use of reappraisal/suppression.	(a) 1 - 7 (b) 1 - 7	1,2,3,4	pre, post
BRS Brief Resilience Scale	A 6-item scale assessing the ability to bounce back or recover from stress. Higher scores indicate greater resilience to stress.	1 - 5	1,2,3,4	pre, post
CHIPS Cohen-Hoberman Inventory of Physical Symptoms	A 33-item scale measuring the perceived burden from physical symptoms, and resulting psychological effect during the past 2 weeks. Higher values indicate more perceived burden from physical symptoms.	0 - 132	1,2,3,4	pre, post
STAI State-Trait Anxiety Inventory for Adults	A 20-item scale measuring State-Trait anxiety. Year 1 used the State version, while other years used the Trait version. Higher values indicate higher anxiety.	20 - 80	1,2,3,4	pre, post
CES-D Center for Epidemiologic Studies Depression Scale, Cole version	A 10-item scale measuring current level of depressive symptomatology, with emphasis on the affective component, depressed mood. Year 2 used the 9-item version. Higher scores indicate more depressive symptoms.	0 - 30 (Year 1,3,4) 0 - 27 (Year 2)	1,2,3,4	pre, post
BDI2 Beck Depression Inventory-II	A 21-item scale that detects depressive symptoms. Higher values indicate more depressive symptoms. 0-13: minimal to none, 14-19: mild, 20-28: moderate, and 29-63: severe.	0 - 63	1,2,3,4	pre, post
MAAS Mindful Attention Awareness Scale	A 15-item scale assessing a core characteristic of mindfulness. Year 1 used a 7-item version, while other years used the full version. Higher values indicate higher mindfulness.	1 - 6	1,2,3,4	pre, post
BFI10 The Big-Five Inventory-10	A 10-item scale measuring the Big Five personality traits Extroversion, Agreeableness, Conscientiousness, Emotional Stability, and Openness. The higher the score, the greater the tendency of the corresponding personality.	1 - 5	1,2,3,4	pre
Brief-COPE Brief Coping Orientation to Problems Experienced	A 28-item scale measuring (a) adaptive and (b) maladaptive ways to cope with a stressful life event. Higher values indicate more effective/ineffective ways to cope with a stressful life event.	(a): 0 - 3 (b): 0 - 3	2,3,4	pre, post
GQ Gratitude Questionnaire	A 6-item scale assessing individual differences in the proneness to experience gratitude in daily life. Higher scores indicate a greater tendency to experience gratitude.	6 - 42	2,3,4	pre, post
FSPWB Flourishing Scale Psychological Well-Being Scale	An 8-item scale measuring psychological well-being. Higher scores indicate a person with "more psychological resources and mental strengths".	8 - 56	2,3,4	pre, post
EDS Everyday Discrimination Scale	A 9-item scale assessing everyday discrimination. Higher values indicate more frequent experience of discrimination.	0 - 45	2,3,4	pre, post
CEDH Chronic Work Discrimination and Harassment	A 12-item scale assessing experiences of discrimination in educational settings. Higher values indicate more frequent experience of discrimination in the work environment.	0 - 60	2,3,4	pre, post
B-YAACQ The Brief Young Adult Alcohol Consequences Questionnaire (optional)	A 24-item scale measuring the alcohol problem severity continuum in college students. Higher values indicate more severe alcohol problems.	0 - 24	2,3,4	pre, post
PHQ-4 Patient Health Questionnaire 4	A 4-item scale assessing (a) mental health, (b) anxiety, and (c) depression. Higher values indicate higher risk of mental health problems, anxiety, and depression.	(a): 0 - 12 (b): 0 - 6 (c): 0 - 6	2,3,4	Weekly EMA
PSS-4 Perceived Stress Scale 4	A 4-item scale assessing stress levels during the last month. Higher values indicate more perceived stress.	0 - 16	2,3,4	Weekly EMA
PANAS Positive and Negative Affect Schedule	A 10-item scale measuring the level of (a) positive and (b) negative affects. Higher values indicate a larger extent.	(a): 0 - 20 (b): 0 - 20	2,3,4	Weekly EMA

PS: Due to the design iteration, some questionnaires are not available in all studies. Moreover, some questionnaires have different versions across years. We clarify them using column names. For example, INS-W_2 only has "CESD_9items_POST", while others have "CESD_10items_POST". "CESD_9items_POST" is also calculated in other datasets to make the modeling target comparable across datasets.

Feature Data

The FeatureData folder contains eight files, all indexed by pid and date.

rapids.csv: The complete feature file that contains all features.
location.csv: The feature file that contains all location features.
screen.csv: The feature file that contains all phone usage features.
call.csv: The feature file that contains all call features.
bluetooth.csv: The feature file that contains all Bluetooth features.
steps.csv: The feature file that contains all physical activity features.
sleep.csv: The feature file that contains all sleep features.
wifi.csv: The feature file that contains all WiFi features. Note that this feature type is not used by any existing algorithms and often has a high data missing rate.
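
Because both the survey files and the feature files are indexed by pid and date, labels and features can be aligned on that pair. The sketch below uses only Python's standard library and small in-memory stand-ins for dep_weekly.csv and steps.csv. The participant IDs, dates, values, the 0/1 encoding of "dep", and the exact feature column name are made up for illustration; only the "pid", "date", and "dep" column names come from the description above.

```python
import csv
import io

# Hypothetical feature column that follows the dataset's naming format.
STEPS_COL = "f_steps:fitbit_steps_intraday_rapids_sumsteps:allday"

# Toy stand-in for SurveyData/dep_weekly.csv (values are made up).
LABELS_CSV = """pid,date,dep
P001,2019-04-07,0
P001,2019-04-14,1
P002,2019-04-07,0
"""

# Toy stand-in for FeatureData/steps.csv; an empty cell marks missing data.
FEATURES_CSV = f"""pid,date,{STEPS_COL}
P001,2019-04-07,5321
P001,2019-04-08,7810
P001,2019-04-14,
"""


def index_by_pid_date(text):
    """Read CSV text into a dict keyed by (pid, date)."""
    return {(row["pid"], row["date"]): row for row in csv.DictReader(io.StringIO(text))}


def join_labels_with_features(labels_text, features_text):
    """Keep the participant-days that have a label and a feature row."""
    features = index_by_pid_date(features_text)
    joined = []
    for key, label_row in index_by_pid_date(labels_text).items():
        feature_row = features.get(key)
        if feature_row is None:
            continue  # labeled day without any feature row
        merged = dict(feature_row)
        merged["dep"] = label_row["dep"]
        joined.append(merged)
    return joined


joined = join_labels_with_features(LABELS_CSV, FEATURES_CSV)
for row in joined:
    print(row["pid"], row["date"], row["dep"], repr(row[STEPS_COL]))
```

Rows whose feature cell is empty are kept on purpose, since handling the high missing rate is part of the modeling problem.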

Please note that all features are extracted with multiple time_segments:

morning (6 am - 12 pm, calculated daily)
afternoon (12 pm - 6 pm, calculated daily)
evening (6 pm - 12 am, calculated daily)
night (12 am - 6 am, calculated daily)
allday (24 hrs from 12 am to 11:59 pm, calculated daily)
7-day history (calculated daily)
14-day history (calculated daily)
weekdays (calculated once per week on Friday)
weekend (calculated once per week on Sunday)

For all features with numeric values, we also provide two more versions:

normalized: subtracted by each participant's median and divided by the 5-95 quantile range
discretized: low/medium/high split by 33/66 quantile of each participant's feature value
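
The two derived versions can be sketched for one participant's values of one feature as follows. This is an illustration under stated assumptions rather than the released preprocessing code: the quantile definition (linear interpolation), the treatment of values that fall exactly on a cut point, the handling of a zero quantile range, and the absence of missing values are all assumptions.

```python
from statistics import median


def quantile(values, q):
    """Quantile with linear interpolation (one of several common definitions)."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(xs) - 1)
    return xs[lower] + (xs[upper] - xs[lower]) * (pos - lower)


def normalize(values):
    """Subtract the participant's median, divide by the 5-95 quantile range."""
    center = median(values)
    spread = quantile(values, 0.95) - quantile(values, 0.05)
    if spread == 0:
        return [0.0 for _ in values]  # assumption: constant features map to 0
    return [(v - center) / spread for v in values]


def discretize(values):
    """Split into low/medium/high at the participant's 33% and 66% quantiles."""
    q33, q66 = quantile(values, 0.33), quantile(values, 0.66)
    return ["low" if v <= q33 else "medium" if v <= q66 else "high" for v in values]


# Made-up daily values for one participant and one feature.
daily_values = list(range(101))
print(normalize(daily_values)[50], discretize(daily_values)[10])
```

Both functions use only the participant's own distribution, which is what makes the resulting values comparable across participants with very different baselines.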

Naming Format

All features follow a consistent naming format:

[feature_type]:[feature_name][version]:[time_segment]

feature_type: It corresponds to the six data types.
- location - f_loc
- screen - f_screen
- call - f_call
- bluetooth - f_blue
- steps - f_steps
- sleep - f_slp
feature_name: The name of the feature provided by RAPIDS, i.e., the second column of the following figure, plus some additional information. A typical format is [SensorType]_[CodeProvider]_[featurename]. Please refer to RAPIDS's naming format [9] for more details.
version: It has three versions:
- 1) nothing, just empty "";
- 2) normalized, _norm;
- 3) discretized, _dis.
time_segment: It corresponds to the specific time segment.
- morning - morning
- afternoon - afternoon
- evening - evening
- night - night
- allday - allday
- 7-day history - 7dhist
- 14-day history - 14dhist
- weekday - weekday
- weekend - weekend

For example, a participant's "sumdurationunlock" normalized feature in mornings is "f_screen:phone_screen_rapids_sumdurationunlock_norm:morning".
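
A column name in this format can be split back into its parts with a few lines of code. The sketch below is an assumption-laden helper, not part of the GLOBEM codebase: it assumes that the name contains exactly two colons and that no raw feature name itself ends in "_norm" or "_dis".

```python
from typing import NamedTuple

# Suffixes of the two derived versions; the raw version has no suffix.
VERSION_SUFFIXES = {"_norm": "normalized", "_dis": "discretized"}


class FeatureName(NamedTuple):
    feature_type: str
    feature_name: str
    version: str
    time_segment: str


def parse_feature_column(column: str) -> FeatureName:
    """Split "[feature_type]:[feature_name][version]:[time_segment]"."""
    feature_type, middle, time_segment = column.split(":")  # ValueError if malformed
    version = "raw"
    for suffix, label in VERSION_SUFFIXES.items():
        if middle.endswith(suffix):
            middle = middle[: -len(suffix)]
            version = label
            break
    return FeatureName(feature_type, middle, version, time_segment)


print(parse_feature_column("f_screen:phone_screen_rapids_sumdurationunlock_norm:morning"))
```

Such a parser makes it straightforward to select, for instance, all normalized screen features for a given time segment.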

Please find the following tables about feature details in our datasets.

Location Details

Feature Name	Unit	Description
hometime	minutes	Time spent at home in minutes. Home is the most visited significant location between 8 pm and 8 am, including any pauses within a 200-meter radius.
disttravelled	meters	Total distance traveled over a day (flights).
rog	meters	The Radius of Gyration (rog) is a measure in meters of the area covered by a person over a day. A centroid is calculated for all the places (pauses) visited during a day, and a weighted distance between all the places and that centroid is computed. The weights are proportional to the time spent in each place.
maxdiam	meters	The maximum diameter is the largest distance between any two pauses.
maxhomedist	meters	The maximum distance from home in meters.
siglocsvisited	locations	The number of significant locations visited during the day. Significant locations are computed using k-means clustering over pauses found in the whole monitoring period. The number of clusters is found by iterating k from 1 to 200, stopping when the centroids of two significant locations are within 400 meters of one another.
avgflightlen	meters	Mean length of all flights.
stdflightlen	meters	Standard deviation of the length of all flights.
avgflightdur	seconds	Mean duration of all flights.
stdflightdur	seconds	The standard deviation of the duration of all flights.
probpause	-	The fraction of a day spent in a pause (as opposed to a flight).
siglocentropy	nats	Shannon’s entropy measurement is based on the proportion of time spent at each significant location visited during a day.
circdnrtn	-	A continuous metric quantifying a person’s circadian routine that can take any value between 0 and 1, where 0 represents a daily routine completely different from any other sensed days and 1 a routine the same as every other sensed day.
wkenddayrtn	-	Same as circdnrtn but computed separately for weekends and weekdays.
locationvariance	meters²	The sum of the variances of the latitude and longitude columns.
loglocationvariance	-	Log of the sum of the variances of the latitude and longitude columns.
totaldistance	meters	Total distance traveled in a time segment using the haversine formula.
avgspeed	km/hr	Average speed in a time segment considering only the instances labeled as Moving. This feature is 0 when the participant is stationary during a time segment.
varspeed	km/hr	Speed variance in a time segment considering only the instances labeled as Moving. This feature is 0 when the participant is stationary during a time segment.
numberofsignificantplaces	places	Number of significant locations visited. It is calculated using the DBSCAN/OPTICS clustering algorithm which takes in EPS and MIN_SAMPLES as parameters to identify clusters. Each cluster is a significant place.
numberlocationtransitions	transitions	Number of movements between any two clusters in a time segment.
radiusgyration	meters	Quantifies the area covered by a participant.
timeattop1location	minutes	Time spent at the most significant location.
timeattop2location	minutes	Time spent at the 2nd most significant location.
timeattop3location	minutes	Time spent at the 3rd most significant location.
movingtostaticratio	-	Ratio between stationary time and total location sensed time. A lat/long coordinate pair is labeled as stationary if its speed (distance/time) to the next coordinate pair is less than 1 km/hr. A higher value represents a more stationary routine.
outlierstimepercent	-	Ratio between the time spent in non-significant clusters and the time spent in all clusters (i.e., stationary time, since only stationary samples are clustered). A higher value represents more time spent in non-significant clusters.
maxlengthstayatclusters	minutes	Maximum time spent in a cluster (significant location).
minlengthstayatclusters	minutes	Minimum time spent in a cluster (significant location).
avglengthstayatclusters	minutes	Average time spent in a cluster (significant location).
stdlengthstayatclusters	minutes	Standard deviation of time spent in a cluster (significant location).
locationentropy	nats	Shannon Entropy computed over the row count of each cluster (significant location), it is higher the more rows belong to a cluster (i.e., the more time a participant spent at a significant location).
normalizedlocationentropy	nats	Shannon Entropy computed over the row count of each cluster (significant location) divided by the number of clusters; it is higher the more rows belong to a cluster (i.e., the more time a participant spent at a significant location).
timeathome	minutes	Time spent at home.
timeat[PLACE]	minutes	Time spent at [PLACE], which can be living, exercise, study, greens.
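
Two of the quantities in the table above have standard definitions that can be sketched directly: the haversine distance used for totaldistance, and Shannon entropy in nats over the proportion of time spent at each location (as in siglocentropy). This is an illustration and not the RAPIDS implementation; the Earth radius constant, the function names, and the example inputs are assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (assumed constant)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/long points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def total_distance_m(points):
    """Sum of haversine distances between consecutive (lat, lon) points."""
    return sum(haversine_m(*p, *q) for p, q in zip(points, points[1:]))


def time_entropy_nats(minutes_per_location):
    """Shannon entropy (natural log) of the share of time at each location."""
    total = sum(minutes_per_location)
    if total <= 0:
        return 0.0
    shares = [m / total for m in minutes_per_location if m > 0]
    return -sum(p * math.log(p) for p in shares)


print(round(haversine_m(0.0, 0.0, 0.0, 1.0)))   # one degree of longitude at the equator
print(time_entropy_nats([60, 60]))              # time split evenly over two places
```

Entropy is zero when all time is spent at a single location and grows as time is spread more evenly across locations.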

Phone Usage Details

Feature Name	Unit	Description
sumduration	minutes	Total duration of all unlock episodes.
maxduration	minutes	Longest duration of any unlock episode.
minduration	minutes	Shortest duration of any unlock episode.
avgduration	minutes	Average duration of all unlock episodes.
stdduration	minutes	Standard deviation of the duration of all unlock episodes.
countepisode	episodes	Number of all unlock episodes.
firstuseafter	minutes	Minutes until the first unlock episode.
sumduration[PLACE]	minutes	Total duration of all unlock episodes. [PLACE] can be living, exercise, study, greens. Same below.
maxduration[PLACE]	minutes	Longest duration of any unlock episode.
minduration[PLACE]	minutes	Shortest duration of any unlock episode.
avgduration[PLACE]	minutes	Average duration of all unlock episodes.
stdduration[PLACE]	minutes	Standard deviation of the duration of all unlock episodes.
countepisode[PLACE]	episodes	Number of all unlock episodes.
firstuseafter[PLACE]	minutes	Minutes until the first unlock episode.

Call Details

Feature Name	Unit	Description
count	calls	Number of calls of a particular call_type (incoming/outgoing) occurred during a particular time_segment.
distinctcontacts	contacts	Number of distinct contacts that are associated with a particular call_type for a particular time_segment.
meanduration	seconds	The mean duration of all calls of a particular call_type during a particular time_segment.
sumduration	seconds	The sum of the duration of all calls of a particular call_type during a particular time_segment.
minduration	seconds	The duration of the shortest call of a particular call_type during a particular time_segment.
maxduration	seconds	The duration of the longest call of a particular call_type during a particular time_segment.
stdduration	seconds	The standard deviation of the duration of all the calls of a particular call_type during a particular time_segment.
modeduration	seconds	The mode of the duration of all the calls of a particular call_type during a particular time_segment.
entropyduration	nats	The estimate of the Shannon entropy for the duration of all the calls of a particular call_type during a particular time_segment.
timefirstcall	minutes	The time in minutes between 12:00am (midnight) and the first call of call_type.
timelastcall	minutes	The time in minutes between 12:00am (midnight) and the last call of call_type.
countmostfrequentcontact	calls	The number of calls of a particular call_type during a particular time_segment of the most frequent contact throughout the monitored period.

Bluetooth Details

Feature Name	Unit	Description
countscans	scans	Number of scans (rows) from the devices sensed during a time segment instance. The more scans a Bluetooth device has, the longer it remained within range of the participant’s phone.
uniquedevices	devices	Number of unique Bluetooth devices sensed during a time segment instance as identified by their hardware addresses.
meanscans	scans	Mean of the scans of every sensed device within each time segment instance.
stdscans	scans	Standard deviation of the scans of every sensed device within each time segment instance.
countscansmostfrequentdevicewithinsegments	scans	Number of scans of the most sensed device within each time segment instance.
countscansleastfrequentdevicewithinsegments	scans	Number of scans of the least sensed device within each time segment instance.
countscansmostfrequentdeviceacrosssegments	scans	Number of scans of the most sensed device across time segment instances of the same type.
countscansleastfrequentdeviceacrosssegments	scans	Number of scans of the least sensed device across time segment instances of the same type per device.
countscansmostfrequentdeviceacrossdataset	scans	Number of scans of the most sensed device across the entire dataset of every participant.
countscansleastfrequentdeviceacrossdataset	scans	Number of scans of the least sensed device across the entire dataset of every participant.

WiFi Details

Feature Name	Unit	Description
countscans	devices	Number of scanned WiFi access points connected during a time_segment; an access point can be detected multiple times over time, and these appearances are counted separately.
uniquedevices	devices	Number of unique access points during a time_segment as identified by their hardware address.
countscansmostuniquedevice	scans	Number of scans of the most scanned access point during a time_segment across the whole monitoring period.

Physical Activity Details

Feature Name	Unit	Description
maxsumsteps	steps	The maximum daily step count during a time segment.
minsumsteps	steps	The minimum daily step count during a time segment.
avgsumsteps	steps	The average daily step count during a time segment.
mediansumsteps	steps	The median of daily step count during a time segment.
stdsumsteps	steps	The standard deviation of daily step count during a time segment.
sumsteps	steps	The total step count during a time segment.
maxsteps	steps	The maximum step count during a time segment.
minsteps	steps	The minimum step count during a time segment.
avgsteps	steps	The average step count during a time segment.
stdsteps	steps	The standard deviation of step count during a time segment.
countepisodesedentarybout	bouts	Number of sedentary bouts during a time segment.
sumdurationsedentarybout	minutes	Total duration of all sedentary bouts during a time segment.
maxdurationsedentarybout	minutes	The maximum duration of any sedentary bout during a time segment.
mindurationsedentarybout	minutes	The minimum duration of any sedentary bout during a time segment.
avgdurationsedentarybout	minutes	The average duration of sedentary bouts during a time segment.
stddurationsedentarybout	minutes	The standard deviation of the duration of sedentary bouts during a time segment.
countepisodeactivebout	bouts	Number of active bouts during a time segment.
sumdurationactivebout	minutes	Total duration of all active bouts during a time segment.
maxdurationactivebout	minutes	The maximum duration of any active bout during a time segment.
mindurationactivebout	minutes	The minimum duration of any active bout during a time segment.
avgdurationactivebout	minutes	The average duration of active bouts during a time segment.
stddurationactivebout	minutes	The standard deviation of the duration of active bouts during a time segment.
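
The bout features can be illustrated with a short sketch that splits a per-minute step series into sedentary and active runs. The cutoff of 10 steps per minute, the function name, and the sample series are assumptions for illustration; this document does not state how bouts were delimited.

```python
from itertools import groupby

ACTIVE_THRESHOLD = 10  # assumed steps-per-minute cutoff; not stated in this document

def bout_durations(steps_per_minute, threshold=ACTIVE_THRESHOLD):
    """Split a per-minute step series into bout durations (in minutes)."""
    bouts = {"sedentary": [], "active": []}
    # groupby yields maximal runs of consecutive minutes with the same label.
    for is_active, run in groupby(steps_per_minute, key=lambda s: s >= threshold):
        bouts["active" if is_active else "sedentary"].append(len(list(run)))
    return bouts

# Hypothetical 12-minute time segment.
steps = [0, 0, 3, 25, 40, 12, 0, 0, 0, 0, 60, 5]
bouts = bout_durations(steps)

bout_features = {}
for kind, durations in bouts.items():
    bout_features[f"countepisode{kind}bout"] = len(durations)
    bout_features[f"sumduration{kind}bout"] = sum(durations)
    bout_features[f"maxduration{kind}bout"] = max(durations)
    bout_features[f"minduration{kind}bout"] = min(durations)
    bout_features[f"avgduration{kind}bout"] = sum(durations) / len(durations)

print(bouts)
print(bout_features)
```

The sample series yields sedentary bouts of 3, 4, and 1 minutes and active bouts of 3 and 1 minutes.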

Sleep Details

Feature Name	Unit	Description
countepisode[LEVEL][TYPE]	episodes	Number of [LEVEL][TYPE] sleep episodes. [LEVEL] is one of awake and asleep and [TYPE] is one of main, nap, and all. Same below.
sumduration[LEVEL][TYPE]	minutes	Total duration of all [LEVEL][TYPE] sleep episodes.
maxduration[LEVEL][TYPE]	minutes	Longest duration of any [LEVEL][TYPE] sleep episode.
minduration[LEVEL][TYPE]	minutes	Shortest duration of any [LEVEL][TYPE] sleep episode.
avgduration[LEVEL][TYPE]	minutes	Average duration of all [LEVEL][TYPE] sleep episodes.
medianduration[LEVEL][TYPE]	minutes	Median duration of all [LEVEL][TYPE] sleep episodes.
stdduration[LEVEL][TYPE]	minutes	Standard deviation of the duration of all [LEVEL][TYPE] sleep episodes.
firstwaketimeTYPE	minutes	First wake time for a certain sleep type during a time segment. Wake time is the number of minutes after midnight of a sleep episode’s end time.
lastwaketimeTYPE	minutes	Last wake time for a certain sleep type during a time segment. Wake time is the number of minutes after midnight of a sleep episode’s end time.
firstbedtimeTYPE	minutes	First bedtime for a certain sleep type during a time segment. Bedtime is the number of minutes after midnight of a sleep episode’s start time.
lastbedtimeTYPE	minutes	Last bedtime for a certain sleep type during a time segment. Bedtime is the number of minutes after midnight of a sleep episode’s start time.
countepisodeTYPE	episodes	Number of sleep episodes for a certain sleep type during a time segment.
avgefficiencyTYPE	scores	Average sleep efficiency for a certain sleep type during a time segment.
sumdurationafterwakeupTYPE	minutes	Total duration the user stayed in bed after waking up for a certain sleep type during a time segment.
sumdurationasleepTYPE	minutes	Total sleep duration for a certain sleep type during a time segment.
sumdurationawakeTYPE	minutes	Total duration the user stayed awake but still in bed for a certain sleep type during a time segment.
sumdurationtofallasleepTYPE	minutes	Total duration the user spent to fall asleep for a certain sleep type during a time segment.
sumdurationinbedTYPE	minutes	Total duration the user stayed in bed (sumdurationtofallasleep + sumdurationawake + sumdurationasleep + sumdurationafterwakeup) for a certain sleep type during a time segment.
avgdurationafterwakeupTYPE	minutes	Average duration the user stayed in bed after waking up for a certain sleep type during a time segment.
avgdurationasleepTYPE	minutes	Average sleep duration for a certain sleep type during a time segment.
avgdurationawakeTYPE	minutes	Average duration the user stayed awake but still in bed for a certain sleep type during a time segment.
avgdurationtofallasleepTYPE	minutes	Average duration the user spent to fall asleep for a certain sleep type during a time segment.
avgdurationinbedTYPE	minutes	Average duration the user stayed in bed (sumdurationtofallasleep + sumdurationawake + sumdurationasleep + sumdurationafterwakeup) for a certain sleep type during a time segment.
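
Two of the definitions above lend themselves to a small worked example: bedtime and wake time as minutes after midnight, and time in bed as the sum of its four components. The episode below and its field names are hypothetical and serve only to illustrate the arithmetic.

```python
from datetime import datetime

def minutes_after_midnight(ts):
    """Clock time of a timestamp expressed as minutes after midnight."""
    return ts.hour * 60 + ts.minute

# Hypothetical main-sleep episode; all field names are illustrative.
episode = {
    "start": datetime(2021, 4, 5, 23, 30),
    "end": datetime(2021, 4, 6, 7, 15),
    "tofallasleep": 12,   # minutes
    "awake": 25,          # minutes awake but still in bed
    "asleep": 420,        # minutes
    "afterwakeup": 8,     # minutes
}

bedtime = minutes_after_midnight(episode["start"])   # 23:30
waketime = minutes_after_midnight(episode["end"])    # 07:15

# Time in bed as the sum of the four components listed in the table.
inbed = sum(episode[k] for k in ("tofallasleep", "awake", "asleep", "afterwakeup"))

# Cross-check against the elapsed time between start and end.
elapsed = int((episode["end"] - episode["start"]).total_seconds() // 60)

print(bedtime, waketime, inbed, elapsed)
```

A bedtime of 23:30 maps to 1410 and a wake time of 07:15 maps to 435, so a bedtime value is often numerically larger than the wake time of the same episode.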

Participant Info Data

The ParticipantInfoData folder contains files with additional information.

platform.csv: This file contains each participant's primary smartphone platform (iOS or Android), indexed by pid.
demographics.csv: Due to privacy concerns, demographic data are available only upon special request. Please reach out to us directly with a clear research plan for using the demographic data.
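
As a rough sketch of how platform.csv might be used, the snippet below builds a pid-to-platform lookup and counts participants per platform. Only the pid index is stated in this document; the `platform` column name, the pid format, and the value spellings in the sample are assumptions, so check them against the actual file.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for ParticipantInfoData/platform.csv.
# Assumed header and values; the real file may differ.
sample = "pid,platform\nP001,ios\nP002,android\nP003,ios\n"

def load_platforms(handle):
    """Return a dict mapping pid -> platform from an open CSV file handle."""
    return {row["pid"]: row["platform"] for row in csv.DictReader(handle)}

platform_by_pid = load_platforms(io.StringIO(sample))
platform_counts = Counter(platform_by_pid.values())
print(platform_by_pid, platform_counts)
```

With the real file, the same function could be called on `open(path, newline="")` in place of the in-memory sample.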

Usage Notes

We provide a behavior modeling benchmark platform GLOBEM [8,9]. The platform is designed to support researchers in using, developing, and evaluating different longitudinal behavior modeling methods.

Researchers who use the datasets must agree to the following terms.

Privacy
Although the database has been anonymized, we cannot eliminate all potential risks of private information leakage. The PI of any research group with access to the dataset is responsible for continuing to safeguard this database, taking whatever steps are appropriate to protect participants’ privacy and data confidentiality. The specific actions required to safeguard the data may change over time.

Misuse
If at any point the administrators of the datasets at the University of Washington have concerns or reasonable suspicions that a researcher has violated these usage notes, the researcher will be notified. Concerns about misuse may be shared with PhysioNet and other related entities.

Our datasets have led to multiple publications. Please find them in the reference list [11-18].

Release Notes

v1.0 - Release of our GLOBEM Dataset

v1.1 - Added PHQ-4 subscale scores and labels to the "dep_weekly.csv" files

Ethics

Our datasets aim to aid research on developing, testing, and evaluating machine learning algorithms that use continuous sensor streams and self-reports to better understand the daily behaviors, health, and well-being of college students (and potentially the more general population). These findings may support public interest in improving student experiences and inform policy around adverse events that students and others may experience.

Privacy is the major ethical concern of our data collection studies. Our study obtained IRB approval from the University of Washington (IRB number STUDY00003244). Participants signed a consent form before joining the study. We strictly follow the IRB rules to anonymize participants' data. No one outside our core data collection group can access directly identifiable individual information. We also removed the data of users who stopped participating at any time during the study. Because some sensitive sensor data (e.g., location) can disclose identities, we release only feature-level data, under credentialed access, to protect against privacy leakage.

Acknowledgements

Our multi-year data collection study closely followed a sister study at Carnegie Mellon University (CMU). We acknowledge all efforts from the CMU Study Team in providing important starting and reference materials. Moreover, our studies were greatly inspired by the StudentLife researchers at Dartmouth College.

Our studies were supported by the University of Washington (including the Paul G. Allen School of Computer Science and Engineering; Department of Electrical and Computer Engineering; Population Health; Addictions, Drug and Alcohol Institute; and the Center for Research and Education on Accessible Technology and Experiences); the National Science Foundation (EDA-2009977, CHS-2016365, CHS-1941537, IIS-1816687, and IIS-7974751); the National Institute on Disability, Independent Living and Rehabilitation Research (90DPGE0003-01); Samsung Research America; and Google.

Conflicts of Interest

The authors have no conflicts of interest to declare.

References

1. N. D. Lane, E. Miluzzo, H. Lu, D. Peebles, T. Choudhury, and A. T. Campbell. A survey of mobile phone sensing. IEEE Communications Magazine, 48(9), 2010.
2. M. E. Morris, Q. Kathawala, T. K. Leen, E. E. Gorenstein, F. Guilak, W. DeLeeuw, and M. Labhard. Mobile therapy: case study evaluations of a cell phone application for emotional self-awareness. Journal of medical Internet research, 12(2):e10, 2010.
3. R. Wang, F. Chen, Z. Chen, T. Li, G. Harari, S. Tignor, X. Zhou, D. Ben-Zeev, and A. T. Campbell. Studentlife: Assessing mental health, academic performance and behavioral trends of college students using smartphones. In Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing, pages 3–14. ACM, 2014.
4. J.-K. Min, A. Doryab, J. Wiese, S. Amini, J. Zimmerman, and J. I. Hong. Toss “n” turn: Smartphone as sleep and sleep quality detector. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’14, pages 477–486, New York, NY, USA, 2014. Association for Computing Machinery.
5. S. M. Mattingly, J. M. Gregg, P. Audia, A. E. Bayraktaroglu, A. T. Campbell, N. V. Chawla, V. Das Swain, M. De Choudhury, S. K. D’Mello, A. K. Dey, et al. The tesserae project: Large-scale, longitudinal, in-situ, multimodal sensing of information workers. In Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, pages 1–8, 2019.
6. R. Wang, G. Harari, P. Hao, X. Zhou, and A. T. Campbell. Smartgpa: how smartphones can assess and predict academic performance of college students. In Proceedings of the 2015 ACM international joint conference on pervasive and ubiquitous computing, pages 295–306, 2015.
7. D. Ferreira, V. Kostakos, and A. K. Dey. Aware: Mobile context instrumentation framework. Frontiers in ICT, 2:6, 2015.
8. GLOBEM Home Page. https://the-globem.github.io
9. Benchmark Platform GLOBEM. https://github.com/UW-EXP/GLOBEM/
10. Rapids documentation. https://www.rapids.science/1.6/
11. M. E. Morris, K. S. Kuehn, J. Brown, P. S. Nurius, H. Zhang, Y. S. Sefidgar, X. Xu, E. A. Riskin, A. K. Dey, S. Consolvo, and J. C. Mankoff. College from home during COVID-19: A mixed-methods study of heterogeneous experiences. PLOS ONE, 16(6):e0251580, June 2021.
12. P. S. Nurius, Y. S. Sefidgar, K. S. Kuehn, J. Jung, H. Zhang, O. Figueira, E. A. Riskin, A. K. Dey, and J. C. Mankoff. Distress among undergraduates: Marginality, stressors and resilience resources. Journal of American College Health, pages 1–9, July 2021.
13. Y. S. Sefidgar, W. Seo, K. S. Kuehn, T. Althoff, A. Browning, E. Riskin, P. S. Nurius, A. K. Dey, and J. Mankoff. Passively-sensed Behavioral Correlates of Discrimination Events in College Students. Proceedings of the ACM on Human-Computer Interaction, 3(CSCW):1–29, Nov. 2019.
14. X. Xu, P. Chikersal, A. Doryab, D. K. Villalba, J. M. Dutcher, M. J. Tumminia, T. Althoff, S. Cohen, K. G. Creswell, J. D. Creswell, J. Mankoff, and A. K. Dey. Leveraging Routine Behavior and Contextually-Filtered Features for Depression Detection among College Students. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 3(3):1–33, Sept. 2019.
15. X. Xu, P. Chikersal, J. M. Dutcher, Y. S. Sefidgar, W. Seo, M. J. Tumminia, D. K. Villalba, S. Cohen, K. G. Creswell, J. D. Creswell, A. Doryab, P. S. Nurius, E. Riskin, A. K. Dey, and J. Mankoff. Leveraging Collaborative-Filtering for Personalized Behavior Modeling: A Case Study of Depression Detection among College Students. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 5(1):1–27, Mar. 2021.
16. X. Xu, J. Mankoff, and A. K. Dey. Understanding practices and needs of researchers in human state modeling by passive mobile sensing. CCF Transactions on Pervasive Computing and Interaction, July 2021.
17. H. Zhang, M. E. Morris, P. S. Nurius, K. Mack, J. Brown, K. S. Kuehn, Y. S. Sefidgar, X. Xu, E. A. Riskin, A. K. Dey, and J. Mankoff. Impact of Online Learning in the Context of COVID-19 on Undergraduates with Disabilities and Mental Health Concerns. ACM Transactions on Accessible Computing, page 3538514, July 2022.
18. H. Zhang, P. Nurius, Y. Sefidgar, M. Morris, S. Balasubramanian, J. Brown, A. K. Dey, K. Kuehn, E. Riskin, X. Xu, and J. Mankoff. How Does COVID-19 impact Students with Disabilities/Health Concerns? arXiv:2005.05438 [cs], May 2020.