CovIdentify Dataset

Peter Cho, Md Mobashir Hasan Shandhi, Ali Roghanizad, Jessilyn Dunn

Published: Nov. 25, 2024. Version: 1.0.0


When using this resource, please cite:
Cho, P., Shandhi, M. M. H., Roghanizad, A., & Dunn, J. (2024). CovIdentify Dataset (version 1.0.0). PhysioNet. https://doi.org/10.13026/ncq1-vp79.

Additionally, please cite the original publication:

A Method for Intelligent Allocation of Diagnostic Testing by Leveraging Data from Commercial Wearable Devices: A Case Study on COVID-19. (2022, April 1). https://doi.org/10.21203/rs.3.rs-1490524/v1

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

This dataset supports the study "A method for intelligent allocation of diagnostic testing by leveraging data from commercial wearable devices: a case study on COVID-19," which developed an Intelligent Testing Allocation (ITA) method. The study demonstrated the efficacy of using continuous digital biomarkers, such as resting heart rate and steps, to enhance COVID-19 diagnostic testing positivity rates. The findings suggest significant potential for large-scale, symptom-independent surveillance testing to alleviate diagnostic test shortages. The provided data are from the CovIdentify study launched by Duke's BIG IDEAs Lab in the Biomedical Engineering Department. From April 2nd, 2020 to May 25th, 2021, 2,887 participants connected their smartwatches to the CovIdentify platform, including 1,689 Garmin, 1,091 Fitbit, and 107 Apple smartwatches.


Background

The onset of COVID-19 and subsequent variants, alongside emerging diseases like monkeypox, underscored global challenges in infectious disease surveillance, particularly in the face of diagnostic test shortages. The need for efficient and effective mass surveillance methods became a priority. This study addresses this need by proposing the ITA method, leveraging wearable device data to optimize diagnostic testing. By focusing on early infection indicators like changes in resting heart rate and physical activity levels, this approach aims to improve testing resource allocation and manage the burden of ongoing and future outbreaks. The dataset represents a comprehensive collection of biomarkers from wearable devices, offering a unique opportunity for in-depth analysis in public health surveillance.


Methods

The CovIdentify study was launched on April 2, 2020 (Duke University Institutional Review Board #2020-0412).

Participant recruitment and data collection

Eligibility criteria included age over 18 years and internet access. Social networks and social media advertising were used to recruit participants. By May 25, 2021, a total of 7,348 participants were recruited and e-consented through the REDCap system. During enrollment, participants were given the option to donate 12 months of retrospective wearable data and 12 months of prospective wearable data. Wearable data were collected via the CovIdentify iOS app for devices connected to Apple HealthKit (e.g., Apple Watch) or via Application Programming Interfaces for other devices (e.g., Garmin and Fitbit devices). The participants were also asked to complete an onboarding (enrollment) survey and daily surveys. The surveys were in English or Spanish and included questions on symptoms, social distancing, diagnostic testing results, and related information.

Surveys were collected using the CovIdentify iOS app, text messaging, and/or emails. All wearable data and survey results were stored in a secured Microsoft Azure data platform and later analyzed in the Microsoft Azure Machine Learning environment. Soon after CovIdentify was launched, exploratory data analysis (EDA) revealed major differences between CovIdentify demographics and the demographics of COVID-19-positive cases and deaths in the U.S., as well as overall U.S. demographics based on the 2020 U.S. Census. We sought to mitigate the imbalance throughout the duration of the study by providing wearable devices to underrepresented populations. COVID-19 vaccine reporting was added to the daily surveys in February 2021, where we asked questions regarding the vaccination date, vaccine brand, vaccine-related symptoms, and dose number.

Wearable data processing and analysis

Participants were asked to fill out an enrollment survey following the informed e-consent. Daily symptom surveys and wearable data from the participants were analyzed both separately and together. For the overall analysis, we only included participants with self-reported diagnostic test results for COVID-19. These participants were further divided into two categories based on the self-reported diagnostic test results: COVID-19 positive and COVID-19 negative.


Data Description

This dataset contains the survey data and wearable device data from participants who enrolled in the CovIdentify study from April 2nd, 2020 to May 25th, 2021. A total of 7,348 participants e-consented to the CovIdentify study through the secure Research Electronic Data Capture (REDCap) system. Of those who consented, 6,765 participants enrolled in the study by completing an enrollment survey consisting of 37–61 questions that followed branching logic. Of those enrolled, 2,887 participants connected their smartwatches to the CovIdentify platform, including 1,689 Garmin, 1,091 Fitbit, and 107 Apple smartwatches. Throughout the study, 362,108 daily surveys were completed by 5,859 unique participants, with a mean of 62 and a median of 37 daily surveys completed per individual. Of all CovIdentify participants, 1,289 participants reported at least one diagnostic test result for COVID-19 (132 positive and 1,157 negative).
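As a quick consistency check on the counts reported above, the short Python sketch below confirms that the device and test-result breakdowns sum to their stated totals and derives two ratios from them. All numbers are copied from the paragraph above; the variable names are illustrative only and do not correspond to anything in the dataset files.

```python
# Counts copied from the Data Description paragraph; names are illustrative.
devices = {"Garmin": 1689, "Fitbit": 1091, "Apple": 107}
connected_total = 2887
tests = {"positive": 132, "negative": 1157}
tested_total = 1289
daily_surveys = 362108
survey_participants = 5859

# The breakdowns should add up to the reported totals.
assert sum(devices.values()) == connected_total
assert sum(tests.values()) == tested_total

# Derived ratios: surveys per participant and share of positive reports.
mean_surveys = daily_surveys / survey_participants
positive_share = tests["positive"] / tested_total

print(f"mean daily surveys per participant: {mean_surveys:.1f}")
print(f"share of tested participants reporting a positive result: {positive_share:.1%}")
```

The mean rounds to the 62 surveys per participant stated above, and roughly one in ten participants with a reported test result reported a positive one, which is worth keeping in mind as a class imbalance when modeling.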

The two main wearable device files are the rhr and step files, which contain the resting heart rate and step measurements, respectively, at the daily level. The heart rate "Value" is the device-reported resting heart rate for the day, and the steps value is the total number of steps per day. The heart rate file also includes the total number of observations for that day, the minimum heart rate (heartrate_min), the median heart rate (heartrate_median), the mean heart rate (heartrate_mean), and the 10th percentile heart rate (heartrate_q10). These measurements were calculated from the higher-resolution minute-level or 15-minute-level data.
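To illustrate how daily summary fields of this kind relate to intraday samples, here is a minimal Python sketch that computes the minimum, median, mean, and 10th percentile for one day of heart rate readings. It is not the authors' processing code: the function name summarize_day, the key n_observations, the sample values, and the choice of the "inclusive" percentile method are all assumptions made for illustration. The method actually used to compute heartrate_q10 is not stated here.

```python
import statistics


def summarize_day(heart_rates):
    """Summarize one day's intraday heart rate samples (beats per minute)."""
    if len(heart_rates) < 2:
        raise ValueError("need at least two samples to estimate a percentile")
    # Cut points dividing the data into 10 equal groups; the first one is
    # the 10th percentile. The interpolation method is an assumption.
    deciles = statistics.quantiles(heart_rates, n=10, method="inclusive")
    return {
        "n_observations": len(heart_rates),  # hypothetical column name
        "heartrate_min": min(heart_rates),
        "heartrate_median": statistics.median(heart_rates),
        "heartrate_mean": statistics.mean(heart_rates),
        "heartrate_q10": deciles[0],
    }


# Made-up samples standing in for minute-level or 15-minute-level readings.
sample = [60, 62, 58, 70, 65, 61, 59, 64, 66, 63]
summary = summarize_day(sample)
print(summary)
```

Note that the device-reported resting heart rate in "Value" comes from the manufacturer's own algorithm and generally cannot be reproduced from these summaries; a low percentile such as heartrate_q10 is only a rough, device-independent proxy.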

Each participant's wearable device data is matched to their survey data, which is in "covidentify_survey_upload.csv". Participants were de-identified and randomized by assigning each a unique hash_id. Date information was dropped, and days relative to enrollment are included instead (negative days are days prior to testing). The "survey_filled" field is 1, 2, or 3 (filled, not filled, or left study, respectively). The "redcap_event_name" field is the type of survey completed by the participant, with "consent_arm_1" referring to the first day of consenting and "baseline_arm_1" being the second day of the study for that participant. "Ethnicity" refers to Hispanic or Not Hispanic, with "1" being true, "0" being false, and "999" being did not specify.
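The matching described above can be sketched in Python as a join on hash_id and relative day, with the coded fields decoded along the way. This is a sketch under stated assumptions, not the study's code: the rows are made up, the relative-day column is called "day" here because its real name is not given, the wearable table layout is assumed, and reading Ethnicity "1" as Hispanic is an interpretation of "true". Check the actual file headers before adapting it.

```python
import csv
import io

# Code books taken from the field descriptions above.
SURVEY_FILLED = {"1": "filled", "2": "not filled", "3": "left study"}
ETHNICITY = {"1": "Hispanic", "0": "Not Hispanic", "999": "did not specify"}

# Made-up rows; "day" is a hypothetical name for the relative-day column.
survey_csv = """hash_id,day,survey_filled,redcap_event_name,Ethnicity
a1f3,0,1,consent_arm_1,0
a1f3,1,1,baseline_arm_1,0
b7c9,0,2,consent_arm_1,999
"""
rhr_csv = """hash_id,day,Value
a1f3,0,61
a1f3,1,63
b7c9,-3,58
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


# Index resting heart rate by (participant, relative day) for fast lookup.
rhr_by_key = {
    (r["hash_id"], int(r["day"])): float(r["Value"]) for r in read_rows(rhr_csv)
}

merged = []
for row in read_rows(survey_csv):
    key = (row["hash_id"], int(row["day"]))
    merged.append({
        "hash_id": row["hash_id"],
        "day": key[1],
        "survey_status": SURVEY_FILLED.get(row["survey_filled"], "unknown"),
        "event": row["redcap_event_name"],
        "ethnicity": ETHNICITY.get(row["Ethnicity"], "unknown"),
        "resting_hr": rhr_by_key.get(key),  # None when no wearable reading
    })

for record in merged:
    print(record)
```

Keeping survey rows that have no matching wearable reading (a left join) preserves information about survey adherence on days when the device was not worn.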


Usage Notes

The data has already been used in the following papers: [1-2]. The dataset can be used for tasks related to participant surveying behaviors in longitudinal digital health studies and for matching participants' wearable device data with self-reported COVID-19 testing results. One limitation is that participants' reports may be unreliable, as there was no external validation of their testing results. This limitation also applies to all other questions in the study surveys. Code for working with this data is available at [3].


Release Notes

Version 1.0.0: Initial release.


Ethics

The CovIdentify study adhered to strict ethical guidelines. The protocol for data collection involving human subjects was approved by relevant Institutional Review Boards (IRBs) or ethics committees, ensuring compliance with ethical standards in research (Protocol Number: 2020-0412). Participants in the study provided written informed consent, acknowledging their understanding of and agreement to the use of their data for research purposes. This approach safeguards participant privacy and autonomy. While the dataset presents significant benefits in enhancing public health surveillance and response to infectious diseases, risks related to data privacy and confidentiality were meticulously managed. All data handling procedures were designed to uphold the highest ethical standards, respecting the rights and welfare of the participants. The protocol permits data sharing as long as the shared data is aggregated and de-identified.


Conflicts of Interest

There are no conflicts of interest. 


References

  1. A Method for Intelligent Allocation of Diagnostic Testing by Leveraging Data from Commercial Wearable Devices: A Case Study on COVID-19. (2022, April 1). https://doi.org/10.21203/rs.3.rs-1490524/v1
  2. Data-Driven Approaches Uncover Key Factors in Digital Health Study Adherence and Retention https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4473136
  3. CovIdentify GitHub Repository https://github.com/Big-Ideas-Lab/CovIdentify

Access

Access Policy:
Only credentialed users who sign the DUA can access the files.

License (for files):
PhysioNet Credentialed Health Data License 1.5.0

Data Use Agreement:
PhysioNet Credentialed Health Data Use Agreement 1.5.0

Required training:
CITI Data or Specimens Only Research