# hyfer

The hyfer package provides utilities for interacting with data collected by Hyfe cough detection apps (www.hyfe.ai). It was designed to be used by Hyfe analysts and external research partners alike.
## hyfer in a nutshell

Put simply, hyfer processes raw Hyfe data, which you download in a standard format referred to as a `hyfe_data` object, into a polished format for tables and plots. We refer to that post-processed, analysis-ready data simply as a `hyfe` object.

The following chunk of code shows you the whole game; use it as a template for starting your own analysis. The rest of the vignette explains each part of this code, demonstrates other hyfer functions, and provides plot examples.
# Install hyfer
library(devtools)
devtools::install_github('hyfe-ai/hyfer', force=TRUE, quiet=TRUE)
library(hyfer)
# Other dependencies
library(dplyr)
library(lubridate)
library(ggplot2)
# Bring in your hyfe_data object (here we use sample data)
data(hyfe_data)
# Process data for all users together
ho <- process_hyfe_data(hyfe_data)
# ... or process users separately
ho_by_user <- process_hyfe_data(hyfe_data, by_user = TRUE)
# Summarize data
hyfe_summarize(ho_by_user)
# Now ready for plotting, etc.
## Installing hyfer

The hyfer package is maintained on GitHub and can be installed with `devtools::install_github()`, as shown in the chunk above.

Hyfe data have been formatted for use with the tidyverse family of packages, particularly `dplyr`, `lubridate`, and `ggplot2`.
This package assumes that (1) you already have some Hyfe data locally on your computer, and (2) those data are structured in a standardized way, as a `hyfe_data` object (see next section).

Hyfe's research collaborators can download data for their respective research cohorts from the Hyfe Research Dashboard. Hyfe's internal analysts download data directly using hyferdrive, a private company package. Both the dashboard and hyferdrive deliver data structured in exactly the same way, allowing both groups to use the functions offered in hyfer.
To get started with hyfer, begin by using a sample dataset that comes built into the package:

This sample dataset contains Hyfe data for two "super-users" of the Hyfe Cough Tracker app.
## The hyfe_data object

All downloaded Hyfe data are provided in a standardized format: a `hyfe_data` object. A `hyfe_data` object is simply a list with six standard slots. A detailed description of each slot is provided below.
### hyfe_data$id_key

The `id_key` slot provides the unique identifiers for each user represented in the data.
### hyfe_data$sessions

The `sessions` slot provides details for each session of user activity, for all users in the data.
hyfe_data$sessions %>% names()
#> [1] "uid" "start" "stop" "duration" "session_id"
#> [6] "device_info" "name" "email" "alias" "cohort_id"
hyfe_data$sessions %>% head()
#> uid start stop duration
#> 1 5Ue2PKP6KMUUbQcVIIjWu8rglIU2 1611696415 1611696415 0
#> 2 5Ue2PKP6KMUUbQcVIIjWu8rglIU2 1611696454 1611696454 0
#> 3 5Ue2PKP6KMUUbQcVIIjWu8rglIU2 1611699418 1611727018 27600
#> 4 5Ue2PKP6KMUUbQcVIIjWu8rglIU2 1611728433 1611728433 0
#> 5 5Ue2PKP6KMUUbQcVIIjWu8rglIU2 1611783683 1611813084 29401
#> 6 5Ue2PKP6KMUUbQcVIIjWu8rglIU2 1611817248 1611817248 0
#> session_id
#> 1 b22421d1-6d80-4fd6-86c7-b4b2f070ecac
#> 2 6fc2ecbf-15e4-4160-ac33-3a2ce797b461
#> 3 5175dc5b-4470-4f51-b307-6cc2ded3d85e
#> 4 b3148b85-6176-4756-8de6-b94e4f8d6015
#> 5 c73c2d70-5e52-4fae-aa50-dbc8a8551ad3
#> 6 0905a507-49ab-4748-909e-92025c0fe87e
#> device_info
#> 1 {"id": "552b2002-d40c-4e6e-a6f4-7cda2074195b", "model": "LM-X430", "vendor": "LGE", "os_version": "28", "app_version": "a1.21.15"}
#> 2 {"id": "552b2002-d40c-4e6e-a6f4-7cda2074195b", "model": "LM-X430", "vendor": "LGE", "os_version": "28", "app_version": "a1.21.15"}
#> 3 {"id": "552b2002-d40c-4e6e-a6f4-7cda2074195b", "model": "LM-X430", "vendor": "LGE", "os_version": "28", "app_version": "a1.21.15"}
#> 4 {"id": "552b2002-d40c-4e6e-a6f4-7cda2074195b", "model": "LM-X430", "vendor": "LGE", "os_version": "28", "app_version": "a1.21.15"}
#> 5 {"id": "552b2002-d40c-4e6e-a6f4-7cda2074195b", "model": "LM-X430", "vendor": "LGE", "os_version": "28", "app_version": "a1.21.15"}
#> 6 {"id": "552b2002-d40c-4e6e-a6f4-7cda2074195b", "model": "LM-X430", "vendor": "LGE", "os_version": "28", "app_version": "a1.21.15"}
#> name email alias cohort_id
#> 1 <NA> navarra+73@hyfeapp.com navarra+73@hyfeapp.com Navarra
#> 2 <NA> navarra+73@hyfeapp.com navarra+73@hyfeapp.com Navarra
#> 3 <NA> navarra+73@hyfeapp.com navarra+73@hyfeapp.com Navarra
#> 4 <NA> navarra+73@hyfeapp.com navarra+73@hyfeapp.com Navarra
#> 5 <NA> navarra+73@hyfeapp.com navarra+73@hyfeapp.com Navarra
#> 6 <NA> navarra+73@hyfeapp.com navarra+73@hyfeapp.com Navarra
The `start` and `stop` times of each session are provided as numeric timestamps, as are all other date/time fields in the `hyfe_data` object. Though they are not easy to read, timestamps are an unambiguous, timezone-agnostic representation of date/time: a timestamp is the number of seconds since midnight UTC on January 1, 1970.
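For example, the first session's `start` timestamp in the table above can be converted to a readable date-time with base R:

```r
# Timestamps are seconds since 1970-01-01 00:00:00 UTC ("Unix time").
ts <- 1611696415  # the first session's start time from the table above

# Convert to a readable date-time in UTC:
as.POSIXct(ts, origin = "1970-01-01", tz = "UTC")
#> [1] "2021-01-26 21:26:55 UTC"

# ...or in the cohort's local timezone:
as.POSIXct(ts, origin = "1970-01-01", tz = "Europe/Madrid")
#> [1] "2021-01-26 22:26:55 CET"
```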
### hyfe_data$sounds

The `sounds` slot provides details for each explosive sound detected, for all users in the data.
hyfe_data$sounds %>% names()
#> [1] "uid" "timestamp" "prediction_score"
#> [4] "is_cough" "onboarding_cough" "loudness"
#> [7] "snr" "loudness_threshold" "snr_threshold"
#> [10] "highpass_frequency" "peak_start_offset" "sound_id"
#> [13] "session_id" "url_peak" "url_parent"
#> [16] "name" "email" "alias"
#> [19] "cohort_id"
hyfe_data$sounds %>% head()
#> uid timestamp prediction_score is_cough
#> 1 9D7SChvklVa7zya0LdU6YVOi9QV2 1626794485 0.004492915 FALSE
#> 2 9D7SChvklVa7zya0LdU6YVOi9QV2 1626794936 0.005674792 FALSE
#> 3 9D7SChvklVa7zya0LdU6YVOi9QV2 1626794951 0.039426319 FALSE
#> 4 9D7SChvklVa7zya0LdU6YVOi9QV2 1626795054 0.806312203 FALSE
#> 5 9D7SChvklVa7zya0LdU6YVOi9QV2 1626795366 0.006854793 FALSE
#> 6 9D7SChvklVa7zya0LdU6YVOi9QV2 1626795462 0.020625576 FALSE
#> onboarding_cough loudness snr loudness_threshold snr_threshold
#> 1 FALSE 63.20647 27.22247 58 18
#> 2 FALSE 59.17874 18.53862 58 18
#> 3 FALSE 59.71023 28.82942 58 18
#> 4 FALSE 66.05753 42.07871 58 18
#> 5 FALSE 58.36067 33.21019 58 18
#> 6 FALSE 67.97848 27.63222 58 18
#> highpass_frequency peak_start_offset sound_id
#> 1 0.35 3.14 7230f77e5db83ab4a8e888785bacbb35
#> 2 0.35 2.54 64f8f4da93b535bb8448f1752affd2e4
#> 3 0.35 18.18 4ae37ee879613a55a85481fc12ebef13
#> 4 0.35 0.82 1682bd40ea7535c3827f798eb74b92a0
#> 5 0.35 12.26 941483bb38733f278bf5cec5afc4a031
#> 6 0.35 17.18 630c34020351368788b6a0e071d64eab
#> session_id url_peak
#> 1 <NA> user/9D7SChvklVa7zya0LdU6YVOi9QV2/1626794482637-recording-1.wav
#> 2 <NA> user/9D7SChvklVa7zya0LdU6YVOi9QV2/1626794933684-recording-1.wav
#> 3 <NA> user/9D7SChvklVa7zya0LdU6YVOi9QV2/1626794933684-recording-2.wav
#> 4 <NA> user/9D7SChvklVa7zya0LdU6YVOi9QV2/1626795053960-recording-1.wav
#> 5 <NA> user/9D7SChvklVa7zya0LdU6YVOi9QV2/1626795354670-recording-1.wav
#> 6 <NA> user/9D7SChvklVa7zya0LdU6YVOi9QV2/1626795444868-recording-1.wav
#> url_parent name
#> 1 samples/9D7SChvklVa7zya0LdU6YVOi9QV2/sample-1626794482637.m4a <NA>
#> 2 samples/9D7SChvklVa7zya0LdU6YVOi9QV2/sample-1626794933684.m4a <NA>
#> 3 samples/9D7SChvklVa7zya0LdU6YVOi9QV2/sample-1626794933684.m4a <NA>
#> 4 samples/9D7SChvklVa7zya0LdU6YVOi9QV2/sample-1626795053960.m4a <NA>
#> 5 samples/9D7SChvklVa7zya0LdU6YVOi9QV2/sample-1626795354670.m4a <NA>
#> 6 samples/9D7SChvklVa7zya0LdU6YVOi9QV2/sample-1626795444868.m4a <NA>
#> email alias cohort_id
#> 1 navarra+12@hyfeapp.com navarra+12@hyfeapp.com Navarra
#> 2 navarra+12@hyfeapp.com navarra+12@hyfeapp.com Navarra
#> 3 navarra+12@hyfeapp.com navarra+12@hyfeapp.com Navarra
#> 4 navarra+12@hyfeapp.com navarra+12@hyfeapp.com Navarra
#> 5 navarra+12@hyfeapp.com navarra+12@hyfeapp.com Navarra
#> 6 navarra+12@hyfeapp.com navarra+12@hyfeapp.com Navarra
The column `prediction_score` contains the probability that the explosive sound is a cough, based upon Hyfe's cough classification algorithms.

The column `is_cough` is a boolean (`TRUE`/`FALSE`) stating whether the prediction score is above Hyfe's cough prediction threshold of 0.85.

The column `onboarding_cough` is a boolean stating whether the sound was collected while the user was onboarding (following instructions, upon logging in, to cough into the app). Since these are elicited coughs, in certain analyses it may be useful to ignore coughs for which `onboarding_cough == TRUE`.
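That first filtering step can be sketched on a tiny mock version of the sounds table (the values below are made up for illustration; only the three relevant columns are shown):

```r
# Hypothetical miniature of hyfe_data$sounds:
sounds <- data.frame(
  prediction_score = c(0.99, 0.81, 0.92),
  is_cough         = c(TRUE, FALSE, TRUE),
  onboarding_cough = c(FALSE, FALSE, TRUE)
)

# Keep algorithm-confirmed coughs, dropping elicited onboarding coughs:
spontaneous <- subset(sounds, is_cough & !onboarding_cough)
nrow(spontaneous)
#> [1] 1
```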
### hyfe_data$locations

The `locations` slot provides details for each location fix, for all users in the data.
hyfe_data$locations %>% names()
#> [1] "uid" "timestamp" "longitude" "latitude"
#> [5] "resolution" "location_id" "location_index" "app_version"
#> [9] "device_info" "name" "email" "alias"
#> [13] "cohort_id"
hyfe_data$locations %>% head()
#> [1] uid timestamp longitude latitude resolution
#> [6] location_id location_index app_version device_info name
#> [11] email alias cohort_id
#> <0 rows> (or 0-length row.names)
Note that some studies, including the one that produced this sample data, have location data services disabled.
### hyfe_data$labels

This slot is empty for now. It is a placeholder for a future release, in which sounds that have been manually labelled will be included inside the `hyfe_data` object.
### hyfe_data$cohort_settings
hyfe_data$cohort_settings %>% names()
#> [1] "cohort_id" "timezone" "is_virtual"
#> [4] "h3_zoom_level" "snr_threshold" "loudness_threshold"
#> [7] "location_data_enabled"
hyfe_data$cohort_settings
#> cohort_id timezone is_virtual h3_zoom_level snr_threshold
#> 1 Navarra Europe/Madrid TRUE 15 18
#> loudness_threshold location_data_enabled
#> 1 58 true
The `cohort_settings` slot will only be populated if the `hyfe_data` object is for a research cohort; otherwise this slot will be `NULL`. Critically, `cohort_settings` contains the timezone used to determine local time in the function `format_hyfe_time()`.
## Processing Hyfe data

Once you download Hyfe data, the first step is to process it. This returns a standard `hyfe` object (`ho` for short variable names): a named list with the original `hyfe_data` slots plus new ones. These `hyfe` objects are formatted to make subsequent plots and analyses as simple as possible. The standard `hyfe` object structure is explored in detail in the next section.

By default, the `process_hyfe_data()` function lumps all user data together before summarizing, even if multiple users are present. To summarize each user separately, use the input `by_user`:
If you want to work with data from only a single user in a `hyfe_data` object containing data from multiple users, use the function `filter_to_user()` before processing. Your workflow would look like:
# Look at your ID options
hyfe_data$id_key
# Filter data to the first ID in that list
hyfe_data_1 <- filter_to_user(uid = hyfe_data$id_key$uid[1],
hyfe_data)
# Now process the data into a hyfe object
ho <- process_hyfe_data(hyfe_data_1)
As explained above, the argument `uid` refers to the anonymous identifier assigned to each user.
## The hyfe object

Once a `hyfe_data` object is processed, it becomes a `hyfe` object. The structure and formatting of the `hyfe` object are designed to accommodate plotting and analysis.
ho %>% names
#> [1] "id_key" "sessions" "sounds" "locations"
#> [5] "labels" "cohort_settings" "coughs" "hours"
#> [9] "days" "weeks"
The first several slots contain the raw data from the `hyfe_data` object; those data are unchanged.

The `coughs` slot holds all explosive sounds classified as coughs, with various new date/time variables that streamline plotting and analysis.
ho$coughs %>% head
#> uid timestamp prediction_score is_cough
#> 1 5Ue2PKP6KMUUbQcVIIjWu8rglIU2 1611700789 0.9998544 TRUE
#> 2 5Ue2PKP6KMUUbQcVIIjWu8rglIU2 1611696403 0.6972452 TRUE
#> 3 5Ue2PKP6KMUUbQcVIIjWu8rglIU2 1611700789 0.9998734 TRUE
#> 4 5Ue2PKP6KMUUbQcVIIjWu8rglIU2 1611700790 0.9996773 TRUE
#> 5 5Ue2PKP6KMUUbQcVIIjWu8rglIU2 1611721840 0.9906024 TRUE
#> 6 5Ue2PKP6KMUUbQcVIIjWu8rglIU2 1612070577 0.9858828 TRUE
#> onboarding_cough loudness snr loudness_threshold snr_threshold
#> 1 FALSE 71.69626 41.40092 NA NA
#> 2 TRUE 72.07907 29.02771 NA NA
#> 3 FALSE 76.74911 46.97339 NA NA
#> 4 FALSE 71.62092 39.48697 NA NA
#> 5 FALSE 71.43337 19.44805 NA NA
#> 6 FALSE 71.21107 40.48111 NA NA
#> highpass_frequency peak_start_offset sound_id
#> 1 NA 17.10 532bc8adcaf23eb491be98a58ecfc353
#> 2 NA 0.50 a1c8b4213cab3c09ba3812a2d645ba5f
#> 3 NA 16.30 da6d5dc3a20a3711a76ed619945a16b3
#> 4 NA 17.62 f4121dcccabc3513a1768c2a4a988090
#> 5 NA 10.98 4ff2d4513d693f4d8adbb148478d17ec
#> 6 NA 14.22 e0bd3e48873b38759693df07d60ed307
#> session_id url_peak
#> 1 <NA> user/5Ue2PKP6KMUUbQcVIIjWu8rglIU2/1611700772764-recording-2.wav
#> 2 <NA> user/5Ue2PKP6KMUUbQcVIIjWu8rglIU2/1611696402848-recording-1.wav
#> 3 <NA> user/5Ue2PKP6KMUUbQcVIIjWu8rglIU2/1611700772764-recording-1.wav
#> 4 <NA> user/5Ue2PKP6KMUUbQcVIIjWu8rglIU2/1611700772764-recording-3.wav
#> 5 <NA> user/5Ue2PKP6KMUUbQcVIIjWu8rglIU2/1611721829192-recording-1.wav
#> 6 <NA> user/5Ue2PKP6KMUUbQcVIIjWu8rglIU2/1612070563377-recording-1.wav
#> url_parent name
#> 1 samples/5Ue2PKP6KMUUbQcVIIjWu8rglIU2/sample-1611700772764.m4a <NA>
#> 2 onboarding/5Ue2PKP6KMUUbQcVIIjWu8rglIU2/sample-1611696402848.m4a <NA>
#> 3 samples/5Ue2PKP6KMUUbQcVIIjWu8rglIU2/sample-1611700772764.m4a <NA>
#> 4 samples/5Ue2PKP6KMUUbQcVIIjWu8rglIU2/sample-1611700772764.m4a <NA>
#> 5 samples/5Ue2PKP6KMUUbQcVIIjWu8rglIU2/sample-1611721829192.m4a <NA>
#> 6 samples/5Ue2PKP6KMUUbQcVIIjWu8rglIU2/sample-1612070563377.m4a <NA>
#> email alias cohort_id date_time
#> 1 navarra+73@hyfeapp.com navarra+73@hyfeapp.com Navarra 2021-01-26 22:39:49
#> 2 navarra+73@hyfeapp.com navarra+73@hyfeapp.com Navarra 2021-01-26 21:26:43
#> 3 navarra+73@hyfeapp.com navarra+73@hyfeapp.com Navarra 2021-01-26 22:39:49
#> 4 navarra+73@hyfeapp.com navarra+73@hyfeapp.com Navarra 2021-01-26 22:39:50
#> 5 navarra+73@hyfeapp.com navarra+73@hyfeapp.com Navarra 2021-01-27 04:30:40
#> 6 navarra+73@hyfeapp.com navarra+73@hyfeapp.com Navarra 2021-01-31 05:22:57
#> tz date date_floor date_ceiling year week yday hour study_week
#> 1 UTC 2021-01-26 1611619200 1611705600 2021 4 26 22 12
#> 2 UTC 2021-01-26 1611619200 1611705600 2021 4 26 21 12
#> 3 UTC 2021-01-26 1611619200 1611705600 2021 4 26 22 12
#> 4 UTC 2021-01-26 1611619200 1611705600 2021 4 26 22 12
#> 5 UTC 2021-01-27 1611705600 1611792000 2021 4 27 4 12
#> 6 UTC 2021-01-31 1612051200 1612137600 2021 5 31 5 13
#> study_day study_hour frac_week frac_day frac_hour
#> 1 82 1953 11.61919 81.33436 1952.025
#> 2 82 1951 11.61194 81.28360 1950.806
#> 3 82 1953 11.61919 81.33436 1952.025
#> 4 82 1953 11.61920 81.33437 1952.025
#> 5 82 1958 11.65400 81.57801 1957.872
#> 6 86 2055 12.23062 85.61432 2054.744
Note that the final columns in this `coughs` table provide data about dates and times. The `date_time` column is an attempt to convert the UTC timestamp into local time according to the timezone specified in `ho$cohort_settings`. These date/time fields are generated by the helper function `format_hyfe_time()`; see further details below.
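Several of these fields can be recomputed from the raw timestamp with base R. For the first cough in the table above (a hypothetical recomputation for illustration, not hyfer's internal code):

```r
ts <- 1611700789  # the first cough's timestamp above
dt <- as.POSIXct(ts, origin = "1970-01-01", tz = "UTC")

format(dt, "%Y-%m-%d %H:%M:%S")  # date_time: "2021-01-26 22:39:49"
as.integer(format(dt, "%H"))     # hour of day: 22
as.integer(format(dt, "%j"))     # day of year (yday): 26
```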
The `hours`, `days`, and `weeks` slots hold summary timetables of session activity, peak/cough detections, and cough rates for the entire dataset.
ho$hours %>% head
#> timestamp date_time tz date date_floor
#> 1 1604617200 2020-11-06 00:00:00 Europe/Madrid 2020-11-06 1604617200
#> 2 1604620800 2020-11-06 01:00:00 Europe/Madrid 2020-11-06 1604617200
#> 3 1604624400 2020-11-06 02:00:00 Europe/Madrid 2020-11-06 1604617200
#> 4 1604628000 2020-11-06 03:00:00 Europe/Madrid 2020-11-06 1604617200
#> 5 1604631600 2020-11-06 04:00:00 Europe/Madrid 2020-11-06 1604617200
#> 6 1604635200 2020-11-06 05:00:00 Europe/Madrid 2020-11-06 1604617200
#> date_ceiling year week yday hour study_week study_day study_hour frac_week
#> 1 1604617200 2020 45 311 0 0 0 0 0.000000000
#> 2 1604703600 2020 45 311 1 1 1 1 0.005952381
#> 3 1604703600 2020 45 311 2 1 1 2 0.011904762
#> 4 1604703600 2020 45 311 3 1 1 3 0.017857143
#> 5 1604703600 2020 45 311 4 1 1 4 0.023809524
#> 6 1604703600 2020 45 311 5 1 1 5 0.029761905
#> frac_day frac_hour n_uid session_seconds session_hours session_days peaks
#> 1 0.00000000 0 0 0 0 0 0
#> 2 0.04166667 1 0 0 0 0 0
#> 3 0.08333333 2 0 0 0 0 0
#> 4 0.12500000 3 0 0 0 0 0
#> 5 0.16666667 4 0 0 0 0 0
#> 6 0.20833333 5 0 0 0 0 0
#> coughs cough_rate session_seconds_tot session_hours_tot session_days_tot
#> 1 0 NaN 0 0 0
#> 2 0 NaN 0 0 0
#> 3 0 NaN 0 0 0
#> 4 0 NaN 0 0 0
#> 5 0 NaN 0 0 0
#> 6 0 NaN 0 0 0
#> peaks_tot coughs_tot
#> 1 0 0
#> 2 0 0
#> 3 0 0
#> 4 0 0
#> 5 0 0
#> 6 0 0
ho$days %>% as.data.frame %>% head
#> date tz date_floor date_ceiling year week yday study_week
#> 1 2020-11-06 Europe/Madrid 1604617200 1604617200 2020 45 311 0
#> 2 2020-11-07 Europe/Madrid 1604703600 1604703600 2020 45 312 1
#> 3 2020-11-08 Europe/Madrid 1604790000 1604790000 2020 45 313 1
#> 4 2020-11-09 Europe/Madrid 1604876400 1604876400 2020 45 314 1
#> 5 2020-11-10 Europe/Madrid 1604962800 1604962800 2020 45 315 1
#> 6 2020-11-11 Europe/Madrid 1605049200 1605049200 2020 46 316 1
#> study_day n_uid session_seconds session_hours session_days peaks coughs
#> 1 0 1 30060 8.350000 0.3479167 24 24
#> 2 1 1 86244 23.956667 0.9981944 58 58
#> 3 2 1 86400 24.000000 1.0000000 140 140
#> 4 3 1 86400 24.000000 1.0000000 91 91
#> 5 4 1 34310 9.530556 0.3971065 23 23
#> 6 5 1 20285 5.634722 0.2347801 5 5
#> cough_rate session_seconds_tot session_hours_tot session_days_tot peaks_tot
#> 1 68.98204 30060 8.35000 0.3479167 24
#> 2 58.10491 116304 32.30667 1.3461111 82
#> 3 140.00000 202704 56.30667 2.3461111 222
#> 4 91.00000 289104 80.30667 3.3461111 313
#> 5 57.91897 323414 89.83722 3.7432176 336
#> 6 21.29652 343699 95.47194 3.9779977 341
#> coughs_tot
#> 1 24
#> 2 82
#> 3 222
#> 4 313
#> 5 336
#> 6 341
ho$weeks %>% as.data.frame %>% head
#> week tz date_floor date_ceiling year study_week n_uid
#> 1 45 Europe/Madrid 1604617200 1605049200 2020 0 1
#> 2 46 Europe/Madrid 1605049200 1605654000 2020 1 1
#> 3 47 Europe/Madrid 1605654000 1606258800 2020 2 1
#> 4 48 Europe/Madrid 1606258800 1606863600 2020 3 1
#> 5 49 Europe/Madrid 1606863600 1607468400 2020 4 1
#> 6 50 Europe/Madrid 1607468400 1608073200 2020 5 1
#> session_seconds session_hours session_days peaks coughs cough_rate
#> 1 323414 89.83722 3.743218 336 336 628.3364
#> 2 394832 109.67556 4.569815 269 228 349.2483
#> 3 389373 108.15917 4.506632 4313 369 573.1553
#> 4 322348 89.54111 3.730880 4072 140 262.6726
#> 5 476167 132.26861 5.511192 2576 306 388.6636
#> 6 604163 167.82306 6.992627 5618 331 331.3490
#> session_seconds_tot session_hours_tot session_days_tot peaks_tot coughs_tot
#> 1 323414 89.83722 3.743218 336 336
#> 2 718246 199.51278 8.313032 605 564
#> 3 1107619 307.67194 12.819664 4918 933
#> 4 1429967 397.21306 16.550544 8990 1073
#> 5 1906134 529.48167 22.061736 11566 1379
#> 6 2510297 697.30472 29.054363 17184 1710
Note that when the data are processed with `by_user = TRUE`, the slot names are slightly different:
ho_by_user %>% names
#> [1] "id_key" "sessions" "sounds" "locations"
#> [5] "labels" "cohort_settings" "coughs" "user_summaries"
The `user_summaries` slot is itself a list, with one slot per user. Each user's slot contains a list of four tables: `hours`, `days`, `weeks`, and `id_key`.
## Plotting

Custom plotting functions are under construction. For now, `ggplot` works well with `hyfe` objects (see examples in the Overview above).
By default, `plot_total()` plots sessions, but you can specify detections of explosive sounds …

… and of coughs:

The default time unit for this function is `days`, but you can specify `hours` …

… and `weeks` as well:
This function, like the other `plot_...()` functions in hyfer, allows you to return the dataset underlying the plot in addition to (or instead of) producing the plot.
plot_total(ho,
unit='weeks',
type='coughs',
print_plot = FALSE,
return_data = TRUE)
#> $data
#> # A tibble: 43 × 2
#> x y
#> <dttm> <dbl>
#> 1 2020-11-05 23:00:00 336
#> 2 2020-11-10 23:00:00 564
#> 3 2020-11-17 23:00:00 933
#> 4 2020-11-24 23:00:00 1073
#> 5 2020-12-01 23:00:00 1379
#> 6 2020-12-08 23:00:00 1710
#> 7 2020-12-15 23:00:00 2005
#> 8 2020-12-22 23:00:00 2059
#> 9 2020-12-31 23:00:00 2247
#> 10 2020-12-29 23:00:00 2247
#> # … with 33 more rows
This function also allows you to return the `ggplot` object so that you can add to it before printing the plot:
plot_total(ho,
unit='days',
type='sessions',
print_plot=FALSE,
return_plot=TRUE)$plot +
ggplot2::labs(title='Monitoring time (person-days)')
These plot functions also accept `ho` objects that were processed with `by_user = TRUE`:

Note that the users' individual datasets are pooled in order to make a single aggregate plot. But when using `ho_by_user`, you can also plot each user separately on the same plot:
For all users together:
For each user separately:
This plotting function has the same optional inputs as `plot_total()`, with the addition of an option for overlaying a running mean:

The units of the `running_mean` argument are the same as those of the `unit` argument. This option is only respected when you process your hyfe data with `by_user = FALSE`.
For all users together:
For each user separately:
To overlay users such that all of their timeseries begin at the origin, use `plot_trajectories()`. This can be helpful when studying the evolution of cough in the days following enrollment or hospitalization, or when studying user retention in the days after signing up with the app. Note that this function only accepts hyfe data that were processed with `by_user = TRUE`.
For each user separately:
For all user trajectories pooled together:
For all users together:
For each user separately:
Diagnostic plots can be helpful in data review and technical troubleshooting. To explore an entire cohort dataset quickly, use `plot_cohort_diagnostic()`:

In this plot, each user has a row of data. The grey bars indicate session activity and the red dots indicate cough detections.

To produce a diagnostic plot for a single user, use `plot_user_diagnostic()`:
# Look at your ID options
hyfe_data$id_key
#> uid name email
#> 1 5Ue2PKP6KMUUbQcVIIjWu8rglIU2 <NA> navarra+73@hyfeapp.com
#> 2 9D7SChvklVa7zya0LdU6YVOi9QV2 <NA> navarra+12@hyfeapp.com
#> alias cohort_id
#> 1 navarra+73@hyfeapp.com Navarra
#> 2 navarra+12@hyfeapp.com Navarra
# Filter data to the first ID in that list
hyfe_data_1 <- filter_to_user(uid = hyfe_data$id_key$uid[1],
hyfe_data)
# Now process the data into a hyfe object for a single user
ho1 <- process_hyfe_data(hyfe_data_1)
plot_user_diagnostic(ho1)
The grey bars indicate session activity and the red dots indicate cough detections.
## Summarizing

To get summary metrics for a `hyfe` object, use the function `hyfe_summarize()`:
hyfe_summarize(ho)
#> $overall
#> users seconds hours days years sounds coughs hourly_n hourly_rate
#> 1 2 20693233 5748.12 239.505 0.6561781 853382 7148 5407 1.287206
#> hourly_var hourly_sd hourly_max daily_n daily_rate daily_var daily_sd
#> 1 15.51397 3.938778 62 259 30.14035 608.292 24.66358
#> daily_max
#> 1 140
#>
#> $users
#> NULL
Since this `ho` object was processed by aggregating all users together (note that the `users` slot in the output is `NULL`), the reported cough rates should be treated with caution: they will be biased toward users with (1) a lot of monitoring time and (2) a lot of coughs.
To summarize Hyfe data from multiple users in a truly balanced way, in which each user is weighted equally, use a `hyfe` object processed with `by_user = TRUE`:
hyfe_summarize(ho_by_user)
#> $overall
#> users seconds hours days years sounds coughs hourly_n hourly_rate
#> 1 2 21069973 5852.77 243.8654 0.6681245 869312 7362 2933 1.043988
#> hourly_var hourly_sd hourly_max daily_n daily_rate daily_var daily_sd
#> 1 0.1292396 0.3594991 1.298192 149.5 21.23788 189.6924 13.77289
#> daily_max
#> 1 30.97678
#>
#> $users
#> uid name email
#> 1 5Ue2PKP6KMUUbQcVIIjWu8rglIU2 <NA> navarra+73@hyfeapp.com
#> 2 9D7SChvklVa7zya0LdU6YVOi9QV2 <NA> navarra+12@hyfeapp.com
#> alias cohort_id users seconds hours days
#> 1 navarra+73@hyfeapp.com Navarra 1 1678967 466.3797 19.43249
#> 2 navarra+12@hyfeapp.com Navarra 1 19391006 5386.3906 224.43294
#> years sounds coughs hourly_n hourly_rate hourly_var hourly_sd hourly_max
#> 1 0.05323969 37946 343 464 0.7897837 7.182032 2.679931 23.98667
#> 2 0.61488477 831366 7019 5402 1.2981923 15.008396 3.874067 65.00000
#> daily_n daily_rate daily_var daily_sd daily_max
#> 1 44 11.49898 257.4542 16.04538 74.29333
#> 2 255 30.97678 584.6859 24.18028 147.00000
In the `users` slot of the output, you now have a row summarizing each user. That user table is then used to build the `overall` slot. The mean rates (i.e., `hourly_rate` and `daily_rate`) are the averages of each user's mean rates, and, importantly, the variability metrics (`hourly_var`, `hourly_sd`, `daily_var`, `daily_sd`) now pertain to the variability among users.
The `hyfe_summarize()` function uses sample-size cutoffs to ensure that rates are not swung to extremes by insufficient monitoring. For example, an hour of the day with 1 cough detection but only 1 minute of monitoring would produce an hourly cough rate estimate of 60 coughs per hour. Such scenarios should be avoided.

The default cutoffs are: at least 30 minutes of monitoring must occur within an hour-long window of the day for that hour to contribute to the estimate of the hourly cough rate; and at least 4 hours of monitoring must occur within a day for that day to count toward the daily cough rate.
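The arithmetic behind that caution, in miniature:

```r
coughs <- 1

# Naive extrapolation from a single minute of monitoring:
coughs / (1 / 60)   # implied rate of 60 coughs per hour
#> [1] 60

# The same hour with the default 30-minute cutoff satisfied:
coughs / (30 / 60)  # a far more plausible 2 coughs per hour
#> [1] 2
```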
You may adjust those defaults using the function arguments. For example, here is a much more stringent set of requirements, which may improve the accuracy of rate estimates but severely reduces sample size:
hyfe_summarize(ho_by_user,
cutoff_hourly = 59,
cutoff_daily = 23.9)
#> $overall
#> users seconds hours days years sounds coughs hourly_n hourly_rate
#> 1 2 21069973 5852.77 243.8654 0.6681245 869312 7362 2841.5 0.9205046
#> hourly_var hourly_sd hourly_max daily_n daily_rate daily_var daily_sd
#> 1 0.2615673 0.5114365 1.282145 81 26.19617 54.00036 7.348494
#> daily_max
#> 1 31.39234
#>
#> $users
#> uid name email
#> 1 5Ue2PKP6KMUUbQcVIIjWu8rglIU2 <NA> navarra+73@hyfeapp.com
#> 2 9D7SChvklVa7zya0LdU6YVOi9QV2 <NA> navarra+12@hyfeapp.com
#> alias cohort_id users seconds hours days
#> 1 navarra+73@hyfeapp.com Navarra 1 1678967 466.3797 19.43249
#> 2 navarra+12@hyfeapp.com Navarra 1 19391006 5386.3906 224.43294
#> years sounds coughs hourly_n hourly_rate hourly_var hourly_sd hourly_max
#> 1 0.05323969 37946 343 408 0.5588644 3.550132 1.884179 19
#> 2 0.61488477 831366 7019 5275 1.2821448 14.806009 3.847858 65
#> daily_n daily_rate daily_var daily_sd daily_max
#> 1 1 21.00000 NA NA 21
#> 2 161 31.39234 600.7896 24.51101 147
Note that the cumulative counts are unaffected by these cutoffs; only the rates are.
## Cough rate distributions

To get details and summaries about the distribution of cough rates in your data, use the function `cough_rate_distribution()`.

This function can take both aggregated data (`ho`) and user-separated data (`ho_by_user`), but it does best with the latter. It returns metrics about hourly cough rates based on an hour-by-hour analysis. Similar to the inputs of `hyfe_summarize()`, the argument `min_session` defines the minimum amount of monitoring required during a single hour for that hour to be included in the cough rate estimation. (Sometimes an hour of the day contains only a few minutes of monitoring for a user, which makes for a poor estimate of that hour's cough rate.) The default `min_session` is 0.5 hours, i.e., 30 minutes of monitoring within an hour.
The function returns a list with four slots.

The slot `$overall` returns a one-row summary of the entire dataset:
cough_rates$overall
#> # A tibble: 1 × 7
#> mean_of_mean sd_of_mean mean_of_variance sd_of_variance n_hours_tot
#> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 1.04 0.359 11.1 5.53 5866
#> # … with 2 more variables: n_hours_mean <dbl>, n_uid <int>
These metrics are based on the mean/variance for each individual user; i.e., `mean_of_mean` is the average of mean cough rates across users. When using a `hyfe` object prepared with `by_user = TRUE`, this means that each user is weighted equally in the summary statistics. When using a `hyfe` object in which all user data are aggregated together, users are weighted according to their session time.
The slot `$users` returns a summary for every user contained in the data:
cough_rates$users
#> # A tibble: 2 × 5
#> uid rate_mean rate_variance n_hours n_uid
#> <chr> <dbl> <dbl> <int> <int>
#> 1 5Ue2PKP6KMUUbQcVIIjWu8rglIU2 0.790 7.18 464 1
#> 2 9D7SChvklVa7zya0LdU6YVOi9QV2 1.30 15.0 5402 1
The slot `$rates` returns a numeric vector of hourly cough rates that satisfy the minimum monitoring threshold:
cough_rates$rates %>% head(100)
#> [1] 3.000000 0.000000 0.000000 0.000000 0.000000 1.000000 0.000000
#> [8] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
#> [15] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
#> [22] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
#> [29] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 2.000000
#> [36] 2.756508 2.000000 0.000000 0.000000 0.000000 0.000000 0.000000
#> [43] 0.000000 2.000000 0.000000 0.000000 0.000000 0.000000 0.000000
#> [50] 0.000000 0.000000 3.000000 0.000000 0.000000 0.000000 0.000000
#> [57] 0.000000 0.000000 4.572396 0.000000 0.000000 0.000000 0.000000
#> [64] 0.000000 0.000000 0.000000 3.835908 0.000000 0.000000 0.000000
#> [71] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
#> [78] 0.000000 0.000000 0.000000 23.986674 0.000000 0.000000 0.000000
#> [85] 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000
#> [92] 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 0.000000
#> [99] 0.000000 0.000000
The slot `$details` returns a dataframe with all the details you might need to analyze these rates (essentially the `hours` table from a `hyfe` object):
cough_rates$details %>% head()
#> uid timestamp date_time tz
#> 1 5Ue2PKP6KMUUbQcVIIjWu8rglIU2 1611700741 2021-01-26 23:39:01 Europe/Madrid
#> 2 5Ue2PKP6KMUUbQcVIIjWu8rglIU2 1611704341 2021-01-27 00:39:01 Europe/Madrid
#> 3 5Ue2PKP6KMUUbQcVIIjWu8rglIU2 1611707941 2021-01-27 01:39:01 Europe/Madrid
#> 4 5Ue2PKP6KMUUbQcVIIjWu8rglIU2 1611711541 2021-01-27 02:39:01 Europe/Madrid
#> 5 5Ue2PKP6KMUUbQcVIIjWu8rglIU2 1611715141 2021-01-27 03:39:01 Europe/Madrid
#> 6 5Ue2PKP6KMUUbQcVIIjWu8rglIU2 1611718741 2021-01-27 04:39:01 Europe/Madrid
#> date date_floor date_ceiling year week yday hour study_week study_day
#> 1 2021-01-26 1611615600 1611702000 2021 4 26 23 12 82
#> 2 2021-01-27 1611702000 1611788400 2021 4 27 0 12 82
#> 3 2021-01-27 1611702000 1611788400 2021 4 27 1 12 82
#> 4 2021-01-27 1611702000 1611788400 2021 4 27 2 12 82
#> 5 2021-01-27 1611702000 1611788400 2021 4 27 3 12 82
#> 6 2021-01-27 1611702000 1611788400 2021 4 27 4 12 82
#> study_hour frac_week frac_day frac_hour n_uid session_hours session_days
#> 1 1952 11.61905 81.33333 1952 1 1 0.04166667
#> 2 1953 11.62500 81.37500 1953 1 1 0.04166667
#> 3 1954 11.63095 81.41667 1954 1 1 0.04166667
#> 4 1955 11.63690 81.45833 1955 1 1 0.04166667
#> 5 1956 11.64286 81.50000 1956 1 1 0.04166667
#> 6 1957 11.64881 81.54167 1957 1 1 0.04166667
#> peaks coughs cough_rate
#> 1 10 3 3
#> 2 0 0 0
#> 3 0 0 0
#> 4 0 0 0
#> 5 0 0 0
#> 6 1 1 1
This function should make it straightforward to plot cough rate histograms…
ggplot(cough_rates$details,
aes(x=cough_rate)) +
geom_histogram() +
xlab('Coughs per hour') + ylab('Count') +
facet_wrap(~uid)
… or scatterplots of the relationship between cough rate mean and cough rate variance among users:
ggplot(cough_rates$users,
aes(x=rate_mean, y=rate_variance)) +
geom_point() +
xlab('Mean cough rate (coughs per hour)') + ylab('Variance in cough rate')
## Simulating cough data

To generate a fake timeseries of coughs, use the function `simulate_cougher()`. This function returns a dataframe with hourly cough counts based on the mean cough rate you provide:

By default, the function returns a month-long timeseries of coughs simulated using a negative binomial distribution, in which the variance is predicted from the mean using a regression that Hyfe developed with a 600-participant dataset from northern Spain. You can also specify your own variance using the `rate_variance` argument. (There are many other arguments as well; see the function's documentation for more.)
For a visual assessment of how much monitoring is needed to accurately estimate a user's overall mean cough rate, use the function plot_cough_rate_error().
# Generate a cough time series
demo <- simulate_cougher(rate_mean = 3, random_seed = 124)
# Plot cough rate error
demo_error <- plot_cough_rate_error(demo)
For a cohort of users, you can use fit_model_to_cough() to fit a model of the relationship between the mean and variance of cough rate across the cohort. Knowing that relationship allows you to simulate realistic time series for any cough rate.
To do this, you need a hyfe object with many users, processed with by_user = TRUE. Here is what your code would look like:
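The chunk below sketches that call, reusing the sample data from the start of this vignette. Treat it as a template: the exact arguments of fit_model_to_cough() may differ, so check the function's documentation before running it.

```r
# Process the sample data separately for each user
data(hyfe_data)
ho_by_user <- process_hyfe_data(hyfe_data, by_user = TRUE)

# Fit the mean-variance model for the cohort (sketch; see ?fit_model_to_cough)
cough_model <- fit_model_to_cough(ho_by_user)
```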
See the function documentation to understand the details of these arguments and what is returned. This function returns various details in a list (p-values, R-squared, model objects, etc.). For now, note that you can feed the model coefficients directly into another function, predict_cough_variance()
, which returns a variance prediction based upon a user-specified cough rate and the model coefficients:
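A minimal sketch, assuming you have saved the list returned by fit_model_to_cough() as cough_model and that its coefficients are stored in a slot named coefficients. Both the slot name and the rate_mean argument are guesses (the latter modeled on simulate_cougher()); inspect the returned list and the function documentation to confirm.

```r
# Hypothetical sketch: predict the variance for a mean rate of 5 coughs per hour
predict_cough_variance(rate_mean = 5,
                       coefficients = cough_model$coefficients)
```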
The cough_bouts()
function allows you to “pool” coughs that occur very close in time. This option can be useful in a medical context as well as in clinical validation experiments.
data(ho)
coughs <- ho$coughs
bouts <- cough_bouts(coughs,
bout_window = 2,
bout_limit = Inf,
verbose=FALSE)
In the code above, the inputs specify that coughs occurring within 2 seconds of each other should be pooled into a single cough bout, and that there is no limit to the number of coughs that can occur in a single bout.
Compare the number of coughs to the number of bouts:
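Using the objects from the chunk above, one simple comparison:

```r
# Bout pooling should yield fewer (or equally many) events than raw coughs
nrow(coughs)
nrow(bouts)
```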
Examine how many coughs were contained in these bouts:
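Assuming cough_bouts() records the number of pooled coughs per bout in a column such as n_coughs (a guessed column name; run names(bouts) to confirm):

```r
# Distribution of bout sizes
table(bouts$n_coughs)
```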
Most bouts contain a single cough, but some contain two or three, and a few contain more than that.
Comparing Hyfe performance to a groundtruth, such as a set of labeled detections, requires that the two sets of events are synchronized. Even if Hyfe’s system time differs from the labeler’s clock by a second or two, that offset can complicate and confuse the performance evaluation process. Use the function synchronize()
to find the offset correction needed for Hyfe detections to be synchronized to a set of reference/label times.
Say you did a “field test” in which you coughed into an MP3 recorder and a Hyfe phone a few dozen times and you want to see how well Hyfe performed at detecting all of those coughs. A friend reviews the MP3 file and labels each sound according to Hyfe's 4-tier labeling system. Your table of labels looks like this:
#> times labels
#> 1 1638636126 3
#> 2 1638636133 3
#> 3 1638636147 3
#> 4 1638636153 2
#> 5 1638636163 3
#> 6 1638636178 2
You download your Hyfe data and use the ho$sounds
slot to find the timestamp and prediction for the sounds in your test. A simplified table of your detections may look like this:
#> times predictions
#> 1 1638661299 TRUE
#> 2 1638661306 TRUE
#> 3 1638661320 TRUE
#> 4 1638661326 FALSE
#> 5 1638661336 TRUE
#> 6 1638661351 FALSE
Let’s synchronize these detections to your labels. (These are fake data generated for this example; the true time offset is 6 hours, 59 minutes, and 33 seconds ahead of the labels, a total of 25173 seconds.) The function should return the corresponding correction:
synchronize(reference_times = reference$times,
reference_labels = reference$labels,
hyfe_times = detections$times,
hyfe_predictions = detections$predictions)
#> [1] -25173
The offset is negative because, when you add this number to the Hyfe detection timestamps, they will match the label timestamps.
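To apply the correction, add the returned offset to the Hyfe timestamps from the detections table above:

```r
# Shift Hyfe detections onto the labeler's clock
offset <- -25173
detections$times_corrected <- detections$times + offset
```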
hyfer
The function process_hyfe_data
relies on several background functions to do its work. Those functions can also be called directly if you need them:
format_hyfe_time()
This function takes a set of timestamps and creates a dataframe of various date/time variables that will be useful in subsequent hyfer
functions.
format_hyfe_time(c(1626851363, 1626951363))
#> timestamp date_time tz date date_floor date_ceiling year
#> 1 1626851363 2021-07-21 07:09:23 UTC 2021-07-21 1626825600 1626912000 2021
#> 2 1626951363 2021-07-22 10:56:03 UTC 2021-07-22 1626912000 1626998400 2021
#> week yday hour study_week study_day study_hour frac_week frac_day frac_hour
#> 1 29 202 7 0 0 0 0.0000000 0.000000 0.00000
#> 2 29 203 10 1 2 28 0.1653439 1.157407 27.77778
When a timezone is not provided, as above, the function assumes that times are in UTC
. You can specify that explicitly if you wish:
format_hyfe_time(c(1626851363, 1626951363), 'UTC')
#> timestamp date_time tz date date_floor date_ceiling year
#> 1 1626851363 2021-07-21 07:09:23 UTC 2021-07-21 1626825600 1626912000 2021
#> 2 1626951363 2021-07-22 10:56:03 UTC 2021-07-22 1626912000 1626998400 2021
#> week yday hour study_week study_day study_hour frac_week frac_day frac_hour
#> 1 29 202 7 0 0 0 0.0000000 0.000000 0.00000
#> 2 29 203 10 1 2 28 0.1653439 1.157407 27.77778
This function accepts any timezone listed in R's built-in collection of timezones (see OlsonNames()).
format_hyfe_time(c(1626851363, 1626951363), 'Africa/Kampala')
#> timestamp date_time tz date date_floor
#> 1 1626851363 2021-07-21 10:09:23 Africa/Kampala 2021-07-21 1626814800
#> 2 1626951363 2021-07-22 13:56:03 Africa/Kampala 2021-07-22 1626901200
#> date_ceiling year week yday hour study_week study_day study_hour frac_week
#> 1 1626901200 2021 29 202 10 0 0 0 0.0000000
#> 2 1626987600 2021 29 203 13 1 2 28 0.1653439
#> frac_day frac_hour
#> 1 0.000000 0.00000
#> 2 1.157407 27.77778
format_hyfe_time(c(1626851363, 1626951363), 'America/Chicago')
#> timestamp date_time tz date date_floor
#> 1 1626851363 2021-07-21 02:09:23 America/Chicago 2021-07-21 1626843600
#> 2 1626951363 2021-07-22 05:56:03 America/Chicago 2021-07-22 1626930000
#> date_ceiling year week yday hour study_week study_day study_hour frac_week
#> 1 1626930000 2021 29 202 2 0 0 0 0.0000000
#> 2 1627016400 2021 29 203 5 1 2 28 0.1653439
#> frac_day frac_hour
#> 1 0.000000 0.00000
#> 2 1.157407 27.77778
Explanations of date/time variables:
timestamp: Seconds since midnight UTC on January 1, 1970.
date_time: The timestamp converted to local time using the timezone input. If no timezone is specified, times are reported in UTC.
tz: The timezone specified.
date: Calendar date.
date_floor: Timestamp of 00:00:00 at the start of the calendar date.
date_ceiling: Timestamp of 00:00:00 at the start of the following calendar date.
year: Year.
week: Week of the year.
yday: Day of the year.
hour: Hour of day (0 - 23).
study_week: Weeks since the beginning of the first monitoring session.
study_day: Days since the beginning of the first monitoring session.
study_hour: Hours since the beginning of the first monitoring session.
frac_week: Copy of study_week, with a fractional indication of how far into the week.
frac_day: Copy of study_day, with a fractional indication of how far into the day.
frac_hour: Copy of study_hour, with a fractional indication of how far into the hour.

expand_sessions()
Most analyses of Hyfe data hinge upon detailed knowledge of when Hyfe was actively listening for coughs, and when it wasn’t. To determine the duration of monitoring on an hourly or daily basis, use the expand_sessions()
function.
This function returns a list with two slots: timetable
and series
. By default, series
is returned as a NULL
object since it is usually only needed for troubleshooting and can be time-consuming to prepare. The timetable
is a dataframe in which monitoring activity is detailed for each individual user in the dataset on an hourly or daily basis.
To create an hourly time table:
hyfe_time <- expand_sessions(hyfe_data,
unit='hour',
verbose=TRUE)
hyfe_time$timetable %>% nrow
#> [1] 13920
hyfe_time$timetable %>% head
#> timestamp date_time tz date date_floor
#> 1 1604617200 2020-11-06 00:00:00 Europe/Madrid 2020-11-06 1604617200
#> 2 1604620800 2020-11-06 01:00:00 Europe/Madrid 2020-11-06 1604617200
#> 3 1604624400 2020-11-06 02:00:00 Europe/Madrid 2020-11-06 1604617200
#> 4 1604628000 2020-11-06 03:00:00 Europe/Madrid 2020-11-06 1604617200
#> 5 1604631600 2020-11-06 04:00:00 Europe/Madrid 2020-11-06 1604617200
#> 6 1604635200 2020-11-06 05:00:00 Europe/Madrid 2020-11-06 1604617200
#> date_ceiling year week yday hour study_week study_day study_hour frac_week
#> 1 1604617200 2020 45 311 0 0 0 0 0.000000000
#> 2 1604703600 2020 45 311 1 1 1 1 0.005952381
#> 3 1604703600 2020 45 311 2 1 1 2 0.011904762
#> 4 1604703600 2020 45 311 3 1 1 3 0.017857143
#> 5 1604703600 2020 45 311 4 1 1 4 0.023809524
#> 6 1604703600 2020 45 311 5 1 1 5 0.029761905
#> frac_day frac_hour uid session_time
#> 1 0.00000000 0 9D7SChvklVa7zya0LdU6YVOi9QV2 0
#> 2 0.04166667 1 9D7SChvklVa7zya0LdU6YVOi9QV2 0
#> 3 0.08333333 2 9D7SChvklVa7zya0LdU6YVOi9QV2 0
#> 4 0.12500000 3 9D7SChvklVa7zya0LdU6YVOi9QV2 0
#> 5 0.16666667 4 9D7SChvklVa7zya0LdU6YVOi9QV2 0
#> 6 0.20833333 5 9D7SChvklVa7zya0LdU6YVOi9QV2 0
You can then, for example, summarize session activity for the two users in the sample dataset:
hyfe_time$timetable %>%
group_by(uid) %>%
summarize(hours_monitored = sum(session_time) / 3600,
days_monitored = sum(session_time) / 86400)
#> # A tibble: 2 × 3
#> uid hours_monitored days_monitored
#> <chr> <dbl> <dbl>
#> 1 5Ue2PKP6KMUUbQcVIIjWu8rglIU2 466. 19.4
#> 2 9D7SChvklVa7zya0LdU6YVOi9QV2 5282. 220.
To create a daily time table:
hyfe_time <- expand_sessions(hyfe_data,
unit='day')
hyfe_time$timetable %>%
group_by(uid) %>%
summarize(hours_monitored = sum(session_time) / 3600,
days_monitored = sum(session_time) / 86400)
#> # A tibble: 2 × 3
#> uid hours_monitored days_monitored
#> <chr> <dbl> <dbl>
#> 1 5Ue2PKP6KMUUbQcVIIjWu8rglIU2 466. 19.4
#> 2 9D7SChvklVa7zya0LdU6YVOi9QV2 5283. 220.
Instead of a summary of session activity, the series
slot contains a continuous second-by-second time series:
hyfe_time <- expand_sessions(hyfe_data,
create_table = FALSE,
create_series = TRUE,
inactive_value = 0)
In this time series, every row is a second between the floor_date and ceiling_date of the study, and every column is a user (uid). Seconds in which the user is active are represented with a “1”. Inactive seconds are given the value of the input inactive_value, the default for which is “0”.
hyfe_time$series %>% head
#> timestamp 9D7SChvklVa7zya0LdU6YVOi9QV2 5Ue2PKP6KMUUbQcVIIjWu8rglIU2
#> 1 1604617200 0 0
#> 2 1604617201 0 0
#> 3 1604617202 0 0
#> 4 1604617203 0 0
#> 5 1604617204 0 0
#> 6 1604617205 0 0
Confirm that the same total monitoring duration, in days, is found using the series approach:
hyfe_time$series %>% select(2,3) %>% apply(2,sum) / 86400
#> 9D7SChvklVa7zya0LdU6YVOi9QV2 5Ue2PKP6KMUUbQcVIIjWu8rglIU2
#> 220.07252 19.43249
Note that this series
feature is only useful in certain circumstances, and it can create enormous objects that slow everything down. However, it can be particularly valuable during troubleshooting if a phone seems to be acting up.
Tip: Changing inactive_value to NA may make it easier to plot session activity as lines on plots.
hyfe_time <- expand_sessions(hyfe_data,
create_table = FALSE,
create_series = TRUE,
inactive_value = NA)
# Setup plot
par(mar=c(4.2,4.2,.5,.5))
plot(1, type='n',
xlim=range(hyfe_time$series$timestamp),
ylim=c(0,3),
xlab='Timestamp',
ylab='User')
# Add user 1
lines(x = hyfe_time$series$timestamp,
y = hyfe_time$series[,2])
# Add user 2
lines(x = hyfe_time$series$timestamp,
y = hyfe_time$series[,3] + 1)
hyfe_summary_tables()
To create hourly/daily/weekly summaries of session activity, peak/cough detections, and cough rates, use the function hyfe_summary_tables()
.
This function is essentially a wrapper for expand_sessions()
, and calls both that function and format_hyfe_time()
. Note: This function lumps all users together.
This function returns a named list:
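The tables shown below were produced by a call along these lines (a sketch; the exact arguments of hyfe_summary_tables() may differ, so see its documentation):

```r
# Build hourly/daily/weekly summary tables from the sample data
data(hyfe_data)
hyfe_tables <- hyfe_summary_tables(hyfe_data)
```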
Hourly summary table:
hyfe_tables$hours %>% as.data.frame %>% head
#> timestamp date_time tz date date_floor
#> 1 1604617200 2020-11-06 00:00:00 Europe/Madrid 2020-11-06 1604617200
#> 2 1604620800 2020-11-06 01:00:00 Europe/Madrid 2020-11-06 1604617200
#> 3 1604624400 2020-11-06 02:00:00 Europe/Madrid 2020-11-06 1604617200
#> 4 1604628000 2020-11-06 03:00:00 Europe/Madrid 2020-11-06 1604617200
#> 5 1604631600 2020-11-06 04:00:00 Europe/Madrid 2020-11-06 1604617200
#> 6 1604635200 2020-11-06 05:00:00 Europe/Madrid 2020-11-06 1604617200
#> date_ceiling year week yday hour study_week study_day study_hour frac_week
#> 1 1604617200 2020 45 311 0 0 0 0 0.000000000
#> 2 1604703600 2020 45 311 1 1 1 1 0.005952381
#> 3 1604703600 2020 45 311 2 1 1 2 0.011904762
#> 4 1604703600 2020 45 311 3 1 1 3 0.017857143
#> 5 1604703600 2020 45 311 4 1 1 4 0.023809524
#> 6 1604703600 2020 45 311 5 1 1 5 0.029761905
#> frac_day frac_hour n_uid session_seconds session_hours session_days peaks
#> 1 0.00000000 0 0 0 0 0 0
#> 2 0.04166667 1 0 0 0 0 0
#> 3 0.08333333 2 0 0 0 0 0
#> 4 0.12500000 3 0 0 0 0 0
#> 5 0.16666667 4 0 0 0 0 0
#> 6 0.20833333 5 0 0 0 0 0
#> coughs cough_rate session_seconds_tot session_hours_tot session_days_tot
#> 1 0 NaN 0 0 0
#> 2 0 NaN 0 0 0
#> 3 0 NaN 0 0 0
#> 4 0 NaN 0 0 0
#> 5 0 NaN 0 0 0
#> 6 0 NaN 0 0 0
#> peaks_tot coughs_tot
#> 1 0 0
#> 2 0 0
#> 3 0 0
#> 4 0 0
#> 5 0 0
#> 6 0 0
Daily summary table:
hyfe_tables$days %>% as.data.frame %>% head
#> date tz date_floor date_ceiling year week yday study_week
#> 1 2020-11-06 Europe/Madrid 1604617200 1604617200 2020 45 311 0
#> 2 2020-11-07 Europe/Madrid 1604703600 1604703600 2020 45 312 1
#> 3 2020-11-08 Europe/Madrid 1604790000 1604790000 2020 45 313 1
#> 4 2020-11-09 Europe/Madrid 1604876400 1604876400 2020 45 314 1
#> 5 2020-11-10 Europe/Madrid 1604962800 1604962800 2020 45 315 1
#> 6 2020-11-11 Europe/Madrid 1605049200 1605049200 2020 46 316 1
#> study_day n_uid session_seconds session_hours session_days peaks coughs
#> 1 0 1 30060 8.350000 0.3479167 24 24
#> 2 1 1 86244 23.956667 0.9981944 58 58
#> 3 2 1 86400 24.000000 1.0000000 140 140
#> 4 3 1 86400 24.000000 1.0000000 91 91
#> 5 4 1 34310 9.530556 0.3971065 23 23
#> 6 5 1 20285 5.634722 0.2347801 5 5
#> cough_rate session_seconds_tot session_hours_tot session_days_tot peaks_tot
#> 1 68.98204 30060 8.35000 0.3479167 24
#> 2 58.10491 116304 32.30667 1.3461111 82
#> 3 140.00000 202704 56.30667 2.3461111 222
#> 4 91.00000 289104 80.30667 3.3461111 313
#> 5 57.91897 323414 89.83722 3.7432176 336
#> 6 21.29652 343699 95.47194 3.9779977 341
#> coughs_tot
#> 1 24
#> 2 82
#> 3 222
#> 4 313
#> 5 336
#> 6 341
Weekly summary table:
hyfe_tables$weeks %>% as.data.frame %>% head
#> week tz date_floor date_ceiling year study_week n_uid
#> 1 45 Europe/Madrid 1604617200 1605049200 2020 0 1
#> 2 46 Europe/Madrid 1605049200 1605654000 2020 1 1
#> 3 47 Europe/Madrid 1605654000 1606258800 2020 2 1
#> 4 48 Europe/Madrid 1606258800 1606863600 2020 3 1
#> 5 49 Europe/Madrid 1606863600 1607468400 2020 4 1
#> 6 50 Europe/Madrid 1607468400 1608073200 2020 5 1
#> session_seconds session_hours session_days peaks coughs cough_rate
#> 1 323414 89.83722 3.743218 336 336 628.3364
#> 2 394832 109.67556 4.569815 269 228 349.2483
#> 3 389373 108.15917 4.506632 4313 369 573.1553
#> 4 322348 89.54111 3.730880 4072 140 262.6726
#> 5 476167 132.26861 5.511192 2576 306 388.6636
#> 6 604163 167.82306 6.992627 5618 331 331.3490
#> session_seconds_tot session_hours_tot session_days_tot peaks_tot coughs_tot
#> 1 323414 89.83722 3.743218 336 336
#> 2 718246 199.51278 8.313032 605 564
#> 3 1107619 307.67194 12.819664 4918 933
#> 4 1429967 397.21306 16.550544 8990 1073
#> 5 1906134 529.48167 22.061736 11566 1379
#> 6 2510297 697.30472 29.054363 17184 1710