Load truth data under multiple target variables from multiple truth sources

By default, for the US hub, the resulting data.frame contains weekly incident cases (JHU), weekly incident deaths (JHU) and daily incident hospitalizations (HealthData) at the county, state and national levels. For the ECDC hub, the default resulting data.frame contains weekly incident cases (JHU), weekly incident deaths (JHU) and daily incident hospitalizations (ECDC) for all European countries. For the FluSight hub, the default resulting data.frame contains weekly incident hospitalizations (HealthData) for all US locations.

load_truth(
  truth_source = NULL,
  target_variable = NULL,
  as_of = NULL,
  truth_end_date = NULL,
  temporal_resolution = NULL,
  locations = NULL,
  data_location = NULL,
  local_repo_path = NULL,
  hub = c("US", "ECDC", "FluSight")
)

Arguments

truth_source: character vector specifying where the truth data will be loaded from. Currently supports "JHU", "NYTimes", "HealthData", "ECDC" and "OWID". If NULL, the default is c("JHU", "HealthData") for the US hub, c("OWID") for the ECDC hub and c("HealthData") for the FluSight hub.
target_variable: character vector specifying the target type. It should be one or more of "cum death", "inc case", "inc death", "inc hosp" and "inc flu hosp". If NULL, the default is c("inc case", "inc death", "inc hosp") for the US hub, c("inc hosp") for the ECDC hub and c("inc flu hosp") for the FluSight hub.
as_of: character vector of "as of" dates to use for querying truth data, in 'yyyy-mm-dd' format. For each spatial unit and temporal reporting unit, the last available data with an issue date on or before the given as_of date are returned. This is currently only available when data_location is "covidData".
truth_end_date: date of the last truth data point to include, in 'yyyy-mm-dd' format. If NULL, defaults to the system date.
temporal_resolution: character specifying the temporal resolution to include. Currently supports "weekly" and "daily". If NULL, defaults to "weekly" for cases and deaths and "daily" for hospitalizations. Weekly temporal_resolution will not be applied to "inc hosp" and "inc flu hosp" when multiple target variables are specified. "ECDC" truth data is weekly by default; daily data is not available.
locations: a vector of strings of FIPS codes, CBSA codes or location names, such as "Hampshire County, MA", "Alabama", "United Kingdom". US county location names must include the state abbreviation. Defaults to NULL, which includes all locations with available forecasts.
data_location: character specifying the location of truth data. Currently only supports "local_hub_repo", "remote_hub_repo" and "covidData". If NULL, defaults to "remote_hub_repo".
local_repo_path: path to a local clone of the hub repository. Only used when data_location is "local_hub_repo".
hub: character, which hub to use. Default is "US". Other options are "ECDC" and "FluSight".
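
The truth_source and target_variable arguments interact: the Details section states that "inc hosp" is only available from "HealthData", "ECDC" and "OWID", that "inc flu hosp" is only available from "HealthData", and that "HealthData" supplies no other target variable. The sketch below encodes just those rules in base R. The helper name sources_for_targets is hypothetical and not part of covidHubUtils, and the sketch does not check whether each remaining source actually carries a given non-hospitalization target.

```r
# Hypothetical helper (not part of covidHubUtils): for each requested target
# variable, list the requested truth sources that the documented availability
# rules do not rule out.
sources_for_targets <- function(truth_source, target_variable) {
  # Sources documented as the only providers of hospitalization targets
  hosp_sources <- list(
    "inc hosp" = c("HealthData", "ECDC", "OWID"),
    "inc flu hosp" = "HealthData"
  )
  out <- lapply(target_variable, function(tv) {
    if (tv %in% names(hosp_sources)) {
      intersect(truth_source, hosp_sources[[tv]])
    } else {
      # "HealthData" is documented as not loading any other target variable
      setdiff(truth_source, "HealthData")
    }
  })
  names(out) <- target_variable
  out
}

usable <- sources_for_targets(
  truth_source = c("JHU", "HealthData"),
  target_variable = c("inc case", "inc death", "inc hosp")
)
str(usable)
```

An empty entry in the result would suggest that none of the requested sources can supply that target under the documented rules.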

Value

data.frame with columns model, target_variable, target_end_date, location, value, location_name and population, plus extra columns in the following cases:

If hub = "US", it returns extra columns geo_type, geo_value, abbreviation and full_location_name.
If truth_source = "ECDC", it returns an extra column week_start. However, when target_variable is only "inc hosp", no extra columns are appended to the resulting data frame.
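
Because all target variables are returned in one data.frame keyed by target_variable, rows at different temporal resolutions can sit side by side. The following base R sketch shows one way to separate them and inspect the date spacing of each target. The data frame is a toy stand-in with made-up values and only a subset of the documented columns; it is not output from load_truth.

```r
# Toy rows with made-up values; only some of the documented columns are used.
truth <- data.frame(
  target_variable = c("inc case", "inc case", "inc hosp", "inc hosp", "inc hosp"),
  target_end_date = as.Date(c(
    "2021-05-08", "2021-05-15", "2021-05-13", "2021-05-14", "2021-05-15"
  )),
  location = "30",
  value = c(100, 120, 9, 11, 7)
)

# One data frame per target variable
by_target <- split(truth, truth$target_variable)

# Smallest gap, in days, between consecutive dates within each target
spacing <- sapply(by_target, function(d) {
  gaps <- diff(sort(unique(d$target_end_date)))
  as.numeric(min(gaps), units = "days")
})
spacing
```

In this toy data, a spacing of 7 marks the weekly series and a spacing of 1 marks the daily series.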

Details

"inc hosp" is only available from "HealthData", "ECDC" and "OWID". "inc flu hosp" is only available from "HealthData".
This function does not load data for other target variables from "HealthData".
When loading data for multiple target variables for the US hub, temporal_resolution will be applied to all target variables except "inc hosp" and "inc flu hosp". In that case, the function will return daily incident COVID hospitalization counts and weekly incident influenza hospitalization counts.
For the US hub, weekly temporal resolution will be applied to "inc hosp" if the user specifies "inc hosp" as the only target_variable. On the other hand, temporal_resolution will be applied to "inc hosp" in all cases for the ECDC hub.
When aggregating daily data, if there are not enough observations for a week, the corresponding weekly count will be NA in the resulting data frame.
as_of is only supported when data_location = "covidData". Otherwise, this function will issue a warning.
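
The NA rule for incomplete weeks can be illustrated with a small base R sketch. It is not the package's implementation: the helper to_weekly is hypothetical, and it assumes that weeks end on Saturday and that a week needs all 7 daily rows to count as complete. The input values are made up.

```r
# Hypothetical helper (not part of covidHubUtils): sum daily values into
# weeks ending on Saturday, returning NA for weeks with fewer than 7 rows.
to_weekly <- function(daily) {
  wday <- as.POSIXlt(daily$date)$wday           # 0 = Sunday, ..., 6 = Saturday
  week_end <- format(daily$date + (6 - wday))   # Saturday that closes each week
  sums <- tapply(daily$value, week_end, sum)
  n_obs <- tapply(daily$value, week_end, length)
  sums[n_obs < 7] <- NA                         # assumed completeness rule
  data.frame(
    target_end_date = as.Date(names(sums)),
    value = as.numeric(sums)
  )
}

# Made-up daily series: two full weeks plus four days of a third week
daily <- data.frame(
  date = seq(as.Date("2021-05-02"), as.Date("2021-05-19"), by = "day"),
  value = 1:18
)

weekly <- to_weekly(daily)
weekly
# The week ending 2021-05-22 has only 4 daily rows, so its value is NA.
```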

Examples

library(covidHubUtils)

# load for US
load_truth(
  truth_source = c("JHU", "HealthData"),
  target_variable = c("inc case", "inc death", "inc hosp")
)
#> # A tibble: 1,113,686 × 11
#>    model target_variable target_end_date location value location_name population
#>    <chr> <chr>           <date>          <chr>    <dbl> <chr>              <dbl>
#>  1 Obse… inc hosp        2021-05-20      15           3 Hawaii           1415872
#>  2 Obse… inc hosp        2021-05-18      27          60 Minnesota        5639632
#>  3 Obse… inc hosp        2021-05-17      37          92 North Caroli…   10488084
#>  4 Obse… inc hosp        2021-05-17      44           6 Rhode Island     1059361
#>  5 Obse… inc hosp        2021-05-15      31           7 Nebraska         1934408
#>  6 Obse… inc hosp        2021-05-13      30           9 Montana          1068778
#>  7 Obse… inc hosp        2021-05-12      30          13 Montana          1068778
#>  8 Obse… inc hosp        2021-05-11      04          71 Arizona          7278717
#>  9 Obse… inc hosp        2021-05-10      23          16 Maine            1344212
#> 10 Obse… inc hosp        2021-05-10      44          10 Rhode Island     1059361
#> # ℹ 1,113,676 more rows
#> # ℹ 4 more variables: geo_type <chr>, geo_value <chr>, abbreviation <chr>,
#> #   full_location_name <chr>

# load for ECDC
load_truth(
  truth_source = c("JHU"),
  target_variable = c("inc case", "inc death"),
  hub = "ECDC"
)
#> # A tibble: 10,432 × 7
#>    model location target_end_date target_variable value location_name population
#>    <chr> <chr>    <date>          <chr>           <dbl> <chr>              <int>
#>  1 Obse… AT       2020-01-25      inc case            0 Austria          8809212
#>  2 Obse… AT       2020-02-01      inc case            0 Austria          8809212
#>  3 Obse… AT       2020-02-08      inc case            0 Austria          8809212
#>  4 Obse… AT       2020-02-15      inc case            0 Austria          8809212
#>  5 Obse… AT       2020-02-22      inc case            0 Austria          8809212
#>  6 Obse… AT       2020-02-29      inc case            3 Austria          8809212
#>  7 Obse… AT       2020-03-07      inc case           43 Austria          8809212
#>  8 Obse… AT       2020-03-14      inc case          363 Austria          8809212
#>  9 Obse… AT       2020-03-21      inc case         1989 Austria          8809212
#> 10 Obse… AT       2020-03-28      inc case         4895 Austria          8809212
#> # ℹ 10,422 more rows