R/load_truth.R
load_truth.Rd
By default, for the US hub, the resulting data.frame contains data for weekly incident cases (JHU), weekly incident deaths (JHU) and daily incident hospitalizations (HealthData) at the county, state and national levels. For the ECDC hub, the default resulting data.frame contains data for weekly incident cases (JHU), weekly incident deaths (JHU) and daily incident hospitalizations (ECDC) for all European countries. For the FluSight hub, the default resulting data.frame contains data for weekly incident hospitalizations (HealthData) for all US locations.
load_truth(
  truth_source = NULL,
  target_variable = NULL,
  as_of = NULL,
  truth_end_date = NULL,
  temporal_resolution = NULL,
  locations = NULL,
  data_location = NULL,
  local_repo_path = NULL,
  hub = c("US", "ECDC", "FluSight")
)
truth_source
character vector specifying where the truth data will be loaded from. Currently supported: "JHU", "NYTimes", "HealthData", "ECDC" and "OWID".
If NULL, the default for the US hub is c("JHU", "HealthData").
If NULL, the default for the ECDC hub is c("OWID").
If NULL, the default for the FluSight hub is c("HealthData").
target_variable
string specifying the target type. It should be one or more of "cum death", "inc case", "inc death", "inc hosp".
If NULL, the default for the US hub is c("inc case", "inc death", "inc hosp").
If NULL, the default for the ECDC hub is c("inc hosp").
If NULL, the default for the FluSight hub is c("inc flu hosp").
as_of
character vector of "as of" dates to use for querying truth data, in 'yyyy-mm-dd' format. For each spatial unit and temporal reporting unit, the last available data with an issue date on or before the given as_of date are returned.
This is currently only available when data_location is "covidData".
truth_end_date
date of the last available truth point to include, in 'yyyy-mm-dd' format.
If NULL, defaults to the system date.
temporal_resolution
character specifying the temporal resolution to include. Currently supported: "weekly" and "daily".
If NULL, defaults to "weekly" for cases and deaths and "daily" for hospitalizations.
Weekly temporal_resolution will not be applied to "inc hosp" and "inc flu hosp" when multiple target variables are specified.
"ECDC" truth data is weekly by default. Daily data is not available.
locations
a vector of strings of FIPS codes, CBSA codes or location names, such as "Hampshire County, MA", "Alabama", "United Kingdom".
US county location names must include the state abbreviation.
Defaults to NULL, which includes all locations with available forecasts.
data_location
character specifying the location of the truth data.
Currently only supports "local_hub_repo", "remote_hub_repo" and "covidData".
If NULL, defaults to "remote_hub_repo".
local_repo_path
path to a local clone of the hub repository.
Only used when data_location is "local_hub_repo".
hub
character, which hub to use. Default is "US". Other options are "ECDC" and "FluSight".
data.frame with columns model, target_variable, target_end_date, location, value, location_name, population and, in the following cases, extra columns:
If hub = "US", it returns the extra columns geo_type, geo_value, abbreviation and full_location_name.
If truth_source = "ECDC", it returns the extra column week_start. However, when target_variable is only "inc hosp", no extra columns are appended to the resulting data frame.

"inc hosp" is only available from "HealthData", "ECDC" and "OWID". "inc flu hosp" is only available from "HealthData".
This function does not load data for other target variables from "HealthData".
When loading data for multiple target variables for the US hub, temporal_resolution will be applied to all target variables but "inc hosp" and "inc flu hosp". In that case, the function will return daily incident COVID hospitalization counts and weekly incident influenza hospitalization counts.
For the US hub, weekly temporal resolution will be applied to "inc hosp" if the user specifies "inc hosp" as the only target_variable. On the other hand, temporal_resolution will be applied to "inc hosp" in all cases for the ECDC hub.
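The rules above for "inc hosp" can be summarized in a few lines of base R. This is only a sketch of the documented behavior, not the package's code: the helper effective_resolution() is hypothetical, it covers only the US and ECDC hubs, and it assumes that "daily" is the native resolution that US "inc hosp" keeps when several targets are requested.

```r
# Hypothetical helper (not part of covidHubUtils): returns, for each
# requested target, the resolution that applies under the rules above.
effective_resolution <- function(target_variable,
                                 temporal_resolution = "weekly",
                                 hub = c("US", "ECDC")) {
  hub <- match.arg(hub)  # with no explicit hub, the first choice ("US") is used
  resolution <- rep(temporal_resolution, length(target_variable))
  names(resolution) <- target_variable
  if (hub == "US" && length(target_variable) > 1) {
    # Assumption: with several targets, US "inc hosp" stays daily
    resolution[names(resolution) == "inc hosp"] <- "daily"
  }
  resolution
}

multi_us   <- effective_resolution(c("inc case", "inc hosp"))
single_us  <- effective_resolution("inc hosp")
multi_ecdc <- effective_resolution(c("inc case", "inc hosp"), hub = "ECDC")
```

Here multi_us keeps "inc hosp" daily while "inc case" is weekly, whereas single_us and multi_ecdc apply the requested weekly resolution to "inc hosp".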
When aggregating daily data, if there are not enough observations for a week, the corresponding weekly count will be NA in the resulting data frame.
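The aggregation rule above can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper aggregate_to_weekly() is hypothetical, and both the Sunday-to-Saturday week definition and the requirement of exactly seven daily observations per week are assumptions made for this example.

```r
# Hypothetical helper: sum daily counts into weeks ending on Saturday.
# Assumption: a week with fewer than 7 daily observations is reported as NA.
aggregate_to_weekly <- function(dates, values) {
  dates <- as.Date(dates)
  wday <- as.POSIXlt(dates)$wday           # 0 = Sunday, ..., 6 = Saturday
  week_end <- format(dates + (6 - wday))   # Saturday closing each date's week
  n_obs <- tapply(values, week_end, length)
  totals <- tapply(values, week_end, sum)
  totals[n_obs < 7] <- NA                  # incomplete week
  data.frame(
    target_end_date = as.Date(names(totals)),
    value = as.numeric(totals)
  )
}

# Eleven days of made-up counts: one complete week, then four days of the next
daily_dates <- seq(as.Date("2021-05-02"), as.Date("2021-05-12"), by = "day")
weekly <- aggregate_to_weekly(daily_dates, 1:11)
```

In this toy input the week ending 2021-05-08 has all seven days and sums to 28, while the week ending 2021-05-15 has only four observations and is therefore NA.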
as_of is only supported when data_location = "covidData". Otherwise, this function will issue a warning.
library(covidHubUtils)
# load for US
load_truth(
  truth_source = c("JHU", "HealthData"),
  target_variable = c("inc case", "inc death", "inc hosp")
)
#> # A tibble: 1,113,686 × 11
#> model target_variable target_end_date location value location_name population
#> <chr> <chr> <date> <chr> <dbl> <chr> <dbl>
#> 1 Obse… inc hosp 2021-05-20 15 3 Hawaii 1415872
#> 2 Obse… inc hosp 2021-05-18 27 60 Minnesota 5639632
#> 3 Obse… inc hosp 2021-05-17 37 92 North Caroli… 10488084
#> 4 Obse… inc hosp 2021-05-17 44 6 Rhode Island 1059361
#> 5 Obse… inc hosp 2021-05-15 31 7 Nebraska 1934408
#> 6 Obse… inc hosp 2021-05-13 30 9 Montana 1068778
#> 7 Obse… inc hosp 2021-05-12 30 13 Montana 1068778
#> 8 Obse… inc hosp 2021-05-11 04 71 Arizona 7278717
#> 9 Obse… inc hosp 2021-05-10 23 16 Maine 1344212
#> 10 Obse… inc hosp 2021-05-10 44 10 Rhode Island 1059361
#> # ℹ 1,113,676 more rows
#> # ℹ 4 more variables: geo_type <chr>, geo_value <chr>, abbreviation <chr>,
#> # full_location_name <chr>
# load for ECDC
load_truth(
  truth_source = c("JHU"),
  target_variable = c("inc case", "inc death"),
  hub = "ECDC"
)
#> # A tibble: 10,432 × 7
#> model location target_end_date target_variable value location_name population
#> <chr> <chr> <date> <chr> <dbl> <chr> <int>
#> 1 Obse… AT 2020-01-25 inc case 0 Austria 8809212
#> 2 Obse… AT 2020-02-01 inc case 0 Austria 8809212
#> 3 Obse… AT 2020-02-08 inc case 0 Austria 8809212
#> 4 Obse… AT 2020-02-15 inc case 0 Austria 8809212
#> 5 Obse… AT 2020-02-22 inc case 0 Austria 8809212
#> 6 Obse… AT 2020-02-29 inc case 3 Austria 8809212
#> 7 Obse… AT 2020-03-07 inc case 43 Austria 8809212
#> 8 Obse… AT 2020-03-14 inc case 363 Austria 8809212
#> 9 Obse… AT 2020-03-21 inc case 1989 Austria 8809212
#> 10 Obse… AT 2020-03-28 inc case 4895 Austria 8809212
#> # ℹ 10,422 more rows