| Title: | Tools for Analyzing IIDDA Datasets |
|---|---|
| Description: | This package contains tools for working with data obtained from the International Infectious Disease Data Archive. |
| Authors: | Steven Walker |
| Maintainer: | Steven Walker <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 2.0.0 |
| Built: | 2026-05-15 06:20:33 UTC |
| Source: | https://github.com/canmod/iidda-tools |
Adds entries to user-defined lookup table
Entries should have names or columns from the user lookup table
standards can be used for entries
add_user_entries(entries, user_table_path)
entries |
dataframe or named list of entries to add |
user_table_path |
string indicating path to user lookup table |
user lookup table with added entries in the path
Basal Group
basal_group(
  value,
  lookup,
  hierarchical_variable,
  nesting_variable,
  encountered_values = character()
)
value |
Value (e.g., |
lookup |
Table with two character-valued columns with names given by
|
hierarchical_variable |
Name of the hierarchical variable in |
nesting_variable |
Name of the nesting variable in |
encountered_values |
Character vector of values already found. Typically this is left at the default value of an empty character vector. |
The root value that input value maps to in lookup.
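The lookup described above can be pictured as a walk up a parent table until a root is reached. The following base R sketch is not the package's implementation; `find_root`, the toy column names, and the convention that an empty or missing parent marks a root are all assumptions made for illustration. Using `encountered_values` to detect cycles is also a guess at that argument's role.

```r
# Hypothetical helper, not iidda.analysis::basal_group().
find_root <- function(value, lookup, hierarchical_variable, nesting_variable,
                      encountered_values = character()) {
  # Guard against cycles in the lookup table.
  if (value %in% encountered_values) {
    stop("cycle detected at value: ", value)
  }
  rows <- which(lookup[[hierarchical_variable]] == value)
  parent <- lookup[[nesting_variable]][rows]
  # Assumption: a missing or empty parent marks a root value.
  if (length(parent) == 0L || is.na(parent[1L]) || parent[1L] == "") {
    return(value)
  }
  find_root(parent[1L], lookup, hierarchical_variable, nesting_variable,
            c(encountered_values, value))
}

# Toy lookup table: "leaf" nests in "middle", which nests in "root".
lookup <- data.frame(
  item = c("leaf", "middle", "root"),
  nested_in = c("middle", "root", ""),
  stringsAsFactors = FALSE
)
root_of_leaf <- find_root("leaf", lookup, "item", "nested_in")
root_of_root <- find_root("root", lookup, "item", "nested_in")
```

In this sketch a root maps to itself, so applying the function to every value in a column yields a column of root values.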
Basal Group Adder
BasalGroupAdder(lookup)
lookup |
A lookup table with |
data |
A tidy data set with a |
tidy dataset with basal disease
Open a browser at the locations of the dependencies associated with a set of datasets.
browse_pipeline_dependencies(
  dataset_ids,
  dependency_types = c("IsCompiledBy", "IsDerivedFrom", "References"),
  metadata = iidda.api::ops_staging$metadata(dataset_ids = dataset_ids)
)
dataset_ids |
Character vector of dataset identifiers. |
dependency_types |
Vector of types of dependencies to browse. Possible
values include |
metadata |
Optional list giving dataset metadata. The default uses the IIDDA API, which requires the internet. |
Order Canadian Provinces Geographically
ca_iso_3166_2(data)
data |
Dataset containing an |
Test if x is a Date, coerce if not
check_date(x)
x |
vector of putative dates |
vector with class Date, or error
d1 <- check_date("1920-01-01")
d1
class(d1)
# returns an error if x can't be coerced to Date easily
# check_date("may 29th")
The important cleaning steps include (1) removing CA- from ISO-3166-2
codes (because within Canada this is redundant) and (2) filtering out
all time-scales but the 'best', so that there is no chance of
double-counting cases.
clean_canmod_cdi(canmod_cdi, ...)
canmod_cdi |
Dataset from IIDDA of type |
... |
Arguments to pass on to |
Compute Moving Average of Time Series
ComputeMovingAverage(ma_window_length = 52)
ma_window_length |
length of moving average window, this will depend on the time scale in the data. Defaults to 52, so that weekly data is averaged over years. |
A function like compute_moving_average_default that
computes the moving average of a time series variable.
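As a rough illustration of the constructor pattern described here, the base R sketch below builds a function that averages a numeric vector over a window of a given length. It is not the package's code: `make_moving_average` is a hypothetical name, and the choice of a centred window (via `stats::filter` with `sides = 2`) is an assumption, since the documentation does not say how the window is aligned.

```r
# Hypothetical constructor, not iidda.analysis::ComputeMovingAverage().
make_moving_average <- function(ma_window_length = 52) {
  function(x) {
    # Equal weights that sum to one give a simple moving average.
    weights <- rep(1 / ma_window_length, ma_window_length)
    # sides = 2 centres the window (an assumption); positions without
    # a full window are returned as NA.
    as.numeric(stats::filter(x, weights, sides = 2))
  }
}

ma3 <- make_moving_average(ma_window_length = 3)
smoothed <- ma3(c(1, 2, 3, 4, 5))
```

With weekly data, the default window of 52 corresponds to roughly one year, which matches the rationale given for the default.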
Create a function that aggregates count variables.
CountAggregator()
A function like count_aggregator_default that
aggregates count variables.
Create age bin descriptions for joining age_group lookup table
create_bin_desc(age_df)
age_df |
data frame of data with age_group column |
data frame of data with bin_desc column
Convert a data_prep_function into
a function that can be used in an iidda data prep pipeline.
CustomDataPrep(
  data_prep_function,
  ign_variables = character(0L),
  opt_variables = character(0L),
  new_variables = character(0L)
)
data_prep_function |
A standard |
A data prep function that maintains
iidda attributes and guesses at names of types of variables that the
data_prep_function assumes.
Data Dictionary Converter
DataDictionaryConverter(data_dictionary = iidda_data_dictionary())
Fixes heaping errors in time series. The structure of this function was taken from
the function find_heap_and_deheap created by Kevin Zhao
(https://github.com/davidearn/KevinZhao/blob/main/Report/make_SF_RData.R). This needs
to be better documented.
Deheaper(
  prefix = "deheaped_",
  first_date = "1830-01-01",
  last_date = "1841-12-31",
  week_start = 45,
  week_end = 5,
  deheaping_scale = 2.7
)
first_date |
string containing earliest date to look for heaping errors |
last_date |
string containing last date to look for heaping errors |
week_start |
numeric value of the first week number to start looking for heaping errors |
week_end |
numeric value of the last week number to look for heaping errors |
A function like deheaper_default to fix heaping errors.
Factor Time Scale
factor_time_scale(data)
data |
A tidy data set with a |
A data set with a factored time_scale column.
Make new records for instances when the sum of leaf diseases is less than the reported total for their basal disease. The difference between these counts is given a disease name formed by appending '_unaccounted' to the name of the basal disease.
find_unaccounted_cases(data)
data |
A tidy data set with a |
A data set containing records that are the difference between a reported total for a basal_disease and the sum of their leaf diseases.
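The arithmetic behind such a record can be sketched in base R for a single basal disease in a single period. This is an illustration only: the column names `disease`, `basal_disease`, and `cases`, and the toy counts, are assumptions and are not taken from the package.

```r
# Toy records for one period: a reported total plus two leaf diseases.
records <- data.frame(
  disease = c("total", "leaf_a", "leaf_b"),
  basal_disease = c("total", "total", "total"),
  cases = c(10, 4, 3),
  stringsAsFactors = FALSE
)

# Assumption: the row whose disease equals its basal disease is the total.
is_total <- records$disease == records$basal_disease
reported_total <- sum(records$cases[is_total])
leaf_sum <- sum(records$cases[!is_total])

unaccounted <- data.frame(
  disease = paste0("total", "_unaccounted"),
  basal_disease = "total",
  cases = reported_total - leaf_sum,
  stringsAsFactors = FALSE
)
# Keep the new record only when the leaves fall short of the total.
unaccounted <- unaccounted[unaccounted$cases > 0, , drop = FALSE]
```

Here the leaves sum to 7 against a reported total of 10, so one new record with 3 cases is produced.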
Data for a Particular Disease
generate_disease_df(canmod_cdi, disease_name, years = NULL, add_gaps = TRUE)
canmod_cdi |
Dataset from IIDDA of type |
disease_name |
Name to match in the |
years |
If not |
add_gaps |
If |
Creates an empty table in a specified directory using column names from another data frame
generate_empty_df(dir_path, lookup_table, csv_name)
dir_path |
string indicating path to directory |
lookup_table |
data frame with column names to include in table |
csv_name |
string indicating name of the created .csv file |
empty csv file with columns from lookup_table in the directory if successfully generated
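One way to picture this behaviour in base R is to drop all rows from the template data frame and write what remains to a CSV file. The sketch below is an assumption about the general approach, not the package's code; the template columns and the file name are made up for the example.

```r
# Template data frame whose column names should be reused (toy columns).
lookup_table <- data.frame(
  name = "example",
  code = "X1",
  stringsAsFactors = FALSE
)
dir_path <- tempdir()
csv_name <- "user_lookup.csv"

# Zero rows, same columns.
empty_table <- lookup_table[0, , drop = FALSE]
out_file <- file.path(dir_path, csv_name)
utils::write.csv(empty_table, out_file, row.names = FALSE)

# Reading the file back gives a header-only table.
round_trip <- utils::read.csv(out_file)
```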
Creates an empty user-defined lookup table in a specified directory
generate_user_table(path, lookup_table_type)
path |
string indicating path to directory |
lookup_table_type |
string indicating type of lookup table |
csv file of empty lookup table with columns from lookup_table_type in the directory if successful
Get IIDDA Attribute
get_iidda_attr(data, which)
data |
Data frame that contains a list attribute called |
which |
Name of the element in the |
Value of the element given by which in the "iidda" attribute.
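In base R terms, reading one element of a list attribute named "iidda" could look like the sketch below. The element name `series_variable` and its value are hypothetical, and the package function may do more than this (for example, validation).

```r
# Toy data frame carrying a list attribute called "iidda".
data <- data.frame(cases = c(1, 2))
attr(data, "iidda") <- list(series_variable = "cases")

# Look up one element of that list by name.
which_element <- "series_variable"
value <- attr(data, "iidda")[[which_element]]
```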
Add zeros to data set that are implied by a '0' reported at a coarser timescale.
get_implied_zeros(data)
data |
A tidy data set with the following minimal set of columns:
|
A tidy data set with inferred 0s.
Get label of associated time unit
get_unit_labels(unit)
unit |
time unit, one of iidda.analysis:::time_units |
label of associated time unit
Wrapper of seq.Date() and lubridate::floor_date
grid_dates(
  start_date = "1920-01-01",
  end_date = "2020-01-01",
  by = "1 week",
  unit = "week",
  lookback = TRUE,
  week_start = 7
)
start_date |
starting date |
end_date |
end date |
by |
increment of the sequence. Optional. See ‘Details’. |
unit |
a string, When When |
lookback |
Logical, should the first value start before |
week_start |
week start day (Default is 7, Sunday. Set |
vector of Dates at the first of each week, month, year
grid_dates(start_date = "2023-04-01", end_date = "2023-05-16")
grid_dates(start_date = "2023-04-01", end_date = "2023-05-16", lookback = FALSE)
grid_dates(start_date = "2020-04-01", end_date = "2023-05-16", by = "2 months", unit = "month")
grid_dates(start_date = "2020-04-01", end_date = "2023-05-16", by = "2 months")
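The idea of flooring the start date to the beginning of its week and then stepping forward can be sketched with base R alone. This is an approximation of the documented behaviour for a Sunday week start with a look back before the start date, not the package's code (which wraps lubridate::floor_date), so the exact output of grid_dates may differ.

```r
start_date <- as.Date("2023-04-01")
end_date <- as.Date("2023-05-16")

# "%w" gives the weekday as a number with 0 = Sunday, so this is the
# number of days since the most recent Sunday.
offset <- as.integer(format(start_date, "%w"))
first_sunday <- start_date - offset

# Weekly grid starting on or before start_date.
weekly_grid <- seq(first_sunday, end_date, by = "1 week")
```

Because the grid starts at the floored date, its first element can fall before `start_date`, which is the situation the `lookback` argument appears to control.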
Data Prep Default Functions
handle_missing_values_default(data, series_variable = NULL)
handle_zero_values_default(data, series_variable = NULL)
trim_series_default(data, series_variable = NULL, time_variable = NULL)
series_harmonizer_default(data, series_variable = NULL, time_variable = NULL)
deheaper_default(data, series_variable = NULL, time_variable = NULL)
compute_moving_average_default(
  data,
  series_variable = NULL,
  time_variable = NULL
)
period_aggregator_default(
  data,
  time_variable = NULL,
  period_width_variable = NULL,
  count_variable = NULL,
  norm_variable = NULL
)
count_aggregator_default(
  data,
  total_count_variable = NULL,
  count_variable = NULL,
  grouping_variable = NULL
)
period_describer_default(
  data,
  period_start_variable = NULL,
  period_end_variable = NULL,
  period_mid_time_variable = NULL,
  period_mid_date_variable = NULL,
  period_days_variable = NULL
)
data |
Data frame that likely comes from IIDDA. |
series_variable |
Name of variable that can be used on the y-axis of a time-series. |
time_variable |
Name of variable that characterizes the temporal location of the time period. |
period_width_variable |
Name of variable that characterizes the width of the time period. |
count_variable |
Name of variable containing a count (e.g., cases, births, deaths, population). |
norm_variable |
Name of variable that can be used to normalize another variable (e.g., population normalized cases). |
total_count_variable |
Name of variable containing a marginal total of a set of counts. |
grouping_variable |
Name of variable containing |
period_start_variable |
Name of variable containing |
period_end_variable |
Name of variable containing |
period_mid_time_variable |
Name of variable containing |
period_mid_date_variable |
Name of variable containing |
period_days_variable |
Name of variable containing |
cases_variable |
Name of variable containing unstandardized reported incidence. |
population_variable |
Name of variable containing population numbers. |
birth_variable |
Name of variable containing numbers of births. |
death_variable |
Name of variable containing numbers of deaths. |
median_cases_variable |
Name of variable containing the median of a set of count variables. |
period_mid_variable |
Name of variable containing |
date_variable |
Name of variable containing |
integer_time_variable |
Name of variable containing |
numeric_time_variable |
Name of variable containing |
time_scale_variable |
Name of variable containing |
time_group_variable |
Name of variable containing |
time_grouping_variable |
Name of variable containing |
disease_variable |
Name of variable containing |
hierarchical_variable |
Name of variable containing |
nesting_variable |
Name of variable containing |
basal_variable |
Name of variable containing |
title_variable |
Name of variable containing |
among_panel_variable |
Name of variable containing |
within_panel_variable |
Name of variable containing |
colour_variable |
Name of variable containing |
category_variable |
Name of variable containing |
categorical_variable |
Name of variable containing |
Construct a function that takes a data frame and returns another data
frame with NA values either removed or replaced.
HandleMissingValues(na_remove = FALSE, na_replace = NULL)
na_remove |
boolean value, if |
na_replace |
numeric value to replace |
A function like handle_missing_values_default that
removes or replaces missing values.
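The constructor pattern used here (a function that returns a configured data-frame-to-data-frame function) can be sketched in base R as follows. `make_missing_value_handler` is a hypothetical stand-in, not the package's HandleMissingValues, and giving removal precedence over replacement when both are requested is an assumption.

```r
# Hypothetical constructor illustrating the closure pattern.
make_missing_value_handler <- function(na_remove = FALSE, na_replace = NULL) {
  function(data, series_variable) {
    is_missing <- is.na(data[[series_variable]])
    if (na_remove) {
      # Drop rows whose series value is NA.
      data <- data[!is_missing, , drop = FALSE]
    } else if (!is.null(na_replace)) {
      # Otherwise substitute the replacement value, if one was given.
      data[[series_variable]][is_missing] <- na_replace
    }
    data
  }
}

toy <- data.frame(deaths = c(5, NA, 7))
dropped <- make_missing_value_handler(na_remove = TRUE)(toy, "deaths")
filled <- make_missing_value_handler(na_replace = 0)(toy, "deaths")
unchanged <- make_missing_value_handler()(toy, "deaths")
```

Configuring the behaviour once and passing the resulting function around is what lets the prep functions accept arguments such as `handle_missing_values = HandleMissingValues(...)`.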
Construct a function that takes a data frame and returns another data
frame with 0 values either removed or replaced.
HandleZeroValues(zero_remove = FALSE, zero_replace = NULL)
zero_remove |
boolean value, if |
zero_replace |
numeric value to replace zeroes in series variable, if NULL no replacement is performed |
A function like handle_zero_values_default to remove
or replace zero values.
Attach Bar Plot to Dataset
iidda_attach_bar(
  data,
  initial_ggplot_object = ggplot(),
  series_variable = NULL,
  time_unit = NULL,
  aggregated = NULL
)
data |
Data frame, probably containing an IIDDA dataset. |
initial_ggplot_object |
Plot object that will be used to add a bar geom. |
Get starting time period, ending time period and mortality cause name from the data set for use in axis and main plot titles.
iidda_get_metadata(data, time_variable = NULL, descriptor_variable = NULL)
data |
data frame containing time series data |
time_variable |
column name of time variable in |
descriptor_variable |
column name of the descriptor variable in |
a list in order containing minimum time period, maximum time period and cause name.
Add a bar plot to an existing ggplot plot object. Graphical choices were made to closely reflect
plots generated with LBoM::monthly_bar_graph and LBoM::weekly_bar_graph.
iidda_plot_bar(
  plot_object,
  data = NULL,
  series_variable = NULL,
  time_unit = "month_factor_abbr"
)
plot_object |
a |
data |
data frame containing data prepped for bar plotting, typically output from |
series_variable |
column name of series variable in |
time_unit |
time unit to display bar graphs on the x-axis. Defaults to "week" or one of iidda.analysis:::time_units that starts with "month". Should generalize at some point to be able to take any time_unit argument. |
a ggplot2 plot object containing bar graphs of time series data
Add a box plot to an existing ggplot plot object. Graphical choices were made to closely reflect
plots generated with LBoM::monthly_box_plot.
iidda_plot_box(
  plot_object,
  data = NULL,
  series_variable = "deaths",
  time_unit = "week",
  ...
)
plot_object |
a |
data |
data frame containing data prepped for box plotting, typically output from |
series_variable |
column name of series variable in |
time_unit |
time unit to display box plots on the x-axis. Defaults to "week", should be able to handle any time_unit from iidda.analysis:::time_units. |
... |
other arguments to be passed to |
a ggplot2 plot object containing box plots of time series data
Add a yearly vs. weekly heatmap to an existing ggplot plot object. Graphical choices were made to closely reflect
plots generated with LBoM::seasonal_heat_map.
iidda_plot_heatmap(
  plot_object,
  data = NULL,
  series_variable = "deaths",
  start_year_variable = "Year",
  end_year_variable = "End Year",
  start_day_variable = "Day of Year",
  end_day_variable = "End Day of Year",
  colour_trans = "log2",
  NA_colour = "black",
  palette_colour = "RdGy",
  ...
)
plot_object |
a |
data |
data frame containing data prepped for yearly vs. weekly heatmaps, typically output from
|
series_variable |
column name of series variable in |
start_year_variable |
column name of time variable containing the year of the starting period, defaults to "Year" |
end_year_variable |
column name of time variable containing the year of the ending period, defaults to "End Year" |
start_day_variable |
column name of time variable containing the day of the starting period, defaults to "Day of Year" |
end_day_variable |
column name of time variable containing the day of the ending period, defaults to "End Day of Year" |
colour_trans |
string indicating colour transformation, one of "log2", "sqrt" or "linear" |
NA_colour |
colour for |
palette_colour |
colour of heatmap palette, defaults to "RdGy". Should specify what type of palette colours are accepted by this argument. |
... |
Not currently used. |
a ggplot2 plot object containing a yearly vs. weekly heatmap of time series data
Add a rectangular highlighted region to an existing ggplot2 plot object
iidda_plot_highlight(
  plot_object,
  data = NULL,
  series_variable = "deaths",
  time_variable = "period_end_date",
  filter_variable = "period_end_date",
  filter_start = "1700-01-01",
  filter_end = "1800-01-01",
  ...
)
plot_object |
a |
data |
data frame containing time series data. If |
series_variable |
column name of series variable in |
time_variable |
column name of time variable in |
filter_variable |
column name of variable to filter on in |
filter_start |
value of |
filter_end |
value of |
... |
other arguments to be passed to |
a ggplot2 plot object with a rectangular plot highlight
Add a moving average time series line to an existing ggplot plot object. Graphical choices were made
to closely reflect plots generated with LBoM::plot.LBoM.
iidda_plot_ma(
  plot_object,
  data = NULL,
  series_variable = NULL,
  time_variable = NULL
)
plot_object |
a |
data |
data frame containing moving average time series data, typically output from |
series_variable |
column name of series variable in |
time_variable |
column name of time variable in |
a ggplot2 plot object containing a moving average time series
Add a rohani heatmap to an existing ggplot plot object. Possibly to be extended to include time series in a separate facet.
iidda_plot_rohani_heatmap(
  plot_object,
  data = NULL,
  series_variable = "deaths",
  start_year_variable = "Year",
  end_year_variable = "End Year",
  start_day_variable = "Day of Year",
  end_day_variable = "End Day of Year",
  grouping_variable = "cause",
  colour_trans = log1p_modified_trans(),
  n_colours = (scales::brewer_pal(palette = "YlOrRd"))(9),
  NA_colour = "black",
  palette_colour = "YlOrRd"
)
plot_object |
a |
data |
data frame containing data prepped for yearly vs. weekly heatmaps, typically output from
|
series_variable |
column name of series variable in |
start_year_variable |
column name of time variable containing the year of the starting period, defaults to "Year" |
end_year_variable |
column name of time variable containing the year of the ending period, defaults to "End Year" |
start_day_variable |
column name of time variable containing the day of the starting period, defaults to "Day of Year" |
end_day_variable |
column name of time variable containing the day of the ending period, defaults to "End Day of Year" |
grouping_variable |
column name of grouping variable to appear on the y-axis of the heatmap. |
colour_trans |
function to scale colours, to be supplied to trans argument of scale_fill_gradientn() |
n_colours |
vector of colours to be supplied to scale_fill_gradientn() |
NA_colour |
colour for |
palette_colour |
colour of heatmap palette, defaults to "YlOrRd". Should specify what type of palette colours are accepted by this argument. |
a ggplot2 plot object containing a yearly vs. weekly heatmap of time series data
Add basic features to a ggplot2 plot object including title, subtitle and classic ggplot2::theme_bw theme.
iidda_plot_settings(
  plot_object,
  data = data.frame(),
  min_time = "min_time",
  max_time = "max_time",
  descriptor_name = "descriptor_name",
  theme = iidda_theme
)
plot_object |
a |
data |
list containing metadata. If |
min_time |
name of field in data containing the minimum time period range, defaults to "min_time". |
max_time |
name of field in data containing the maximum time period range, defaults to "max_time". |
descriptor_name |
either the name of a field in data containing the descriptor or a string to be used as the
plot title. If there are more than 3 elements in the descriptor field, then |
theme |
ggplot theme |
a ggplot2 plot object with title, subtitle and adjusted theme.
Plot wavelet to look similar to base R plot of WaveletComp::wt.image using ggplot2 functionality.
Some visual choices were made to reflect work done by Steven Lee (https://github.com/davidearn/StevenLee)
and Kevin Zhao (https://github.com/davidearn/KevinZhao).
iidda_plot_wavelet(
  plot_object,
  data = NULL,
  wavelet_data,
  contour_data,
  y_variable_name = "Period (years)",
  fill_variable_name = "Power",
  max_period = 10,
  colour_levels = 250,
  start_hue = 0,
  end_hue = 0.7,
  sig_lvl = 0.05
)
plot_object |
a |
data |
data frame containing wavelet data prepped for use in |
wavelet_data |
list containing raw wavelet transformed data, typically output from |
contour_data |
data set containing contour data prepped for use in |
y_variable_name |
name of y variable in plot, defaults to "Period (years)". |
fill_variable_name |
name of colour fill variable in plot, defaults to "Power". |
max_period |
maximum period to appear on the plot, defaults to 10 years. |
colour_levels |
number of colours to pass to |
start_hue |
starting hue colour to pass to |
end_hue |
ending hue colour to pass to |
sig_lvl |
significance level for white contours |
a ggplot2 object of a wavelet
Prep data for plotting bar graphs. Prep steps were taken from LBoM::monthly_bar_graph and LBoM::weekly_bar_graph
and they include handling missing values and aggregating series data by time unit grouping variable.
iidda_prep_bar(
  data,
  series_variable = NULL,
  time_variable = NULL,
  time_unit = NULL,
  handle_missing_values = HandleMissingValues(na_remove = FALSE, na_replace = NULL),
  handle_zero_values = HandleZeroValues(zero_remove = FALSE, zero_replace = NULL)
)
data |
data frame containing time series data |
series_variable |
column name of series variable in |
time_variable |
column name of time variable in |
time_unit |
time unit to sum series data over, must be one of iidda.analysis:::time_units, defaults to "week". |
handle_missing_values |
function to handle missing values, defaults to HandleMissingValues |
handle_zero_values |
function to handle zero values, defaults to HandleZeroValues |
data with records prepped for plotting bar graphs with series_variable and time_unit field. The name
of the resulting time_unit field will be named from lubridate_funcs.
Prep data for plotting box plots. Prep steps were taken from LBoM::monthly_box_plot
and they include handling missing values and creating additional time unit fields.
iidda_prep_box(
  data,
  series_variable = NULL,
  time_variable = NULL,
  time_unit = "month_factor_abbr",
  handle_missing_values = HandleMissingValues(na_remove = FALSE, na_replace = NULL),
  handle_zero_values = HandleZeroValues(zero_remove = FALSE, zero_replace = NULL)
)
data |
data frame containing time series data |
series_variable |
column name of series variable in |
time_variable |
column name of time variable in |
time_unit |
time unit to create field from |
handle_missing_values |
function to handle missing values, defaults to HandleMissingValues |
handle_zero_values |
function to handle zero values, defaults to HandleZeroValues |
all fields in data with records prepped for plotting box plots. The name
of the new time_unit field will be named from lubridate_funcs.
Prep data for rohani plots. Prep steps include creating additional time unit fields, summarizing the series variable by time unit and grouping variable (the x and y axis variables), and optionally normalizing series data to be in the range (0,1). By default, the grouping variable is ranked in order of the summarized series variable. Needs to be generalized more; it might need to handle the case where the desired y-axis is a second time unit, as in the seasonal heatmap plot, and therefore make use of the year_end_fix function.
iidda_prep_rohani(
  data,
  series_variable = NULL,
  time_variable = "period_end_date",
  start_time_variable = "period_start_date",
  time_unit = "year",
  grouping_variable = "cause",
  ranking_variable = NULL,
  normalize = FALSE,
  handle_missing_values = HandleMissingValues(na_remove = FALSE, na_replace = NULL),
  handle_zero_values = HandleZeroValues(zero_remove = FALSE, zero_replace = NULL),
  create_nonexistent = FALSE
)
data |
data frame containing time series data |
series_variable |
column name of series variable in |
time_variable |
column name of time variable in |
start_time_variable |
column name of time variable in |
time_unit |
a vector of new time unit fields to create from |
grouping_variable |
column name of grouping variable to appear on the y-axis of the heatmap. |
ranking_variable |
column name of variable used to rank the grouping variable. |
normalize |
boolean flag to normalize |
handle_missing_values |
function to handle missing values, defaults to HandleMissingValues |
handle_zero_values |
function to handle zero values, defaults to HandleZeroValues |
create_nonexistent |
boolean flag to create |
all fields in data with records prepped for plotting rohani heatmaps. The name
of the new time_unit fields will be named from lubridate_funcs.
Prep data for seasonal heatmap plots. Prep steps were taken from LBoM::seasonal_heat_map
and they include creating additional time unit fields, splitting weeks that cover the
year end, and optionally normalizing series data to be in the range (0,1).
iidda_prep_seasonal_heatmap(
  data,
  series_variable = NULL,
  start_time_variable = "period_start_date",
  end_time_variable = "period_end_date",
  time_unit = c("yday", "year"),
  prepend_string = "End ",
  normalize = FALSE,
  ...
)
data |
data frame containing time series data |
series_variable |
column name of series variable in |
start_time_variable |
column name of time variable in |
end_time_variable |
column name of time variable in |
time_unit |
a vector of new time unit fields to create from |
prepend_string |
string to prepend to newly created time_unit fields to distinguish between time_unit
fields corresponding to starting versus ending time periods. Defaults to "End ". For example, a |
normalize |
boolean flag to normalize |
... |
optional arguments to |
all fields in data with records prepped for plotting seasonal heatmaps. The name
of the new time_unit fields will be named from lubridate_funcs.
Extracts a plot object attached to a data set, applies a title, subtitle, and theme, and returns the plot.
iidda_render_plot(
  data,
  title = TitleGuesser(),
  subtitle = TimeRangeDescriber(),
  theme = iidda_theme
)
data |
A data frame with an attached plot object. |
title |
A plot title, or a function that takes |
subtitle |
A plot subtitle, or a function that takes |
theme |
A function that returns a |
Prepare a data set for plotting (trimming, handling missing/zero values, converting the time variable), build a line chart, and apply title, subtitle, and theme.
iidda_series(
  data,
  series_variable = NULL,
  time_variable = NULL,
  trim_series = TrimSeries(zero_lead = FALSE, zero_trail = FALSE),
  handle_missing_values = HandleMissingValues(na_remove = FALSE, na_replace = NULL),
  handle_zero_values = HandleZeroValues(zero_remove = FALSE, zero_replace = NULL),
  time_variable_converter = TimeVariableConverter(),
  title = TitleGuesser(),
  subtitle = TimeRangeDescriber(),
  theme = iidda_theme
)
iidda_attach_series(
  data,
  initial_ggplot_object = ggplot(),
  series_variable = NULL,
  time_variable = NULL
)
iidda_prep_ma(
  data,
  series_variable = NULL,
  time_variable = NULL,
  trim_series = TrimSeries(zero_lead = FALSE, zero_trail = FALSE),
  handle_missing_values = HandleMissingValues(na_remove = FALSE, na_replace = NULL),
  handle_zero_values = HandleZeroValues(zero_remove = FALSE, zero_replace = NULL),
  compute_moving_average = ComputeMovingAverage(ma_window_length = 52),
  time_variable_converter = TimeVariableConverter()
)
iidda_prep_series(
  data,
  series_variable = NULL,
  time_variable = NULL,
  trim_series = TrimSeries(zero_lead = FALSE, zero_trail = FALSE),
  handle_missing_values = HandleMissingValues(na_remove = FALSE, na_replace = NULL),
  handle_zero_values = HandleZeroValues(zero_remove = FALSE, zero_replace = NULL),
  time_variable_converter = TimeVariableConverter()
)
data |
A data frame containing the time-series data (typically output from |
series_variable |
Name of the series column in |
time_variable |
Name of the time column in |
trim_series |
A |
handle_missing_values |
A |
handle_zero_values |
A |
time_variable_converter |
A |
title |
A plot title, or a function that takes |
subtitle |
A plot subtitle, or a function that takes |
theme |
A function that returns a |
compute_moving_average |
function to compute the moving average of |
A ggplot2 plot object containing the time-series line chart with title, subtitle, and theme applied.
iidda_attach_series(): Attach a time-series plot to a data frame
containing the plotted data.
iidda_prep_ma(): Prepare a dataset so that it can be used to
produce a time-series plot of a moving average.
iidda_prep_series(): Prepare a dataset so that it can be used to
produce a time-series plot.
Themes for ggplot2
iidda_theme()
iidda_theme_time()
iidda_theme_heat()
iidda_theme_above()
iidda_theme_time(): Theme for plots where the x-axis represents time.
No x-axis titles will be plotted with this theme, because the meaning of a
time axis is obvious.
iidda_theme_heat(): Theme for heatmaps where the x-axis represents time.
No x-axis titles will be plotted with this theme, because the
meaning of a time axis is obvious. Grid lines are not plotted with this
theme because interpretation can be compromised when grid lines
are visible through the colours of the heatmap.
iidda_theme_above(): Theme for plots where the x-axis represents time,
but for which time information is not displayed because there are vertically
aligned plots below with the same time axis.
Joins lookup table in API to data
join_lookup_table(raw_data, lookup_type, api_hook)
raw_data |
data frame of table to be harmonized |
lookup_type |
string indicating type of lookup table from API to join |
api_hook |
API operations list |
data frame of harmonized data with keys from API
Joins user-defined lookup table to data
join_user_table(raw_data, user_table_path, lookup_type, join_by)
raw_data |
data frame of table to be harmonized |
user_table_path |
string indicating path to user-defined lookup table |
lookup_type |
string indicating type of lookup table (disease, location, sex). Used to determine columns to join by if |
join_by |
vector of strings indicating columns to join by (optional if |
data frame of harmonized data with user-defined keys
Slight modification of log1p_trans() that provides better breaks. The breaks are log1p-based (log-based and
shifted by 1) so that they can be computed in the presence of zeroes.
log1p_modified_trans(n = 10)
n |
number of desired breaks |
a scales::trans_new function
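The idea of log-based breaks shifted by 1 can be sketched in base R. This is only an illustration: `log1p_breaks_sketch` is a hypothetical helper, it ignores the `n` argument, and it is not the break algorithm used by log1p_modified_trans().

```r
# Hypothetical helper: powers of `base` computed on x + 1, then shifted back
# by 1 so that zero is a valid break.
log1p_breaks_sketch <- function(x, base = 10) {
  rng <- range(x, na.rm = TRUE) + 1
  exponents <- seq(floor(log(rng[1], base)), ceiling(log(rng[2], base)))
  base^exponents - 1
}

breaks <- log1p_breaks_sketch(c(0, 3, 250))
breaks
# 0 9 99 999
```

Because every break is at least zero, log1p() of each break is finite, which is the property that plain log-based breaks lack when the data contain zeroes.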
Left joins lookup table to data frame of data.
lookup_join(raw_data, lookup_table, join_by, verbose = FALSE)
raw_data |
Data frame of data to be harmonized. |
lookup_table |
Data frame of lookup table. |
join_by |
Vector of strings indicating columns to left_join by
(can use |
verbose |
Print information about the lookup. |
Data frame of newly harmonized and resolved data. Note that all entries in the returned data frame are strings.
lubridate functions with desired interpretable labels
lubridate_funcs
An object of class character of length 10.
Get associated lubridate function to compute time unit.
make_time_trans(unit = unname(time_units))
unit |
time unit, one or more of iidda.analysis:::time_units |
function to compute time unit
Compute a vector giving the mid-points of a vector of temporal periods,
defined by start dates and one of either a vector of end dates or a vector
of period lengths in days (see num_days). You can either
return a date, with mid_dates, or a date-time, with mid_times.
In addition to the type of return value (date vs time), the former rounds
down to the nearest date whereas the latter is accurate to the nearest hour
and so can account for uneven
mid_dates(start_date, end_date, period_length)
mid_times(start_date, end_date, period_length)
start_date |
Vector of period starting dates |
end_date |
Vector of period ending dates. If missing then
|
period_length |
Vector of integers giving the period length in days.
If missing then it is calculated using |
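The difference between a mid-date and a mid-time can be illustrated with base R alone. The sketch below assumes that a period runs from the start of its first day to the end of its last day (an inclusive day count) and uses UTC throughout. It does not call the package and is not its implementation.

```r
# Two periods: one of 7 days (odd) and one of 2 days (even).
start_date <- as.Date(c("2023-04-03", "2023-04-03"))
end_date   <- as.Date(c("2023-04-09", "2023-04-04"))

# Inclusive period length in days (assumption stated above).
period_length <- as.integer(end_date - start_date) + 1L

# Mid-time: midnight at the start of the first day plus half the period,
# expressed in seconds (12 hours per day of period length).
start_time <- as.POSIXct(format(start_date), tz = "UTC")
mid_time   <- start_time + period_length * 12 * 3600

# Mid-date: the calendar date containing the mid-time (rounds down).
mid_date <- as.Date(mid_time, tz = "UTC")

data.frame(start_date, end_date, period_length, mid_time, mid_date)
```

In this sketch the 7-day period has a mid-time of noon on 2023-04-06, while its mid-date is 2023-04-06. The 2-day period has a mid-time of exactly midnight on 2023-04-04, so the two descriptors coincide.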
Create new time unit fields
mutate_time_vars( data, unit = unname(time_units), input_nm = "period_end_date", output_nm = get_unit_labels(unit) )
data |
data set containing an input time field |
unit |
time unit, one of iidda.analysis:::time_units |
input_nm |
field name in |
output_nm |
field name of newly created time unit field, by default uses get_unit_labels(). |
all fields in data with additional time unit field
Defines column names to join by for a type of lookup table
names_to_join_by(lookup_type)
lookup_type |
string indicating type of lookup table (disease, location, sex, age group) |
vector of column names to join by for the type of lookup table
Take a tidy data set with a potentially complex disease hierarchy
and flatten this hierarchy so that, at any particular time and location
(or some other context), all diseases in the disease column have the
same nesting_disease.
normalize_disease_hierarchy( data, disease_lookup, grouping_columns = c("period_start_date", "period_end_date", "location"), basal_diseases_to_prune = character(), find_unaccounted_cases = TRUE, specials_pattern = "_unaccounted$" )
data |
A tidy data set with the following minimal set of columns:
|
disease_lookup |
A lookup table with |
grouping_columns |
Character vector of column names to use when grouping to determine the context. |
basal_diseases_to_prune |
Character vector of |
find_unaccounted_cases |
Make new records for instances when the sum of leaf diseases is less than the reported total for their basal disease. |
specials_pattern |
Optional regular expression to use to match
|
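The arithmetic behind find_unaccounted_cases can be sketched as follows. The disease names and counts are hypothetical, and the `_unaccounted` suffix is assumed only because it matches the default specials_pattern. The records that the package actually creates may be structured differently.

```r
# Hypothetical counts for one time period and location.
leaf_cases  <- c("disease-a" = 40, "disease-b" = 35)  # leaf diseases
basal_total <- 100                                    # reported total for the basal disease

# Cases reported in the total but not attributed to any leaf disease.
unaccounted <- basal_total - sum(leaf_cases)

# Only create a new record when the leaves sum to less than the total.
new_record <- if (unaccounted > 0) {
  data.frame(
    disease = "disease-x_unaccounted",  # assumed naming convention
    nesting_disease = "disease-x",
    cases_this_period = unaccounted
  )
} else {
  NULL
}
new_record
```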
Filter out overlapping sources for the same disease/nesting_disease/basal_disease,
period_start_date, period_end_date, and iso_3166_2, with the choice to
keep either national-level data (i.e. from Statistics Canada / Dominion
Bureau of Statistics / Health Canada) or provincial-level data (from a
provincial ministry of health).
normalize_duplicate_sources(data, preferred_jurisdiction = "national")
data |
A tidy data set with columns |
preferred_jurisdiction |
'national' or 'provincial', indicating which jurisdiction level will be kept if these sources overlap. |
A data set with no overlapping sources.
Set geographic order of provinces and territories and remove country-level data.
normalize_location(data)
data |
Tidy dataset with an iso_3166_2 column. |
Tidy dataset without country-level data and with provinces and territories geographically ordered.
Normalize Population
normalize_population(data, harmonized_population)
data |
Tidy dataset with columns period_start_date, period_end_date, and iso_3166_2. |
harmonized_population |
Harmonized population data with columns date, iso_3166_2, and population (other columns will be dropped). |
Tidy dataset joined with harmonized population.
Choose a single best time_scale for each year in a dataset, grouped by
nesting disease. This best time_scale is defined as the longest
of the shortest time scales in each location and sub-disease.
normalize_time_scales( data, initial_group = c("year", "iso_3166", "iso_3166_2", "disease", "nesting_disease", "basal_disease"), final_group = c("basal_disease"), get_implied_zeros = TRUE, aggregate_if_unavailable = TRUE )
data |
A tidy data set with columns |
initial_group |
Character vector naming columns for defining the initial grouping used to compute the shortest time scales. |
final_group |
Character vector naming columns for defining the final grouping used to compute the longest of the shortest time scales. |
get_implied_zeros |
Add zeros that are implied by a '0' reported at a coarser timescale. |
aggregate_if_unavailable |
If a location is not reporting for the determined 'best timescale', but is reporting at a finer timescale, aggregate this finer timescale to the 'best timescale'. |
A data set only containing records with the optimal time scale.
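The "longest of the shortest" rule can be illustrated with a small base-R sketch. The time-scale codes and their nominal lengths in days are hypothetical, and grouping is reduced to a single location column. This is not the package implementation.

```r
# Hypothetical reporting pattern: each row is a time scale available
# in a location for one year and basal disease.
reports <- data.frame(
  location   = c("A", "A", "B", "B", "C"),
  time_scale = c("wk", "mo", "mo", "yr", "wk"),
  stringsAsFactors = FALSE
)

# Hypothetical nominal lengths used only to order the time scales.
scale_days <- c(wk = 7, mo = 30, yr = 365)
reports$days <- scale_days[reports$time_scale]

# Step 1 (initial grouping): shortest time scale within each location.
shortest <- tapply(reports$days, reports$location, min)

# Step 2 (final grouping): longest of those shortest time scales.
best <- names(scale_days)[match(max(shortest), scale_days)]
best
# "mo"
```

Location B reports monthly at best, so monthly is the finest scale that every location can supply. Locations A and C, which also report weekly, would need their weekly records aggregated, which is the situation that aggregate_if_unavailable addresses.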
Compute a vector giving the number of days in a set of periods, given equal length vectors of the start date and end date of these periods. This
num_days(start_date, end_date)
num_days_util(start_date, end_date)
start_date |
Vector of period starting dates |
end_date |
Vector of period ending dates |
num_days_util(): Low-level interface for num_days.
Obtain period midpoints and average daily rates for count data
period_averager( data, count_col = "cases_this_period", start_col = "period_start_date", end_col = "period_end_date", norm_col = NULL, norm_const = 1e+05, keep_raw = TRUE, keep_cols = names(data) )
data |
Data frame with rows at minimum containing period start and end dates and a count variable. |
count_col |
Character, name of count data column. |
start_col |
Character, name of start date column. |
end_col |
Character, name of end date column. |
norm_col |
Character, name of column giving data for normalization.
A good option is often |
norm_const |
Numeric value for multiplying the |
keep_raw |
Logical value indicating whether to force all |
keep_cols |
Character vector containing the names of columns in the
input |
Data frame containing the following fields.
Columns from the original dataset specified using keep_raw and
keep_cols.
year : Year of the period_start_date.
num_days : Length of the period in days from the beginning of the
period_start_date to the end of the period_end_date.
period_mid_time : Timestamp of the middle of the period.
period_mid_date : Date containing the period_mid_time.
daily_rate : Daily count rate, which by default is given by
daily_rate = count_col / num_days. If the name of
norm_col is specified then
daily_rate = norm_const * count_col / num_days / norm_col.
When interpreting these formulas, please keep in mind that
norm_const is a numeric constant, num_days is a derived
numeric column, and count_col and norm_col are columns
supplied within the input data object.
set.seed(666)
data <- data.frame(
  disease = "senioritis",
  period_start_date = seq(as.Date("2023-04-03"), as.Date("2023-06-05"), by = 7),
  period_end_date = seq(as.Date("2023-04-09"), as.Date("2023-06-11"), by = 7),
  cases_this_period = sample(0:100, 10, replace = TRUE),
  location = "college"
)
period_averager(data, keep_raw = TRUE, keep_cols = c("disease", "location"))
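The two daily_rate formulas can also be checked by hand in base R without the package. In this sketch `population` is a hypothetical normalization column, the counts are made up, and num_days is assumed to be an inclusive day count, as the description of the num_days field suggests.

```r
# Two 7-day periods with made-up counts and a hypothetical population column.
rate_data <- data.frame(
  period_start_date = as.Date(c("2023-04-03", "2023-04-10")),
  period_end_date   = as.Date(c("2023-04-09", "2023-04-16")),
  cases_this_period = c(14, 35),
  population        = c(50000, 50000)
)
norm_const <- 1e5

# Inclusive number of days in each period (assumption stated above).
num_days <- as.integer(rate_data$period_end_date - rate_data$period_start_date) + 1L

# Default: daily_rate = count_col / num_days
daily_rate <- rate_data$cases_this_period / num_days

# With norm_col: daily_rate = norm_const * count_col / num_days / norm_col
daily_rate_norm <- norm_const * rate_data$cases_this_period / num_days / rate_data$population

cbind(num_days, daily_rate, daily_rate_norm)
```

With these numbers the unnormalized rates are 2 and 5 cases per day, and the normalized rates are 4 and 10 cases per day per 100,000 population.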
Create function that aggregates information over time periods, normalizes a count variable, and creates new fields to summarize this information.
PeriodAggregator(rate_variable, norm_exponent = 5)
rate_variable |
Name of variable to be used to store the normalized count variable. |
norm_exponent |
Exponent to use in normalization. The default is |
A function like period_aggregator_default that
aggregates data so that each time period is represented by exactly
one record.
Create a function that takes a data set containing at least two of
the following variables, period_start_variable, period_end_variable,
period_days_variable, and returning a data set with all of these
three variables and other variables describing the middle of the period
with either or both of period_mid_time_variable and
period_mid_date_variable. These two period middle descriptors will only
differ (by exactly 12 hours) for periods with odd numbers of days.
PeriodDescriber(mid_types = c("time", "date"))
mid_types |
Compute mid-times and/or mid-dates? |
A function like period_describer_default that
adds variables to describe the time period represented by each record.
Quantile transformation, adapted from https://stackoverflow.com/questions/38874741/transform-color-scale-to-probability-transformed-color-distribution-with-scale-f
quantile_trans(x)
x |
vector to be transformed |
a scales::trans_new function
Read IIDDA Dataset into a Dataframe
read_iidda_dataset(dataset_id)
dataset_id |
ID for a dataset in the IIDDA |
Resolves any duplicate columns that result from a left_join due to columns shared between the data frames. Rule: keep the old values if all newly joined values are NA; otherwise keep the new values (even if some entries are empty).
resolve_join(df)
df |
data frame with duplicate columns ending in |
data frame with one remaining column for duplicates
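The resolution rule can be sketched for a single pair of duplicated columns. `resolve_pair` is a hypothetical helper that works on two plain vectors. The package function instead operates on a data frame whose duplicated columns are identified by their suffixes.

```r
# Hypothetical helper applying the rule to one old/new column pair.
resolve_pair <- function(old, new) {
  # Keep the old values only when the join contributed nothing at all.
  if (all(is.na(new))) old else new
}

kept_old <- resolve_pair(old = c("a", "b"), new = c(NA, NA))
kept_new <- resolve_pair(old = c("a", "b"), new = c("x", NA))

kept_old
# "a" "b"
kept_new
# "x" NA
```

The second call shows the consequence noted in the rule: once any new value is present, the whole new column wins, including its missing entries.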
Harmonizes the series variable in data so there is one data value for each time
unit in time variable (to account for different variations in disease/cause name)
SeriesHarmonizer(sum_fn = base::sum)
time_variable |
column name of time variable in |
series_variable |
column name of series variable in |
A function like series_harmonizer_default to
harmonize disease/cause names.
List of column names that contain numerical information that could represent a time series if processed appropriately.
std_series_variables()
Vector of all possible time units, most or all are derived from lubridate functions
std_time_units
An object of class character of length 29.
List of column names that contain information locating a point or interval of time.
std_time_variables()
Length of time in days represented by an object
time_extent(x, time_id)
x |
an object |
time_id |
identifier for finding time axis information in the object |
Time Range Describer
TimeRangeDescriber(cutoff = 50)
Time Scale Picker
TimeScalePicker()
A function like time_scale_picker_default that
Construct a function that takes a data frame and returns another data frame with a time variable converted so that it has the correct format, class, and/or type.
TimeVariableConverter( as_date = as.Date, as_integer = as.integer, as_numeric = as.numeric )
as_date |
Function that takes a vector and converts it to a date vector if possible. Used only if |
as_integer |
Function that takes a vector and converts it to an integer vector if possible. Used only if |
as_numeric |
Function that takes a vector and converts it to a numeric vector if possible. Used only if Returned Function
|
Title Guesser
TitleGuesser(custom = NULL, prefer = std_title_variables())
custom |
Custom string for the title |
prefer |
List of variables that could contain title information in an order that will be used to find variables that will be used to guess at a title. The first variable found in the data is the one that is chosen. |
Convert a character vector (i.e. a character column) into a title for a plot.
titleize(title_info, max_items = 3L, max_chars = 15L)
title_info |
Character vector to be summarized into a title |
max_items |
TODO |
max_chars |
TODO |
Remove leading or trailing zeros in a time series data set.
TrimSeries(zero_lead = FALSE, zero_trail = FALSE)
zero_lead |
boolean value, if |
zero_trail |
boolean value, if |
A function like trim_series_default to
remove leading and/or trailing zeroes.
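Trimming can be sketched on a plain numeric vector. `trim_zeros_sketch` is a hypothetical helper, and it assumes that setting zero_lead or zero_trail to TRUE removes the corresponding run of zeros. The package function works on a data frame and may treat missing values or all-zero series differently.

```r
# Hypothetical helper: drop leading and/or trailing zeros from a vector.
trim_zeros_sketch <- function(x, zero_lead = FALSE, zero_trail = FALSE) {
  nonzero <- which(!is.na(x) & x != 0)
  # Nothing to anchor the trim on: return the series unchanged.
  if (length(nonzero) == 0L) return(x)
  first <- if (zero_lead) min(nonzero) else 1L
  last  <- if (zero_trail) max(nonzero) else length(x)
  x[first:last]
}

series <- c(0, 0, 3, 0, 5, 0)
trimmed_both <- trim_zeros_sketch(series, zero_lead = TRUE, zero_trail = TRUE)
trimmed_lead <- trim_zeros_sketch(series, zero_lead = TRUE)
trimmed_both
# 3 0 5
```

Zeros in the interior of the series are kept, because only the leading and trailing runs are trimmed.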
Combine two time series data sets with the option to handle overlapping time periods. This is particularly useful for data sets that come from two sources (e.g., LBoM and RG). Assumes both data sets have the same number of columns with the same names.
union_series(x, y, overlap = TRUE, time_variable = "period_end_date")
x |
first data frame containing time series data |
y |
second data frame containing time series data |
overlap |
boolean to indicate if |
time_variable |
column name of time variable in |
combined x and y data frames with optional filtering for overlaps
Get unique tokens from iidda metadata
unique_entries(entries, metadata_search)
entries |
List returned by |
metadata_search |
Character, field from which unique tokens are desired |
Character vector of unique tokens for a given field from all iidda datasets
Validate if variable is a date data type in the data set.
valid_time_vars(var_nm, data)
var_nm |
string of variable name |
data |
data frame |
boolean of validation status
Prep data for wavelet plot. Prep steps were taken from code provided by Steven Lee (https://github.com/davidearn/StevenLee) and Kevin Zhao (https://github.com/davidearn/KevinZhao).
data |
data frame containing time series data |
trend_data |
data frame containing time series trend data |
time_variable |
column name of time variable in |
series_variable |
column name of series variable in |
trend_variable |
column name of series variable in |
series_suffix |
suffix to be appended to series data fields |
trend_suffix |
suffix to be appended to trend data fields |
wavelet_variable |
name of the field in |
output_emd_trend |
name of output field for the empirical mode decomposition applied to |
output_norm |
name of output field for the |
output_sqrt_norm |
name of output field for the square root of |
output_log_norm |
name of output field for the logarithm of ( |
output_emd_norm |
name of output field for the empirical mode decomposition applied to |
output_emd_sqrt |
name of output field for the empirical mode decomposition applied to |
output_emd_log |
name of output field for the empirical mode decomposition applied to |
output_detrend_norm |
name of output field for the computed field |
output_detrend_sqrt |
name of output field for the computed field |
output_detrend_log |
name of output field for the computed field |
data_harmonizer |
function that harmonizes time scales and series names so there is one data point per time unit |
trend_data_harmonizer |
function that harmonizes time scales and trend names so there is one data point per time unit |
data_deheaper |
function that fixes heaping errors on series data |
trend_deheaper |
function that fixes heaping errors on trend data |
joiner |
function that joins series and trend data sets |
interpolator |
function that linearly interpolates series and trend data |
normalizer |
function that computes normalized fields |
transformer |
function that computes wavelet transform |
list containing:
* transforemd_data - wavelet transformed data
* tile_data_to_plot - data set of the wavelet transformed data
prepped for plotting with ggplot2::geom_tile
* contour_data_to_plot - data set of the transformed wavelet data prepped
for plotting with ggplot2::geom_contour
Weeks covering the year end are split into two records. The first week is adjusted to end on day 365 (or 366 in leap years),
and the second week starts on the first day of the year. This was adapted from LBoM::edge_fix which keeps the same
series variable value for both of the newly created weeks. This doesn't seem to make much difference when viewing the
seasonal heatmap, however it might make sense to do something sensible like dividing the series variable value in half and allocating
each week to have half of the values.
year_end_fix( data, series_variable = "deaths", start_year_variable = "Year", end_year_variable = "End Year", start_day_variable = "Day of Year", end_day_variable = "End Day of Year", temp_year_variable = "yr" )
data |
data frame containing time series data |
series_variable |
column name of series variable in |
start_year_variable |
column name of time variable containing the year of the starting period, defaults to "Year" |
end_year_variable |
column name of time variable containing the year of the ending period, defaults to "End Year" |
start_day_variable |
column name of time variable containing the day of the starting period, defaults to "Day of Year" |
end_day_variable |
column name of time variable containing the day of the ending period, defaults to "End Day of Year" |
temp_year_variable |
temporary variable name when pivoting the data frame |
all fields in data with only records corresponding to year end weeks that have been split
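The year-end split described for year_end_fix can be sketched for a single week. `split_year_end_sketch` is a hypothetical helper with simplified argument names. Like the description, it copies the same series value to both new records rather than dividing it between them.

```r
# Hypothetical helper: split one record that straddles a year boundary.
split_year_end_sketch <- function(start_year, start_day, end_year, end_day, value) {
  # Records within a single year are returned unchanged.
  if (start_year == end_year) {
    return(data.frame(year = start_year, start_day = start_day,
                      end_day = end_day, value = value))
  }
  is_leap <- (start_year %% 4 == 0 && start_year %% 100 != 0) ||
    start_year %% 400 == 0
  last_day <- if (is_leap) 366 else 365
  data.frame(
    year      = c(start_year, end_year),
    start_day = c(start_day, 1),        # second record starts on day 1
    end_day   = c(last_day, end_day),   # first record ends on day 365 or 366
    value     = c(value, value)         # same value kept for both records
  )
}

split_week <- split_year_end_sketch(2023, 362, 2024, 3, value = 10)
split_week
```

For a week running from day 362 of 2023 to day 3 of 2024, the sketch returns one record for days 362 to 365 of 2023 and one for days 1 to 3 of 2024, each with the value 10.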