| Title | Wavelength |
|---|---|
| Description | USAID OHA Office. Munging of mission weekly HFR data. |
| Authors | Aaron Chafetz [aut, cre], Baboyma Kagniniwa [aut] |
| Maintainer | Aaron Chafetz <[email protected]> |
| License | MIT + file LICENSE |
| Version | 2.4.1 |
| Built | 2024-11-21 05:06:24 UTC |
| Source | https://github.com/USAID-OHA-SI/Wavelength |
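As a quick orientation, a minimal sketch of installing and loading the package from the GitHub source listed above; it assumes the remotes package is available and is not part of the original documentation.
## Not run:
# install from the USAID-OHA-SI GitHub repository and load
remotes::install_github("USAID-OHA-SI/Wavelength")
library(Wavelength)
## End(Not run)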
Apply Time Stamp to file and rename on Google Drive
apply_filetimestamp(df, gdrive_rename = TRUE)
df |
dataframe from identify_newfiles() |
gdrive_rename |
rename on Google Drive? default = TRUE |
Additional/optional validation against DATIM tables.
check_content(df, output_path, datim_path)
df |
HFR data frame created by |
datim_path |
path to look up files from |
df, the updated HFR data frame
Other validation: check_dates(), check_disaggs(), check_orgunituids(), check_output_cols(), is_hfrtab(), validate_import(), validate_initial(), validate_output()
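A minimal usage sketch, assuming the HFR data frame already exists and that DATIM look-up files have been saved locally; the paths shown here are illustrative only.
## Not run:
df_hfr <- check_content(df_hfr, output_path = "out/processed", datim_path = "out/DATIM")
## End(Not run)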
Check whether there are any rows/records with missing dates. Provides readout on this as well as on whether the submission covers multiple periods and the dates covered in the file.
check_dates(df)
df |
HFR data frame created by |
Other validation: check_content(), check_disaggs(), check_orgunituids(), check_output_cols(), is_hfrtab(), validate_import(), validate_initial(), validate_output()
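A minimal sketch of running the date check on a tidied submission; df_hfr is assumed to come from the processing functions documented elsewhere in this manual.
## Not run:
check_dates(df_hfr)
## End(Not run)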
Check whether there are any rows/records with missing disaggs and provides a readout
check_disaggs(df)
df |
HFR data frame created by |
Other validation: check_content(), check_dates(), check_orgunituids(), check_output_cols(), is_hfrtab(), validate_import(), validate_initial(), validate_output()
Check OUs listed in operatingunit
check_distinct_ous(df)
df |
df created during |
Other internal: check_template_cols(), count_missing(), flag_extra(), flag_missing(), hfr_export_mech()
Check whether there are any rows/records with missing indicators and provides a readout
check_inds(df)
df |
HFR data frame created by |
Check whether there are any rows/records with missing mechanisms and provides a readout of the mechanisms included
check_mechs(df)
df |
HFR data frame created by |
Check whether there are any rows/records with missing orgunits and provides a readout
check_orgunituids(df)
df |
HFR data frame created by |
Other validation: check_content(), check_dates(), check_disaggs(), check_output_cols(), is_hfrtab(), validate_import(), validate_initial(), validate_output()
Ensure all expected columns exist before exporting
check_output_cols(df)
df |
HFR data frame created by |
Other validation: check_content(), check_dates(), check_disaggs(), check_orgunituids(), is_hfrtab(), validate_import(), validate_initial(), validate_output()
Checks template's columns
check_template_cols(df)
df |
df created during |
Other internal: check_distinct_ous(), count_missing(), flag_extra(), flag_missing(), hfr_export_mech()
Confirm validations of processed files
confirm_validations(hfr_data, hfr_errors, dir_files)
hfr_data |
content of processed files |
hfr_errors |
errors detected from files content |
dir_files |
location of processed files |
## Not run: confirm_validations(hfr_data, hfr_errors, dir_files) ## End(Not run)
Counts the number of rows where there are missing records
count_missing(df, var)
df |
data frame |
var |
variable to count missing values |
Other internal: check_distinct_ous(), check_template_cols(), flag_extra(), flag_missing(), hfr_export_mech()
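A minimal sketch of counting missing values in one column of an HFR data frame; passing var as a quoted column name is an assumption here, not confirmed by the original documentation.
## Not run:
# count rows with a missing orgunituid (quoting of var is assumed)
count_missing(df_hfr, "orgunituid")
## End(Not run)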
Current Fiscal Year
curr_fy
An object of class numeric of length 1.
Current Fiscal Year
This code should be run on HFR submissions prior to loading into DDC/s3. Currently, DDC cannot handle two issues: (1) tabs with only one row of data, and (2) tabs not ordered from least to greatest. This code resolves the first by creating a second row of data with the first value, and the second by reordering the tabs using openxlsx.
ddcpv_check(filepath)
filepath |
path to HFR submission |
print out of checks and
## Not run: files <- list.files("ou_submissions/", "xlsx", full.names = TRUE) ddcpv_check(files[1]) purrr::walk(files, ddcpv_check) ## End(Not run)
Download New Submission to upload
download_new(df)
df |
dataframe from apply_filetimestamp |
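A hedged sketch of how the Google Drive intake helpers might chain together, assuming each step returns the data frame the next expects (as the argument descriptions above suggest).
## Not run:
df_new <- identify_newfiles()
df_stamped <- apply_filetimestamp(df_new, gdrive_rename = TRUE)
download_new(df_stamped)
## End(Not run)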
Extract OU ISO3 code
extract_iso3code(pfile)
pfile |
processed file |
ISO3 3 character iso code
## Not run: extract_iso3code('HFR_2020.99_XAR_100000_processed_20200528.csv') ## End(Not run)
Extract mechanism code
extract_mechcode(pfile)
pfile |
processed file |
mech_code mechanism code
## Not run: extract_mechcode('HFR_2020.99_XAR_100000_processed_20200528.csv') ## End(Not run)
Flag Extra Variables
flag_extra(required, submitted)
required |
list of required vars |
submitted |
list of vars pulled from submission |
Other internal: check_distinct_ous(), check_template_cols(), count_missing(), flag_missing(), hfr_export_mech()
Flag Missing Variables
flag_missing(required, submitted)
required |
list of required vars |
submitted |
list of vars pulled from submission |
Other internal: check_distinct_ous(), check_template_cols(), count_missing(), flag_extra(), hfr_export_mech()
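A hedged sketch comparing a submission's columns against the long template headers; df_submission is a placeholder for a read-in submission, and treating template_cols_long as a character vector of required names is an assumption.
## Not run:
required <- template_cols_long
submitted <- names(df_submission)
flag_missing(required, submitted)
flag_extra(required, submitted)
## End(Not run)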
Generate a API URL
gen_url( ou_uid, org_lvl, org_type = "facility", value_type = "results", is_hts = FALSE, fy_pd = NULL, baseurl = "https://final.datim.org/" )
ou_uid |
UID for the country, recommend using |
org_lvl |
org hierarchy level, eg facility is level 7 in country X, recommend using |
org_type |
organization type, either facility (default) or community |
value_type |
results (default) or targets |
is_hts |
is the API for HTS indicators (HTS_TST or HTS_TST_POS), default = FALSE |
fy_pd |
fiscal year(s) to cover, default will be current FY if not provided |
baseurl |
API base url, default = https://final.datim.org/ |
## Not run: #get OU UID ouuid <- identify_ouuids() %>% dplyr::filter(ou == "Ghana") #get facility level faclvl <- identify_levels("Ghana", "facility", username = myuser, password = mypwd()) #gen url myurl <- gen_url(ouuid, faclvl, org_type = "facility") ## End(Not run)
DATIM API Call
get_datim_data(url, username, password)
url |
supply url for API call, recommend using |
username |
DATIM username |
password |
DATIM password, recommend using |
## Not run: myurl <- paste0(baseurl, "api/29/analytics.json? dimension=LxhLO68FcXm:udCop657yzi& dimension=ou:LEVEL-4;HfVjCurKxh2& filter=pe:2018Oct& displayProperty=SHORTNAME&outputIdScheme=CODE") myuser <- "UserX" df_datim <- get_datim_data(myurl, myuser, mypwd(myuser)) ## End(Not run)
DATIM API Call for Targets
get_datim_targets(url, username, password)
url |
supply url for API call, recommend using |
username |
DATIM username |
password |
DATIM password, recommend using |
## Not run: myurl <- paste0(baseurl, "api/29/analytics.json? dimension=LxhLO68FcXm:udCop657yzi& dimension=ou:LEVEL-4;HfVjCurKxh2& filter=pe:2018Oct& displayProperty=SHORTNAME&outputIdScheme=CODE") myuser <- "UserX" df_targets <- get_datim_targets(myurl, myuser, mypwd(myuser)) ## End(Not run)
Get operating unit name
get_operatingunit(iso_code, orglevels)
iso_code |
iso3 code |
orglevels |
df org levels |
operating unit
## Not run: get_operatingunit(org_levels, 'XWA') ## End(Not run)
Get operating unit name
guess_operatingunit(pfile, levels, ims)
pfile |
processed file |
levels |
org levels |
ims |
mechanisms df |
operating unit name
## Not run: get_mech_ou(ims, 'HFR_2020.99_XAR_100000_processed_20200528.csv') ## End(Not run)
Aggregate data frame to combine rows where needed (minimize row count). Multiple lines may be entered for the same unique reporting combination/id
hfr_aggr(df)
df |
HFR data frame imported via |
Append HFR and DATIM Data
hfr_append_sources( folderpath_hfr, folderpath_datim, start_date, weeks = 4, max_date = TRUE, folderpath_output )
folderpath_hfr |
folder path to HFR processed data |
folderpath_datim |
folder path to DATIM extracts, see |
start_date |
start date of HFR period, YYYY-MM-DD format |
weeks |
number of weeks to create, default = 4 |
max_date |
cut off data at max date? default = TRUE |
folderpath_output |
folder path for saving the output |
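A minimal sketch, with illustrative folder paths and an assumed fiscal-year start date; none of these values come from the original documentation.
## Not run:
hfr_append_sources(folderpath_hfr = "out/processed",
                   folderpath_datim = "out/DATIM",
                   start_date = "2020-09-28",
                   weeks = 4,
                   folderpath_output = "out/joint")
## End(Not run)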
Creates new columns for specifying the HFR reporting period - fy (20XX) and hfr_pd (fiscal month), 1-12
hfr_assign_pds(df)
df |
HFR data frame with date |
## Not run: df <- hfr_assign_pds(df) ## End(Not run)
Function to export different data frames with standardized naming formats
hfr_export( df, folderpath_output = NULL, type = "processed", by_mech = FALSE, quarters_complete = NULL )
df |
structured High Frequency Data Frame |
folderpath_output |
provide the full path to the folder for saving |
type |
type of data being saved, default = processed |
by_mech |
export by mechanism, default = FALSE |
quarters_complete |
FOR DATIM ONLY: # of quarters completed through FY to determine weeks left in year |
## Not run: #write output hfr_export(df_tza, "~/WeeklyData") ## End(Not run)
Export csv files by mechanism
hfr_export_mech(df, mech, type, folderpath_output, quarters_complete)
df |
structured High Frequency Data Frame |
mech |
mech_code |
type |
type of data being saved, default = processed |
folderpath_output |
provide the full path to the folder for saving |
quarters_complete |
FOR DATIM ONLY: # of quarters completed through FY to determine weeks left in year |
Other internal: check_distinct_ous(), check_template_cols(), count_missing(), flag_extra(), flag_missing()
Useful for pulling information about the template, whether it be the Operating Unit (OU), period, template version, or type, eg wide or long.
hfr_extract_meta(filepath, meta_type = "type")
filepath |
filepath to submitted template |
meta_type |
type of meta data requesting: ou, period, version, type (default) |
## Not run: #identify whether template is long or wide filepath <- "~/WeeklyData/Raw/KEN_Weekly.xlsx" hfr_extract_meta(filepath, meta_type = "type") #identify period hfr_extract_meta(filepath, meta_type = "period") #identify OU hfr_extract_meta(filepath, meta_type = "ou") ## End(Not run)
Limits data frame to only the current reporting period when the fiscal year and period are provided.
hfr_filter_pd(df, hfr_pd_sel = NULL, hfr_fy_sel = NULL)
df |
HFR data frame imported via |
hfr_pd_sel |
HFR reporting period, 1-13, no filter when NULL, default = NULL |
hfr_fy_sel |
fiscal year, default = NULL |
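A minimal sketch, assuming period 5 of FY2021 is the desired reporting window; the period and year values are illustrative only.
## Not run:
df_pd5 <- hfr_filter_pd(df_hfr, hfr_pd_sel = 5, hfr_fy_sel = 2021)
## End(Not run)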
The submission templates do not have date validations stored in the Excel files, so there can be a number of date types submitted that we attempt to account for outside of the normal, ISO format of YYYY-MM-DD. This function handles each identified date type as a separate dataframe and then binds them back together. Date formats include Excel, ISO, and character dates. If the round_hfrdate param is TRUE, the function rounds to the start of the week or month and aggregates any mid-week/month submission.
hfr_fix_date(df, round_hfrdate = FALSE)
df |
HFR data frame imported via |
round_hfrdate |
rounds date to the nearest HFR week start (for non-compliance), default = FALSE |
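A minimal sketch; rounding is turned on here to collapse mid-week submissions onto the HFR week start, and df_hfr is assumed to come from hfr_import().
## Not run:
df_hfr <- hfr_fix_date(df_hfr, round_hfrdate = TRUE)
## End(Not run)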
A number of issues crop up frequently enough around indicators, disaggregates, values, etc. that it is worth coding a resolution in a single place. This function aims to be the place to rectify any non-standard issue that arises more than a couple of times.
hfr_fix_noncompliance(df)
df |
HFR data frame imported via |
Generate Gap Targets
hfr_gap_target(df, quarters_complete)
df |
data frame created by |
quarters_complete |
MER quarters with data available |
## Not run: myuser <- "UserX" mech_x_targets <- extract_datim(00001, myuser, mypwd(myuser)) mech_x_targets <- hfr_gap_target(mech_x_targets, 2) ## End(Not run)
Tidy Wide or Wide-LIMITED submissions by pivoting them long and separating column name into relevant parts.
hfr_gather(df)
df |
HFR data frame imported via |
Changes frequency from weekly to monthly aggregation and rounds up to the month
hfr_group_wkly(df)
df |
HFR data frame imported via |
DDC Status Report Google Sheet ID
hfr_gs_statrep
An object of class character of length 1.
google sheet id
Flags the frequency of reporting, adding this to the data frame for tracking and for use if rounding and aggregating.
hfr_identify_freq(df)
df |
HFR data frame from |
Create a data frame of HFR weeks and periods
hfr_identify_pds(fy = NULL, week = FALSE)
fy |
fiscal year |
week |
are periods weekly? default = FALSE |
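A minimal sketch building the FY2021 period calendar at weekly resolution; the fiscal year shown is an arbitrary example.
## Not run:
df_pds <- hfr_identify_pds(fy = 2021, week = TRUE)
## End(Not run)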
Read in a standard HFR submission, binding all tabs into one. Note that all tabs must be in the same template format - all long, wide, or wide-limited.
hfr_import(filepath)
filepath |
filepath to submitted template |
## Not run: #identify whether template is long or wide filepath <- "~/HFR/HFR_FY23_Oct_Moldova_20221015.xlsx" df_hfr <- hfr_import(filepath) ## End(Not run)
Clean up/standardize string text for indicators and disaggregates
hfr_munge_string(df)
df |
HFR data frame imported via |
A look up of a full or partial orgunit name against the DATIM list of orgunits, used when trying to find the correct or missing orgunituid
hfr_orgunit_search(df, orgunit_name, ou = NULL)
df |
org hierarchy, created in pull_hierarchy() |
orgunit_name |
full or partial orgunit name for matching |
ou |
operating unit; if added searches only that OU default = NULL |
## Not run: load_secrets("datim") org <- pull_hierarchy(datim_user(), datim_pwd()) # orgunit - "Kewot" hfr_orgunit_search(org, "Kew", "Ethiopia") ## End(Not run)
Runs validation checks on a country HFR submission while processing and tidying the submission for inclusion in the database.
hfr_process_template( filepath, round_hfrdate = FALSE, hfr_pd_sel = NULL, folderpath_output = NULL, datim_path = NULL )
filepath |
filepath to submitted template |
round_hfrdate |
rounds date to the nearest HFRweek start (for non-compliance), default = FALSE |
hfr_pd_sel |
filter for HFR reporting period, 1-13, no filter when NULL, default = NULL |
folderpath_output |
if a csv output is desired, provide the full path to the folder |
datim_path |
path to DATIM lookup files for full validation |
## Not run: #file path for the path <- "~/WeeklyData/Saturn" output_folder <- "~/WeeklyData/Output" #process Excel file for Saturn hfr_process_template(path, output_folder) #process Excel file for Saturn with full validation datim_folder <- "~/Datim" hfr_process_template(path, output_folder, datim_path = datim_folder) ## End(Not run)
Read in HFR output file
hfr_read(filepath)
filepath |
filepath of an HFR output file |
## Not run: path <- "~/data/HFR_2020.01_Global_output_20191204.1705.csv" df <- hfr_read(path) ## End(Not run)
Batch read hfr files
hfr_read_all(pfolder, pattern, source = FALSE)
pfolder |
processed file folder |
pattern |
filename pattern |
source |
append source filename? |
df
## Not run: hfr_read_all('./Data/2020.05', pattern='HFR_FY2020.05') ## End(Not run)
Without access to change data directly in the database, the only means of changing data is to resubmit a zeroed out version of the original submission. This function replaces any cell with entered data with zero so that it can be resubmitted and processed by Trifacta. The original, correct version should be re-processed after the zeroed dataset has been processed.
hfr_rectify_date(subm_file, subm_tab, folderpath_templates = "templates/")
subm_file |
submission file with incorrect dates |
subm_tab |
tab with incorrect dates |
folderpath_templates |
folder path where current HFR templates are located |
exports two files - one with zeroed out data for wrong date and one with corrected date
## Not run: #store file paths for looping over to read in df_files_tabs <- list.files(folderpath, full.names = TRUE) %>% purrr::map_dfr(~ tibble::tibble(file = .x, tabs = readxl::excel_sheets(.x))) %>% dplyr::filter(stringr::str_detect(tabs, "HFR")) #fix date and create zeroed version for wrong date (to clean from DB) pwalk(df_files_tabs, ~hfr_rectify_date(..1, ..2)) ## End(Not run)
Limits the data frame to only the exact columns found in the template, removing any extras
hfr_restrict_cols(df)
df |
HFR data frame imported via |
Round values to nearest HFR date
hfr_round_date(df)
df |
HFR data frame imported via |
Clean up DATIM Hierarchy Path
hierarchy_clean(df)
df |
data frame created by |
Pull Hierarchy Data from DATIM
hierarchy_extract( ou_uid, username, password, baseurl = "https://final.datim.org/" )
ou_uid |
UID for the country, recommend using |
username |
DATIM username |
password |
DATIM password, recommend using |
baseurl |
API base url, default = https://final.datim.org/ |
## Not run: #get OU UID ouuid <- identify_ouuids() %>% dplyr::filter(ou == "Kenya") #pull hierarchy (paths are all UIDs) df <- hierarchy_extract(ouuid, username = myuser, password = mypwd(myuser)) ## End(Not run)
Extract country name from OU or country name
hierarchy_identify_ctry(df)
df |
data frame created by |
Rename Hierarchy from Levels to OU/SNU1/PSNU/Facility
hierarchy_rename( df, country, username, password, baseurl = "https://final.datim.org/" )
df |
data frame created by |
country |
country name, eg "Malawi" or "Nepal" |
username |
DATIM username |
password |
DATIM password, recommend using |
baseurl |
API base url, default = https://final.datim.org/ |
Identify Facility/Community levels in org hierarchy
identify_levels( ou = NULL, username, password, baseurl = "https://final.datim.org/" )
ou |
operating unit name |
username |
DATIM username |
password |
DATIM password, recommend using |
baseurl |
base API url, default = https://final.datim.org/ |
## Not run: #table for all OUs myuser <- "UserX" identify_levels(username = myuser, password = mypwd()) #table for just Kenya identify_levels("Kenya", username = myuser, password = mypwd()) ## End(Not run)
Identify New Submissions on Google Drive
identify_newfiles(print_files = TRUE, id_modified = TRUE)
print_files |
print out list of new files |
id_modified |
remove "Modified" from filename |
new files, along with submission (df_submissions) and s3 files from the archive folder (df_archive)
Pull OU UIDS
identify_ouuids(username, password, baseurl = "https://final.datim.org/")
username |
DATIM Username |
password |
DATIM password, recommend using |
baseurl |
base url for the API, default = https://final.datim.org/ |
## Not run: ous <- identify_ouuids("userx", mypwd("userx")) ## End(Not run)
Validate reporting dates
is_date_valid(.data, df_dates)
.data |
df |
df_dates |
df_dates |
boolean
## Not run: data %>% is_date_valid(df_dates) ## End(Not run)
Determine if there are tabs to import
is_hfrtab(filepath)
filepath |
filepath to submitted template |
Other validation: check_content(), check_dates(), check_disaggs(), check_orgunituids(), check_output_cols(), validate_import(), validate_initial(), validate_output()
Validate mechanism code
is_mech_valid(.data, df_mechs)
.data |
df |
df_mechs |
mechs |
boolean
## Not run: data %>% is_mech_valid(df_mechs) ## End(Not run)
Validate mechanism code
is_mech4ou(.data, df_mechs)
.data |
df |
df_mechs |
mechs |
boolean
## Not run: data %>% is_mech4ou(df_mechs) ## End(Not run)
Determine whether meta tab exists
is_metatab(filepath)
filepath |
filepath to submitted template |
Other utility: package_check(), var_exists()
Check if orgunituid is valid
is_orgunituid_valid(.data, df_orgs)
df_orgs |
orgs |
.data |
df |
boolean
## Not run: data %>% is_orgunituid_valid(df_orgs) ## End(Not run)
Check if orgunituid exists in operating unit
is_orgunituid4ou(.data, df_orgs)
df_orgs |
org_hierarchy df |
.data |
df |
data with new column: T/F
## Not run: data %>% is_orgunituid4ou(df_orgs) ## End(Not run)
Check if OperatingUnit is valid
is_ou_valid(.data, df_orgs)
df_orgs |
df_orgs |
.data |
df |
boolean
## Not run: data %>% is_ou_valid(df_orgs) ## End(Not run)
Table that cross references the ISO Country codes to the PEPFAR Operating Units and Countries. This table is used during the export process for file naming. ISO codes were originally pulled from DATIM.
data(iso_map)
A data frame with each PEPFAR OU and associated ISO code.
PEPFAR Operating Unit or country
3 letter ISO code for the Operatingunit or country
TRUE if operatingunit is a country under a regional program
"https://final.datim.org/api/dataStore/dataSetAssignments/orgUnitLevels"
Load DATIM Look up tables
load_lookups(datim_path = "./Data/", local = TRUE, user = NULL)
datim_path |
HFR Data folder ID |
local |
Read data from local directory? |
user |
User email address |
## Not run: #load look up data load_lookups(datim_path = "./Data/datim") load_lookups(datim_path = "999#1aw####") ## End(Not run)
Identify S3 file names matching Ignored files
match_ignoredfiles(file_name, df_ref, ref_name = "name")
file_name |
Filename to be used as look up |
df_ref |
Reference data frame, Recommend using files present in S3 but not in gdrive, Eg: |
ref_name |
Column name to be used for lookup |
## Not run: library(tidyverse) library(Wavelength) library(googlesheets4) ss_sbm <- as_sheets_id("<xyz>") df_ignore_files <- read_sheet(ss = ss_sbm, "ignore_files") df_new <- identify_newfiles() # this will also generate df_submissions & df_archive # Ghost files df_ghosts <- df_submissions %>% full_join(df_archive, by = c("name" = "sys_data_object")) %>% filter(is.na(exists_gdrive)) # Match ignored files df_ignore_files <- df_ignore_files %>% mutate(names_s3 = match_ignoredfiles(name_googledrive, df_ghosts, "name")) # Update ignore files table range_write(ss = sbm_form, data = df_ignore_files, sheet = "ignore_files", col_names = TRUE) ## End(Not run)
Table of OPM Holidays for 2021-26 for scheduling submission dates.
data(opm_holiday)
A data frame with each federally observed holiday from 2021-2026.
date holiday is observed
holiday
"https://www.opm.gov/policy-data-oversight/pay-leave/federal-holidays"
Check if package exists
package_check(pkg)
pkg |
package name |
warning message if package is not installed
Other utility: is_metatab(), var_exists()
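A minimal sketch, using an arbitrary package name to illustrate the check; the package chosen is not implied by the original documentation.
## Not run:
package_check("openxlsx")
## End(Not run)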
Paint console text in blue
paint_blue(txt)
txt |
text to be printed |
Other text_color: paint_green(), paint_red(), paint_yellow()
Paint console text in green
paint_green(txt)
txt |
text to be printed |
Other text_color: paint_blue(), paint_red(), paint_yellow()
Paint console text in red
paint_red(txt)
txt |
text to be printed |
Other text_color: paint_blue(), paint_green(), paint_yellow()
Paint console text in yellow
paint_yellow(txt)
txt |
text to be printed |
Other text_color: paint_blue(), paint_green(), paint_red()
Parse out submitted file components
parse_submission(pfile)
pfile |
processed file |
components as a vector c("hfr_pd", "iso3", "mech_code", "pdate")
## Not run: parse_submission("HFR_2020.99_XWH_100000_processed_20200101.csv") ## End(Not run)
Compile PEPFAR Hierarchy
pull_hierarchy( ou_uid, username, password, baseurl = "https://final.datim.org/", folderpath_output = NULL )
ou_uid |
UID for the country, recommend using |
username |
DATIM username |
password |
DATIM password, recommend using |
baseurl |
API base url, default = https://final.datim.org/ |
folderpath_output |
provide the full path to the folder for saving |
## Not run: #get OU UID ouuid <- identify_ouuids() %>% dplyr::filter(ou == "Kenya") #pull hierarchy (paths are all UIDs) df <- pull_hierarchy(ouuid, username = myuser, password = mypwd(myuser)) ## End(Not run)
Pull Partner/Mechanism Info from DATIM
pull_mech(usaid_only = TRUE, ou_sel = NULL, folderpath_output = NULL)
usaid_only |
specify if only USAID mechanisms should be returned, default = TRUE |
ou_sel |
option to specify an operating unit, default = NULL |
folderpath_output |
provide the full path to the folder for saving |
## Not run: #pull mechanism/partner information df <- pull_mech() ## End(Not run)
Extract DATIM Results and Targets (DATIM API Call)
pull_mer( ou_name = NULL, username, password, baseurl = "https://final.datim.org/", fy_pd = NULL, quarters_complete = NULL, folderpath_output = NULL )
ou_name |
Operating Unit name, if mechanism is not specified |
username |
DATIM username |
password |
DATIM password, recommend using |
baseurl |
API base url, default = https://final.datim.org/ |
fy_pd |
fiscal year(s) to cover, default will be current FY if not provided |
quarters_complete |
no. of quarters completed through FY to determine weeks left in year |
folderpath_output |
folder path to store DATIM output, default = NULL |
## Not run: #ou mer data myuser <- "UserX" mech_x_dta <- pull_mer(ou_name = "Namibia", username = myuser, password = mypwd(myuser)) ## End(Not run)
Report files validation
report_submissions_errors(df_files, mechanisms, export = FALSE)
df_files |
df of filename validation |
mechanisms |
mechs |
void
## Not run: report_file_errors(files, ims, processed) ## End(Not run)
Revert validated file to processed files
revert_validations(dir_files)
dir_files |
location of processed files |
## Not run: revert_validations(dir_files) ## End(Not run)
Download from S3 and push to Google Drive
stash_outgoing( prefix = c("HFR_Tableau", "HFR_Submission", "Mechanism", "Detailed"), outputfolder, gdrive = FALSE )
prefix |
file prefix - "HFR_Tableau","HFR_Submission", "Mechanism", "Detailed" |
outputfolder |
folder path to store file |
gdrive |
whether to upload to Google Drive, default = FALSE |
## Not run: stash_outgoing("HFR_Tableau", "out/joint") ## End(Not run)
List of column headers for the HFR Long Template
data(template_cols_long)
A list of all headers
column names
List of column headers for the HFR Indicator Meta Data
data(template_cols_meta)
A list of all headers
column names
List of column headers for the HFR Wide Template
data(template_cols_wide)
A list of all headers
column names
List of column headers for the HFR Wide-Limited Template
data(template_cols_wide_lim)
A list of all headers
column names
Import and munge submitted site list
tidy_sitelist(filepath, folderpath_output = NULL)
filepath |
path to sitelist file |
folderpath_output |
if output is desired, full folder path |
## Not run: path <- "~/Data/HFR_FY21_SiteValidation_Kenya.xlsx" df_sites <- tidy_sitelist(path) ## End(Not run)
Update USAID Mechanism meta table
update_meta_mechs(savefolder = "out/DATIM", upload = FALSE)
savefolder |
folderpath to save, default = "out/DATIM" |
upload |
should the new table be pushed to Google Drive and s3? default = FALSE |
Update MER meta table
update_meta_mer(fy_pd = NULL, savefolder = "out/DATIM", upload = FALSE)
fy_pd |
fiscal year(s) to cover, default will be current FY if not provided |
savefolder |
folderpath to save, default = "out/DATIM" |
upload |
should the new table be pushed to s3? default = FALSE |
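A minimal sketch, assuming the default save folder and no upload to s3; the fiscal year shown is illustrative only.
## Not run:
update_meta_mer(fy_pd = 2021, savefolder = "out/DATIM", upload = FALSE)
## End(Not run)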
Update Organization Hierarchy meta table
update_meta_orgs(savefolder = "out/DATIM", upload = FALSE)
savefolder |
folderpath to save, default = "out/DATIM" |
upload |
should the new table be pushed to Google Drive and s3? default = FALSE |
Update MER targets from MSD (prior to site results being available)
update_meta_targets(fy, savefolder = "out/DATIM", upload = FALSE)
fy |
fiscal year |
savefolder |
folderpath to save, default = "out/DATIM" |
upload |
should the new table be pushed to s3? default = FALSE |
Update invalid operating units
update_operatingunits(hfr_data, levels, orgs, ims = NULL)
hfr_data |
processed hfr data |
levels |
datim org levels |
orgs |
datim org hierarchy |
ims |
datim mechanisms |
hfr_data df
## Not run: update_operatingunits(hfr_df, levels=org_levels, orgs=org_hierarchy, ims=mechanisms) ## End(Not run)
Push meta tables to Google Drive and s3
upload_meta_table(type = c("mech", "org"), folder = "out/DATIM")
type |
table type, "mech" or "org" |
folder |
where is the file stored? default = "out/DATIM" |
## Not run: #pull updated mechanism table pull_mech(folderpath_output = savefolder) #upload to Google Drive and s3 upload_meta_table("mech") ## End(Not run)
Validate HFR PD Date
validate_date(df_pds, pdate, pd)
df_pds |
hfr period dates |
pdate |
period date |
pd |
reporting period |
valid: TRUE / FALSE
## Not run: validate_date(df_pds = valid_dates, '2020-01-27', 5) ## End(Not run)
Validate processed hfr data
validate_hfr_data(hfr_data, orgs, ims, dates, keep_values = FALSE)
hfr_data |
processed hfr data |
orgs |
datim org hierarchy |
ims |
datim mechanisms |
dates |
hfr valid dates |
keep_values |
Keep values along with the error flags |
errors data frame
## Not run: validate_hfr_data(df_hfr_data, orgs=org_hierarchy, ims=df_mechanisms, dates=df_hfr_dates) validate_hfr_data(df_hfr_data, orgs=org_hierarchy, ims=df_mechanisms, dates=df_hfr_dates, keep_values = TRUE) ## End(Not run)
Runs validation on data after reading in the dataset - making sure it has the required columns based on the template type and checking if there are one or more operating units.
validate_import(df)
df |
df created during |
Other validation: check_content(), check_dates(), check_disaggs(), check_orgunituids(), check_output_cols(), is_hfrtab(), validate_initial(), validate_output()
Performs basic checks before importing data - checking whether there are any HFR labeled tabs to import, then outputting info from the meta tab (country, filename, and template version) and about the tabs (what will be imported or excluded)
validate_initial(filepath)
filepath |
filepath to submitted template |
Other validation: check_content(), check_dates(), check_disaggs(), check_orgunituids(), check_output_cols(), is_hfrtab(), validate_import(), validate_output()
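A minimal sketch, reusing the illustrative submission path from the hfr_import example earlier in this manual.
## Not run:
filepath <- "~/HFR/HFR_FY23_Oct_Moldova_20221015.xlsx"
validate_initial(filepath)
## End(Not run)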
Validate mechanism code
validate_mechanism(mechanisms, ou, mcode)
mechanisms |
df of mechs |
ou |
operating unit |
mcode |
mech code |
vector c(valid_im, mech_name)
## Not run: validate_mechanism(ims, 'Angola', 16172) ## End(Not run)
Validate org unit uid
validate_orgunit(df_orgs, ou, uid)
df_orgs |
org hierarchy |
ou |
operating unit |
uid |
orgunituid |
valid as a vector c(valid_uid, valid_uid_ou)
## Not run: validate_orgunit(df_orgs, 'Eswatini', 'g48XD8px8NN') ## End(Not run)
Runs a number of validations after the tidying has occurred. Additional, optional validations against DATIM data can be run if data are available from pull_hierarchy(), pull_mech(), and pull_mer().
validate_output(df, output_path, datim_path = NULL)
df |
HFR data frame created by |
datim_path |
path to look up files from |
Other validation: check_content(), check_dates(), check_disaggs(), check_orgunituids(), check_output_cols(), is_hfrtab(), validate_import(), validate_initial()
Validate submitted files
validate_submission(pfile, levels, ims)
levels |
org levels |
ims |
mechanisms df |
pfile |
processed file |
vector c("fy", "hfr_pd", "iso3", "operatingunit", "mech_code", "mech_valid", "mech_name", "pdate", "name")
## Not run: validate_submission("HFR_2020.99_XWH_100000_processed_20200101.csv") ## End(Not run)
Validate files
validate_submissions(pfolder, levels, ims, pattern = NULL)
levels |
org levels |
ims |
mechanisms df |
pattern |
filename pattern |
pfolder |
processed files folder |
dataframe
## Not run: validate_submissions(dir_hfr_pd205, pattern = "HFR_2020.05") ## End(Not run)
Check if variable exist
var_exists(df, var)
df |
data frame to check against |
var |
quoted variable of interest |
Other utility: is_metatab(), package_check()
## Not run: var_exists(df, "val") ## End(Not run)