Title: | Utility functions related to working with the MER Structured Dataset |
---|---|
Description: | This packages contains a number of functions for working with the PEPFAR MSD. |
Authors: | Aaron Chafetz [aut, cre], Baboyma Kagniniwa [aut] |
Maintainer: | Aaron Chafetz <[email protected]> |
License: | MIT + file LICENSE |
Version: | 4.1.6 |
Built: | 2024-11-14 23:30:43 UTC |
Source: | https://github.com/USAID-OHA-SI/gophr |
A tibble of the OHA Colors associated with different target achievement thresholds - 'At Risk', 'Concerned', 'On Target', and 'Above Target'. This table includes the current mapping as well as the original/classic one. These colors are used in association with 'adorn_achievement'.
data(achv_color_map)
data(achv_color_map)
A data frame with three variables and 4 rows
description of each achievement level
hex code of the associated color to the achievement level
classic/original hex code of the associated color to the achievement level
'adorn_achievement' calculate target achievement (cumulative and/or quarterly) for a standard MSD or one reshaped using 'reshape_msd'()' as well as to apply achievement group labels and colors. It will run 'calc_achievement' if an achievement column does not exist in the dataset.
adorn_achievement(df, qtr = NULL, classic = FALSE)
adorn_achievement(df, qtr = NULL, classic = FALSE)
df |
data frame as standrd MSD or one from reshape_msd() |
qtr |
if using standard MSD, need to provide the most recent quarter, ideally using identifypd(df_msd, pd_type = "quarter") |
classic |
use the original OHA achievement color palette (pre July 2024), default = FALSE |
data frame with achievement values, labels, and colors
Other achievement:
calc_achievement()
## Not run: df_msd <- read_msd(path) df_msd_agg <- df_msd %>% filter(operatingunit == "Jupiter" indicator %in% c("TX_NEW", "TX_CURR"), funding_agency != "Dedup", standardizeddisaggregate == "Total Numerator") %>% group_by(operatingunit, funding_agency, fiscal_year, indicator) %>% summarise(across(where(is.double), sum, na.rm = TRUE)) %>% ungroup() adorn_achievement(df_msd_agg) df_msd_agg %>% reshape_msd("quarters") %>% adorn_achievement() df_msd_agg %>% reshape_msd("quarters", qtrs_keep_cumulative = TRUE) %>% adorn_achievement() ## End(Not run)
## Not run: df_msd <- read_msd(path) df_msd_agg <- df_msd %>% filter(operatingunit == "Jupiter" indicator %in% c("TX_NEW", "TX_CURR"), funding_agency != "Dedup", standardizeddisaggregate == "Total Numerator") %>% group_by(operatingunit, funding_agency, fiscal_year, indicator) %>% summarise(across(where(is.double), sum, na.rm = TRUE)) %>% ungroup() adorn_achievement(df_msd_agg) df_msd_agg %>% reshape_msd("quarters") %>% adorn_achievement() df_msd_agg %>% reshape_msd("quarters", qtrs_keep_cumulative = TRUE) %>% adorn_achievement() ## End(Not run)
When working with financial or HRH data, its often useful to determine whether funding or staffing are counted as "service delivery" or "non-service delivery"
apply_funding_type(df)
apply_funding_type(df)
df |
Financial Structured Dataset data frame |
a new column, funding_type
'apply_partner_type' pulls from USAID managed Google Sheet maintained by the Local Partner team that designates all USAID mechanisms by their type, local or international. The designations include the updates for regional partners, which in FY21 are classified as local. These adjustments can be found on the adjusted column.
apply_partner_type(df)
apply_partner_type(df)
df |
data frame, MER or financial dataset |
a df with the partner types provided by USAID.
Launches the USAID managed, Known Data Issues Tracker.
The Known Data Issues Tracker is a table that summarizes different issues that are known in DATIM but cannot be resolved. For example, when a treatment partner start/ends mid-year, there will be duplicated targets that cannot be adjusted in DATIM.
The list of known issues is maintained by USAID/SIEI division. To access the table requires having a USAID email account.
browse_knownissues()
browse_knownissues()
Launches The Known Data Issues Tracker Google Sheet.
[resolve_knownissues()] to exclude Known Issues from data set; [set_email()] to store USAID email; [load_secrets()] to load credentials into session.
## Not run: load_secrets() browse_knownissues() ## End(Not run)
## Not run: load_secrets() browse_knownissues() ## End(Not run)
'calc_achievement' creates a target achievement column standard to many PEPFAR analyses. It can calculate achievement from a normal MSD or one reshaped using 'reshape_msd'.
calc_achievement(df)
calc_achievement(df)
df |
MSD based dataframe |
one or two additional columns that calculate achievement and/or quarterly achievement
[reshape_msd()]
Other achievement:
adorn_achievement()
## Not run: df_msd <- read_msd(path) df_msd_agg <- df_msd %>% filter(operatingunit == "Jupiter" indicator %in% c("TX_NEW", "TX_CURR"), funding_agency != "Dedup", standardizeddisaggregate == "Total Numerator") %>% group_by(operatingunit, funding_agency, fiscal_year, indicator) %>% summarise(across(where(is.double), sum, na.rm = TRUE)) %>% ungroup() calc_achievement(df_msd_agg) df_msd_agg %>% reshape_msd("quarters") %>% calc_achievement() ## End(Not run)
## Not run: df_msd <- read_msd(path) df_msd_agg <- df_msd %>% filter(operatingunit == "Jupiter" indicator %in% c("TX_NEW", "TX_CURR"), funding_agency != "Dedup", standardizeddisaggregate == "Total Numerator") %>% group_by(operatingunit, funding_agency, fiscal_year, indicator) %>% summarise(across(where(is.double), sum, na.rm = TRUE)) %>% ungroup() calc_achievement(df_msd_agg) df_msd_agg %>% reshape_msd("quarters") %>% calc_achievement() ## End(Not run)
List of indicators in the MER that comprise the clinical cascade. These indicators are needed for calculating the 90s/95s plus linkage and viral load coverage.
data(cascade_ind)
data(cascade_ind)
A list of clinical cascade indicators
indicator names
This function converts all funding agency names to upper case removes the HHS prefix for those agencies and moves State and USAID subsidiaries under their parent agencies
clean_agency(.data)
clean_agency(.data)
.data |
MSD Datasets |
Cleaned MSD DataFrame
Other column munging:
clean_column()
,
clean_indicator()
,
clean_psnu()
## Not run: df_msd %>% clean_agency() ## End(Not run)
## Not run: df_msd %>% clean_agency() ## End(Not run)
Clean column data
clean_column(.data, colname = "psnu")
clean_column(.data, colname = "psnu")
.data |
MSD Datasets |
colname |
Name of the column(s) |
Cleaned MSD DataFrame
Other column munging:
clean_agency()
,
clean_indicator()
,
clean_psnu()
## Not run: df_msd %>% clean_column(colname = "psnu") ## End(Not run)
## Not run: df_msd %>% clean_column(colname = "psnu") ## End(Not run)
‘clean_indicator' applies a ’_D' suffix to any indicators that are a denominator. This is particularly useful when aggregating data or reshaping.
clean_indicator(df)
clean_indicator(df)
df |
MSD data frame |
indicators with denominator have _D suffix
Other column munging:
clean_agency()
,
clean_column()
,
clean_psnu()
## Not run: df <- df %>% filter(indicator == "TX_PVLS", standardizeddisaggregate %in% c("Total Numerator", "Total Denominator")) %>% clean_indicator() ## End(Not run)
## Not run: df <- df %>% filter(indicator == "TX_PVLS", standardizeddisaggregate %in% c("Total Numerator", "Total Denominator")) %>% clean_indicator() ## End(Not run)
Clean PSNU column data
clean_psnu(.data)
clean_psnu(.data)
.data |
MSD Datasets |
Cleaned MSD DataFrame
Other column munging:
clean_agency()
,
clean_column()
,
clean_indicator()
## Not run: df_msd %>% clean_psnu() ## End(Not run)
## Not run: df_msd %>% clean_psnu() ## End(Not run)
This function is used to extract meta data as a list from the source file of a MER Structured Dataset, MER NAT_SUBNAT Structured Dataset, Financial Structure Dataset, HRH Structured Dataset, or DATIM Genie export. It creates a list object, metadata, in the global environment containing the source, current fiscal year, current period, current quarter, as well as a caption.
get_metadata(path, type, caption_note)
get_metadata(path, type, caption_note)
path |
path to the folder containing MSDs or specific MSD file (default relies on glamr::si_path() if available) |
type |
PSD type: "OU_IM", PSNU_IM", "NAT_SUBNAT", "Financial"; default = "OU_IM_FY2*" |
caption_note |
additional information to include in a viz caption footer |
list of meta data information about the source dataset
Other metadata:
extract_metadata()
,
source_info()
## Not run: meta <- get_metadata() #works if you have stored path to the MSD folder via glamr::set_paths() meta$curr_fy ## End(Not run) ## Not run: library(tidyverse) library(glamr) library(gophr) library(glue) ref_id <- "1bdf4c4e" meta <- get_metadata(caption_note = "Created by: The Dream Team") cntry <- "Saturn" df <- si_path() %>% return_latest("OU_IM") %>% read_msd() df_viz <- df %>% filter(operatingunit == cntry, fiscal_year == meta$curr_fy, indicator == "TX_NEW", standardizeddisaggregate == "Total Numerator") df_viz <- df_viz %>% group_by(fiscal_year, indicator, mech_code) %>% summarise(across(c(targets, starts_with("qtr")), sum, na.rm = TRUE), .groups = "drop") df_viz <- reshape_msd(df_viz, "quarters") df_viz %>% ggplot(aes(period, results_cumulative)) + geom_col() + geom_text(data = . %>% filter(., period == meta$curr_pd), aes(label = results_cumulative), vjust = -.5) + facet_wrap(~fct_reorder2(mech_code, period, targets)) + labs(title = glue("Upward trend in TX_NEW results thru {meta$curr_qtr} quarters") %>% toupper, subtitle = glue("{cntry} | {meta$curr_fy_lab} cumulative mechanism results"), x = NULL, y = NULL, caption = glue("{meta$caption}")) ## End(Not run)
## Not run: meta <- get_metadata() #works if you have stored path to the MSD folder via glamr::set_paths() meta$curr_fy ## End(Not run) ## Not run: library(tidyverse) library(glamr) library(gophr) library(glue) ref_id <- "1bdf4c4e" meta <- get_metadata(caption_note = "Created by: The Dream Team") cntry <- "Saturn" df <- si_path() %>% return_latest("OU_IM") %>% read_msd() df_viz <- df %>% filter(operatingunit == cntry, fiscal_year == meta$curr_fy, indicator == "TX_NEW", standardizeddisaggregate == "Total Numerator") df_viz <- df_viz %>% group_by(fiscal_year, indicator, mech_code) %>% summarise(across(c(targets, starts_with("qtr")), sum, na.rm = TRUE), .groups = "drop") df_viz <- reshape_msd(df_viz, "quarters") df_viz %>% ggplot(aes(period, results_cumulative)) + geom_col() + geom_text(data = . %>% filter(., period == meta$curr_pd), aes(label = results_cumulative), vjust = -.5) + facet_wrap(~fct_reorder2(mech_code, period, targets)) + labs(title = glue("Upward trend in TX_NEW results thru {meta$curr_qtr} quarters") %>% toupper, subtitle = glue("{cntry} | {meta$curr_fy_lab} cumulative mechanism results"), x = NULL, y = NULL, caption = glue("{meta$caption}")) ## End(Not run)
This function is useful as a utility function in this and other packages to determine the type of file that is read in and determine certain types of munging handling.
identify_psd(df)
identify_psd(df)
df |
PSD dataframe |
character, PSD data stream and the level
## Not run: #read in file for use path <- "~/Data/ICPI_MER_Structured_Dataset_PSNU_20180323_v2_1.txt" df <- read_psd(path) #identify data stream identify_psd(df_psnu) ## End(Not run)
## Not run: #read in file for use path <- "~/Data/ICPI_MER_Structured_Dataset_PSNU_20180323_v2_1.txt" df <- read_psd(path) #identify data stream identify_psd(df_psnu) ## End(Not run)
'identifypd' uses the information from the raw MSD structure to identify the current reporting period (fiscal year and/or quarter). This function can be used to make other inputs in your code more dynamic. Originally, 'identifypd' was developed for use in achafetz/PartnerProgress
identifypd(df, pd_type = "full", pd_prior = FALSE)
identifypd(df, pd_type = "full", pd_prior = FALSE)
df |
dataset to use to find latest period |
pd_type |
what is returned? (a) full, eg fy2018q1; (b)year, eg 2018; (c) quarter, eg 1 |
pd_prior |
do you want the last period returned (instead of the current); default = FALSE |
## Not run: #return full, current period, eg "fy2018q3" identifypd(df_mer) #return the current quarter, eg 3 identifypd(df_mer, "quarter") #return the prior quarter, eg "fy2018q2" identifypd(df_mer, pd_prior = TRUE) ## End(Not run)
## Not run: #return full, current period, eg "fy2018q3" identifypd(df_mer) #return the current quarter, eg 3 identifypd(df_mer, "quarter") #return the prior quarter, eg "fy2018q2" identifypd(df_mer, pd_prior = TRUE) ## End(Not run)
Often times we want just want to work with totals when performing an analysis or visualization. This function is a simple filter for Total Numerator and Denominator in a MSD dataframe
pluck_totals(df)
pluck_totals(df)
df |
MER Structured Dataset (MSD) dataframe |
MSD with only total numerator and denominator
df_msd <- read_msd(path) df_totals <- df_msd %>% pluck_totals() %>% clean_indicator() %>% filter(indicator %in% cascade_ind)
df_msd <- read_msd(path) df_totals <- df_msd %>% pluck_totals() %>% clean_indicator() %>% filter(indicator %in% cascade_ind)
Deprecated. Use 'read_psd' instead.
read_msd( file, save_rds = FALSE, remove_txt = FALSE, convert_to_old_names = FALSE )
read_msd( file, save_rds = FALSE, remove_txt = FALSE, convert_to_old_names = FALSE )
file |
enter the full path to the PEPFAR structured dataset file |
save_rds |
save the Structured Dataset as an rds file, default = FALSE |
remove_txt |
should the txt file be removed, default = FALSE |
convert_to_old_names |
replace FY22Q2 naming convention with old? default = FALSE |
'read_psd' imports a stored PEPFAR Structured Datasets (.zip, .txt, or .parquet). The function will read in a MSD, Genie, Financial, HRH, or DHI PEPFAR dataset, ensuring the column types are correct. The user has the ability to store the txt file as a rds or parquet file, significantly saving storage space on the computer (and can then remove the txt file after importing). Most of USAID/OHA processes and analyses rely on the use of the MSD file being read in via 'read_psd'. This function can be used in the PDAP space in addition to working locally.
read_psd( file, export_format = "none", remove_base_file = FALSE, retain_genie_cols = FALSE )
read_psd( file, export_format = "none", remove_base_file = FALSE, retain_genie_cols = FALSE )
file |
enter the full path to the PEPFAR structured dataset file |
export_format |
if desired, save the PSD in another compressed format, either "rds" or "parquet", default = "none" |
remove_base_file |
should original base file be removed if exporting in another compressed format? default = FALSE |
retain_genie_cols |
should Genie specific columns ('dataelementuid', 'categoryoptioncombouid', 'approvallevel', 'approvalleveldescription') be retained in the output dataset? default = FALSE |
## Not run: #convert Q1 clean PSNU file from txt to Rds #read in file for use path <- "~/Data/MER_Structured_Datasets_OU_IM_FY22-24_20240315_v2_1.zip" df_psnu <- read_psd(path) ## End(Not run)
## Not run: #convert Q1 clean PSNU file from txt to Rds #read in file for use path <- "~/Data/MER_Structured_Datasets_OU_IM_FY22-24_20240315_v2_1.zip" df_psnu <- read_psd(path) ## End(Not run)
Central Support is often desired to be removed from MER analysis. As of FY21Q4, "CS" is a field under 'indicatortype', which can be used to identify and exclude this sort of reporting. This function is run by default in 'resolve_knownissues', but can be run separately if desired.
remove_centralsupport(df)
remove_centralsupport(df)
df |
MSD dataframe (must include indicatortype) |
df with central support reporting removed
When working with financial data, its often useful to remove M&O funding from the data.
remove_mo(df)
remove_mo(df)
df |
financial or comprehensive budget dataset |
a dataset excluding M&O
When working with financial data, its often useful to remove Supply Chain mechanism and funding from the data.
remove_sch(df, poc = c("SCH", "SGAC"), flag_only = FALSE)
remove_sch(df, poc = c("SCH", "SGAC"), flag_only = FALSE)
df |
this can be either a financial structured dataset or an MSD |
poc |
you can choose to filter either the SCH list or the SGAC list; default is both |
flag_only |
allows you keep full dataset, but identify the mechanisms that are supply chain, mech_sch as a logical; default = FALSE |
The list of SCH mechanisms is maintained by the EA team on a Google Sheet. A USAID email is required to access the dataset.
a df without supply chain mechanisms
[set_email()] to store USAID email; [load_secrets()] to load credentials into session
## Not run: #authenticate load_secrets() #remove SCh using SGAC list df <- remove_shc(df, poc = "SGAC") ## End(Not run)
## Not run: #authenticate load_secrets() #remove SCh using SGAC list df <- remove_shc(df, poc = "SGAC") ## End(Not run)
Some mechanisms and partners are recorded in FACTSInfo with multiple names over different time period. This function replaces all partner and mechanism names the most recent name for each mechanism ID pulling from a DATIM SQL View. The 'mech_code' variable is required in your dataset, and having 'operatingunit' and 'fiscal_year' or 'period' are highly recommended as they will limit the size of the DATIM queries.With an DHIS2 update to DATIM in 2021, the DATIM mechanism tables requires a password to access. We would recommend using 'glamr::set_datim()' to store your DATIM credentials securely on your local machine. If you don't have them stored, you will be prompted each time to enter your password to acces DATIM.
rename_official(df, datim_user, datim_pwd)
rename_official(df, datim_user, datim_pwd)
df |
identify the PEPFAR Structured Data Set to clean |
datim_user |
DATIM username; if missing will look for stored credentials first and then prompt for them if not found |
datim_pwd |
DATIM password; if missing will look for stored credentials first and then prompt for them if not found |
## Not run: df_psnu_im <- read_psd(file) %>% filter(country == "Saturn") df_psnu_im <- rename_official(df_psnu_im) ## End(Not run)
## Not run: df_psnu_im <- read_psd(file) %>% filter(country == "Saturn") df_psnu_im <- rename_official(df_psnu_im) ## End(Not run)
'reshape_msd' transforms the structure into a tidy format. The default reshape, makes the whole dataset long by period, but other options include wide to match the original MSD, semi-wide for a separate column for targets, cumulative, and results, and then a quarterly one which keeps the targets (and cumulative) in their own columsn but makes the quarters long. Reshaping tidy is key for much of OHA/SI processes and analysis.
reshape_msd( df, direction = c("long", "wide", "semi-wide", "quarters"), include_type = TRUE, qtrs_keep_cumulative = FALSE )
reshape_msd( df, direction = c("long", "wide", "semi-wide", "quarters"), include_type = TRUE, qtrs_keep_cumulative = FALSE )
df |
MSD dataset in the semi-wide format |
direction |
direction of reshape, "long" (default), "wide" (original MSD structure), "semi-wide" (one column for targets, cumulative, results) or "quarters" (quarters pivoted, but not targets - useful for quarterly achievement)). |
include_type |
whether a period_type column (targets, results, cumulative) should be included, default = TRUE |
qtrs_keep_cumulative |
whether to keep the cumulative column when using quarters for direction, default = FALSE |
## Not run: #read in data df_genie <- match_msd("~/Downloads/PEPFAR-Data-Genie-PSNUByIMs-2018-08-15.zip") #reshape long df_genie_long <- reshape_msd(df_genie) #reshape wide (to look like old format) df_genie_long <- reshape_msd(df_genie, direction = "wide") #reshape semi-wide (one column for targets, cumulative, results) df_genie_wide <- reshape_msd(df_genie, direction = "semi-wide") #reshape quarters (quarters pivoted, but not targets - useful for quarterly achievement) df_genie_wide <- reshape_msd(df_genie, direction = "semi-wide") ## End(Not run)
## Not run: #read in data df_genie <- match_msd("~/Downloads/PEPFAR-Data-Genie-PSNUByIMs-2018-08-15.zip") #reshape long df_genie_long <- reshape_msd(df_genie) #reshape wide (to look like old format) df_genie_long <- reshape_msd(df_genie, direction = "wide") #reshape semi-wide (one column for targets, cumulative, results) df_genie_wide <- reshape_msd(df_genie, direction = "semi-wide") #reshape quarters (quarters pivoted, but not targets - useful for quarterly achievement) df_genie_wide <- reshape_msd(df_genie, direction = "semi-wide") ## End(Not run)
This function handles known issues in the MSD. For example, a mechanism starting mid-year and reporting on TX_CURR will duplicate TX_CURR targets. This function resolves that by removing those known cases and stores cases relevant to your data frame.
The best workflow is to filter your dataset down to a country and/or technical area of interest before running 'resolve_knownissues()'. When you run the function, it will print out any known issues to the console (and can) even store them to your Global Environment, so it makes sense to limit the data first to what you are using/care about.
The list of known issues is maintained by USAID/SIEI division. To access the table requires having a USAID email account and can be accessed via 'browse_knownissues()'.
resolve_knownissues(df, remove_cs = TRUE, store_excl = FALSE)
resolve_knownissues(df, remove_cs = TRUE, store_excl = FALSE)
df |
standard MSD data frame, typically after its been filtered |
remove_cs |
remove data flagged as central support (CS), default = TRUE |
store_excl |
should the known exclusions be store in the Global Envir? |
df excluding known targets/results issues
[browse_knownissues()] to view table in Google Sheets; [set_email()] to store USAID email; [load_secrets()] to load credentials into session.
## Not run: library(tidyverse) library(glamr) load_secrets() # or googlesheets4::gs4_auth() df_msd <- si_path() %>% return_latest("OU_IM") %>% read_rds() df_mwi <- df_msd %>% filter(operatingunit == "Malawi", indicator == "TX_CURR") df_mwi_resolved <- df_mwi %>% resolve_knownissues() ## End(Not run)
## Not run: library(tidyverse) library(glamr) load_secrets() # or googlesheets4::gs4_auth() df_msd <- si_path() %>% return_latest("OU_IM") %>% read_rds() df_mwi <- df_msd %>% filter(operatingunit == "Malawi", indicator == "TX_CURR") df_mwi_resolved <- df_mwi %>% resolve_knownissues() ## End(Not run)
List of indicators in the MER that are snapshot indicators, eg TX_CURR, as opposed to a cumulative one, like HTS_TST. Snapshot indicators are handled differently for calculating cumulative values and target achievement.
data(snapshot_ind)
data(snapshot_ind)
A list of all snapshot indicators
indicator names
This function is used primarily to extract the data from the source file of a MER Structured Dataset, MER NAT_SUBNAT Structured Dataset, Financial Structure Dataset, or DATIM Genie export. It can also be used to extact information from the filename about the fiscal year, quarter, or period.
source_info(path, type, return = "source")
source_info(path, type, return = "source")
path |
path to the folder containing MSDs or specific MSD file |
type |
not required unless providing a folder in 'path'; default = "OU_IM_FY21"; other examples include: "PSNU_IM", "NAT_SUBNAT", "PSNU", "Financial", "HRH" |
return |
from the info, what should be returned; default = "source" other options are: "period", "fiscal_year", "fiscal_year_label","quarter" |
vector of information related to what is being asked in 'return'
Other metadata:
extract_metadata()
,
get_metadata()
## Not run: source_info() #works if you have stored path to the MSD folder via glamr::set_paths() source_info("../Data", type = "PSNUxIM") source_info("../Data", type = "PSNUxIM", return = "period") source_info("../Downloads/Genie_PSNU_IM_Jupiter_Daily_c9f5889f-86c9-44e7-ab63-fa86c587d251.zip") source_info("../Data/MER_Structured_Datasets_NAT_SUBNAT_FY15-21_20210618_v2_1.rds") ## End(Not run) ## Not run: library(tidyverse) library(glamr) library(glue) df <- si_path() %>% return_latest("OU_IM") %>% read_msd() df_viz <- df %>% filter(operatingunit == "Saturn", indicator == "TX_NEW", standardizeddisaggregate == "Total Numerator") %>% count(fiscal_year, wt = targets, name = "targets") df_viz %>% ggplot(aes(fiscal_year, targets)) + geom_col() + labs(caption = glue("Source: {source_info()}")) ## End(Not run)
## Not run: source_info() #works if you have stored path to the MSD folder via glamr::set_paths() source_info("../Data", type = "PSNUxIM") source_info("../Data", type = "PSNUxIM", return = "period") source_info("../Downloads/Genie_PSNU_IM_Jupiter_Daily_c9f5889f-86c9-44e7-ab63-fa86c587d251.zip") source_info("../Data/MER_Structured_Datasets_NAT_SUBNAT_FY15-21_20210618_v2_1.rds") ## End(Not run) ## Not run: library(tidyverse) library(glamr) library(glue) df <- si_path() %>% return_latest("OU_IM") %>% read_msd() df_viz <- df %>% filter(operatingunit == "Saturn", indicator == "TX_NEW", standardizeddisaggregate == "Total Numerator") %>% count(fiscal_year, wt = targets, name = "targets") df_viz %>% ggplot(aes(fiscal_year, targets)) + geom_col() + labs(caption = glue("Source: {source_info()}")) ## End(Not run)
'split_save' breaks a dataset by the groups provided and then exports those individual frames as separate csv files. This process can be useful when working with data across multiple partners and then creating datasets to be sent to each of them with their own data.
split_save(df, group_var, folderpath, filename_stub, include_date = FALSE)
split_save(df, group_var, folderpath, filename_stub, include_date = FALSE)
df |
dataframe to split |
group_var |
grouping variable to split the dataset by, eg operatingunit, funding_agency |
folderpath |
directory where you want to store the files |
filename_stub |
generic stub for naming all the files |
include_date |
include date after filenamestub? default = FALSE, eg "20180913" |
## Not run: #create country specific files for TX_NEW df_mer %>% filter(indicator == "TX_NEW", standardizeddisaggregate == "Total Numerator") %>% split_save(operatingunit, "~/CountryFiles", "FY18Q3_TX") ## End(Not run)
## Not run: #create country specific files for TX_NEW df_mer %>% filter(indicator == "TX_NEW", standardizeddisaggregate == "Total Numerator") %>% split_save(operatingunit, "~/CountryFiles", "FY18Q3_TX") ## End(Not run)