Title: | Import targets and PLHIV data from COP Target Setting Tool (formerly Data Pack) |
---|---|
Description: | Import PSNUxIM targets and PLHIV data from COP Data Pack. The purpose is to make the data tidy and more usable than their current structure in the Excel data packs. |
Authors: | Aaron Chafetz [aut, cre], Karishma Srikanth [aut], Baboyma Kagniniwa [ctb], Josh Davis [ctb] |
Maintainer: | Aaron Chafetz <[email protected]> |
License: | MIT + file LICENSE |
Version: | 6.2.4 |
Built: | 2024-11-22 04:38:38 UTC |
Source: | https://github.com/USAID-OHA-SI/tameDP |
A dataframe containing the age bands in the MSD and the collapsed age bands in the COP23 Target Setting TOol
age_band_crosswalk
age_band_crosswalk
A data frame with 18 rows and 2 variables:
Age bands in the MER structured Dataset
collapsed age bands in the TST
To ensure there are no duplicate rows on the reshape, this function first aggregates the data by the key columns to minimize any issues. If desiring to work at the PSNU level, the parameter 'psnu_lvl' allows you to aggregate to the PSNU level instead of the PSNUxIM level.
agg_dp(df, psnu_lvl = FALSE)
agg_dp(df, psnu_lvl = FALSE)
df |
data frame to aggregate |
psnu_lvl |
default aggregate is to IM level; if TRUE, aggregates to PSNU level |
This function pulls in a PSNUxIM MSD and datapack filepath to align the MSD extract to the indicators and disaggregates in the datapack, as well as historic results and targets. This function also addresses when OUs set targets at a higher level than PSNU for alignment.
align_msd_disagg(msd_path)
align_msd_disagg(msd_path)
msd_path |
path to PSNUxIM extract |
## Not run: df_msd <- align_msd_disagg(msd_path = msd_path) ## End(Not run)
## Not run: df_msd <- align_msd_disagg(msd_path = msd_path) ## End(Not run)
Ensure that fiscal year, cumulative, and targets are numeric and all other variables are stored as characters.
apply_class(df)
apply_class(df)
df |
dataframe output to reorder |
Apply fiscal year to each row, using the T or T_1 or R in 'indicator_code' to determine whether it's the current or a prior fiscal year. The fiscal year can be identified dynamically through 'grab_info()'.
apply_fy(df, year)
apply_fy(df, year)
df |
DP dataframe to apply fiscal year to |
year |
fiscal year, derived from 'grab_info(filepath, "year")' |
data frame with fiscal year
Join the new COP prioritization onto the target data frame.
apply_prioritization(df, df_prioritization)
apply_prioritization(df, df_prioritization)
df |
Target Setting Tool data frame |
df_prioritization |
dataframe from 'grab_prioritization()' |
Other prioritization:
grab_prioritization()
Join the SNU1 onto the PSNUxIM data frame.
apply_snu1(df, df_snu1)
apply_snu1(df, df_snu1)
df |
Target Setting Tool data frame |
df_snu1 |
dataframe from 'grab_snu1()' |
Other snu1:
grab_snu1()
This function applies metadata from the source file to the tidied dataset including the file name, last modified date, and
apply_stamps(df, filepath)
apply_stamps(df, filepath)
df |
data frame read in and reshaped by import_dp and reshape_dp |
filepath |
file path to the Target Setting Tool importing, must be .xlsx |
new columns in df with source information
The indicator and disaggregates used in the Target Setting Tool skew towards machine readable and do not necessary match the MER indicators in the MSD/DATIM. This function makes adjustments to indicators and disaggregates to make them easier to work with and more closely align to the MSD. This function also uses 'convert_mods()', which creates the testing modalities that match the MSD and create new HTS_TST and HTS_TST_POS indicator from indicator that feed into them (eg HTS_INDEX, TB_STAT, PMTCT_STAT, VMMC_CIRC).
clean_indicators(df, fy)
clean_indicators(df, fy)
df |
data frame to adjust |
fy |
fiscal year for targeting |
This function matches the testing modalities from the MSD and create new HTS_TST and HTS_TST_POS indicator from indicator that feed into them (eg HTS_INDEX, TB_STAT, PMTCT_STAT, VMMC_CIRC).
convert_mods(df)
convert_mods(df)
df |
data frame |
The Target Setting Tool does not contain information on the mechanism (names or partners). By running this function, you are connecting to DATIM's SQLView file that contains the list of all current mechanisms. This requires providing your DATIM credentials. If left blank in the function, you will have two dialogue boxes popping up asking for your DATIM username and password. If running ‘tame_dp()' across multiple Target Setting Tools, it’s advisable to run 'get_names()' on the file dataset produced by 'tame_dp'.
get_names( df, map_names = TRUE, psnu_lvl = FALSE, cntry, datim_user, datim_password )
get_names( df, map_names = TRUE, psnu_lvl = FALSE, cntry, datim_user, datim_password )
df |
data frame to add mechanism info to |
map_names |
import names from DATIM (OU, mechanism, partner) associated with mech_code |
psnu_lvl |
aggregate to the PSNU level instead of IM |
cntry |
country, from grab_info() if not connecting to DATIM |
datim_user |
DATIM user name (if not provided, you will be prompted with a pop up) |
datim_password |
DATIM password (if not provided, you will be prompted with a pop up) |
#load package library(purrr) #identify all the Target Setting Tool files files <- list.files("../Downloads/DataPacks", full.names = TRUE) #read in all DPs and combine into one data frame df_all <- map_dfr(.x = files, .f = ~ tame_dp(.x, map_names = FALSE)) #apply mech_name and primepartner names from DATIM #you will need to provide your DATIM credentials datim_user_nm <- "spower" #replace with your username datim_pwd <- getPass::getPass() #pop up prompting for your password df_all <- get_names(df_all, datim_user = datim_user_nm, datim_password = datim_pwd)
#load package library(purrr) #identify all the Target Setting Tool files files <- list.files("../Downloads/DataPacks", full.names = TRUE) #read in all DPs and combine into one data frame df_all <- map_dfr(.x = files, .f = ~ tame_dp(.x, map_names = FALSE)) #apply mech_name and primepartner names from DATIM #you will need to provide your DATIM credentials datim_user_nm <- "spower" #replace with your username datim_pwd <- getPass::getPass() #pop up prompting for your password df_all <- get_names(df_all, datim_user = datim_user_nm, datim_password = datim_pwd)
This function extract information stored in the Target Setting Tool Home tab to identify either the country or what the fiscal year is.
grab_info(filepath, type)
grab_info(filepath, type)
filepath |
file path to the Target Setting Tool importing, must be .xlsx |
type |
either "country" or "year" |
path <- "../Downloads/DataPack_Jupiter_20200218.xlsx" cntry <- grab_info(path, "country") fy <- grab_info(path, "year")
path <- "../Downloads/DataPack_Jupiter_20200218.xlsx" cntry <- grab_info(path, "country") fy <- grab_info(path, "year")
Pull from the prioritization tab to have a table of PSNU prioritization for the current COP.
grab_prioritization(filepath)
grab_prioritization(filepath)
filepath |
file path to the Target Setting Tool importing, must be .xlsx |
dataframe from the Prioritization tab
Other prioritization:
apply_prioritization()
Pull SNU1 from the prioritization tab to have a table to align/apply with the PSNUxIM tab
grab_snu1(filepath)
grab_snu1(filepath)
filepath |
file path to the Target Setting Tool importing, must be .xlsx |
dataframe from the Prioritization tab
Other snu1:
apply_snu1()
Initial reading in of tabs of the Target Setting Tool. This function reads in the necessary tab or tabs, removes unused columns and cleans up the column names so there are no duplicates. For the PSNUxIM, it identified columns as as a share or value.
import_dp(filepath, tab)
import_dp(filepath, tab)
filepath |
file path to the Target Setting Tool importing, must be .xlsx |
tab |
which sheet to read in |
path <- "../Downloads/DataPack_Jupiter_20200218.xlsx" df_tst <- import_dp(path, tab = "PSNUxIM")
path <- "../Downloads/DataPack_Jupiter_20200218.xlsx" df_tst <- import_dp(path, tab = "PSNUxIM")
Is the filepath correct for the Target Setting Tool
is_file(filepath)
is_file(filepath)
filepath |
filepath of Target Setting Tool |
Other validation:
is_sheet()
,
is_xls()
,
no_connection()
Check if a sheet exits in Target Setting Tool
is_sheet(filepath, tab = "PSNUxIM")
is_sheet(filepath, tab = "PSNUxIM")
filepath |
filepath of Target Setting Tool |
tab |
sheet to check in Target Setting Tool, "PSNUxIM" (default) |
Other validation:
is_file()
,
is_xls()
,
no_connection()
Check if the filepath is .xls or .xlsx
is_xls(filepath)
is_xls(filepath)
filepath |
filepath of COP Target Setting Tool |
Other validation:
is_file()
,
is_sheet()
,
no_connection()
Deprecated! See 'tame_join'.
join_dp_msd(dp_filepath, msd_filepath, fy_as_str = TRUE, map_names = FALSE)
join_dp_msd(dp_filepath, msd_filepath, fy_as_str = TRUE, map_names = FALSE)
dp_filepath |
file path to the Target Setting Tool importing, must be .xlsx |
msd_filepath |
filepath to the latest PSNUxIM MSD for corresponding OU |
fy_as_str |
should FY be converted to a string (2025 > FY25) for Tableau? (default = TRUE) |
map_names |
import names from DATIM (OU, mechanism, partner) associated with mech_code when working with PSNUxIM (default = FALSE) |
dataframe that combines targets from the TST with corresponding historic results/targets from MSD
#DP file path tst_path <- "../Downloads/DataPack_Jupiter_20500101.xlsx" # MSD filepath msd_path <- "../Data/MER_Structured_TRAINING_Datasets_PSNU_IM_FY59-61_20240215_v1_1.zip" #run join function (depricated) df_join <- join_dp_msd(tst_path, msd_path)
#DP file path tst_path <- "../Downloads/DataPack_Jupiter_20500101.xlsx" # MSD filepath msd_path <- "../Data/MER_Structured_TRAINING_Datasets_PSNU_IM_FY59-61_20240215_v1_1.zip" #run join function (depricated) df_join <- join_dp_msd(tst_path, msd_path)
This function limits the output of the Target Setting Tool data to either MER or SUBNAT (e.g. PLHIV, TX_CURR_SUBNAT) data. It will not be run if processing the PSNUxIM tab since that does not include any SUBNAT data.
limit_datatype(df, type)
limit_datatype(df, type)
df |
data frame read in and reshaped by import_dp and reshape_dp |
type |
dataset type, either "MER" or "PLHIV" |
data frame limited to either MER or SUBNAT data
To align with DATIM datasets, the standardized disaggregates for each indicators will be aligned to the Target Setting Tool for FY22 Targets.
map_disaggs(df)
map_disaggs(df)
df |
dataframe from clean_indicators |
This function utilizes the meta data stored in row 6 of each tab of the Data Pack to determine what column type is - "assumption", "calculation", "past", "result", "reference", "row_header", "target". The primary columns we want are meta data (row_header), targets, and past (prior year result/targets for reference).
match_col_type(filepath, tab, pattern = "(row_header|target|past)")
match_col_type(filepath, tab, pattern = "(row_header|target|past)")
filepath |
file path to the Target Setting Tool importing, must be .xlsx |
tab |
which sheet to read in |
pattern |
type of column, "assumption", "calculation", "past", "result", "reference", "row_header", "target"; default = "(row_header|target|past)" |
Boolean list of matches
A dataset containing the mapping between MER/SUBNAT/IMPATT indicators from the Target Setting Tool and their official disaggregates in DATIM from FY23/ COP22 targets.
mer_disagg_mapping
mer_disagg_mapping
A data frame with 60 rows and 4 variables:
MER indicator name
designates whether the indicator type
indicator disaggregation, eg Age/Sex/HIVStatus
whether the disaggregation is for Key Populations
A dataset containing the mapping between MER/SUBNAT/IMPATT indicators from the Target Setting Tool and their official disaggregates in DATIM from FY24/ COP23 targets, as well as historic results/targets disaggregates from DATIM from FY22-FY24.
mer_historic_disagg_mapping_2024
mer_historic_disagg_mapping_2024
A data frame with 225 rows and 5 variables:
MER indicator name
designates whether the indicator type
indicator disaggregation, eg Age/Sex/HIVStatus
fiscal year
whether the disaggregation is for Key Populations
A dataset containing the mapping between MER/SUBNAT/IMPATT indicators from the Target Setting Tool and their official disaggregates in DATIM from FY23/ COP22 targets, as well as historic results/targets disaggregates from DATIM from FY21-FY23.
msd_historic_disagg_mapping
msd_historic_disagg_mapping
A data frame with 163 rows and 5 variables:
MER indicator name
designates whether the indicator type
indicator disaggregation, eg Age/Sex/HIVStatus
fiscal year
whether the disaggregation is for Key Populations
Check if computer has internet connection
no_connection()
no_connection()
Other validation:
is_file()
,
is_sheet()
,
is_xls()
Ensure variables in the exported data frame are correctly ordered.
order_vars(df)
order_vars(df)
df |
dataframe output to reorder |
A dataset containing the mapping countries and operating units. Most countries are also Operating Units, expect for those in regional programs.
ou_ctry_mapping
ou_ctry_mapping
A data frame with 60 rows and 2 variables:
PEPFAR Operating Unit (countries + 3 regional programs)
PEPFAR Country Name
https://final.datim.org/api/organisationUnits
If there are any historic results in the dataset (found in some of the non- PSNUxIM tabs), we want to separate these from the target values to ensure the dataset is tidy and results/targets will not be indavertently aggregated. The reshape will create a cumulative column if results exist in the provided dataframe.
pivot_results(df)
pivot_results(df)
df |
data frame after it's been aggregated |
data frame with a cumulative column (when/where results exist)
This reshapes the relevant columns from a given tab to long, making it tidy and more usable. It relies on either 'reshape_tab()' or 'reshape_psnuim()' depending on the tab being processed.
reshape_dp(df)
reshape_dp(df)
df |
data frame from import_dp() |
Other reshape:
reshape_psnuim()
,
reshape_tab()
This function limits the columns from the PSNUxIM tab and reshapes it long, so that it is more usable. Three values columns are created in the output - datapacktarget, value, share. This function also splits out the PSNU uid from the PSNU column.
reshape_psnuim(df)
reshape_psnuim(df)
df |
data frame from import_dp() |
Other reshape:
reshape_dp()
,
reshape_tab()
This function limits the columns from a target tab (non PSNUxIM) to extract data and reshapes it long, so that it is tidy and more usable. This function also splits out the PSNU uid from the PSNU column.
reshape_tab(df)
reshape_tab(df)
df |
data frame from import_dp() |
Other reshape:
reshape_dp()
,
reshape_psnuim()
Identify which tab to import based on what you want to use - PSNUxIM, SUBNAT, or ALL (non mechanism tabs). You can also provide a specific tab name that matches the Target Setting Tool
return_tab(type)
return_tab(type)
type |
dataset to extract "PSNUxIM", "SUBNAT" (formerly "PLHIV"), "ALL", or a specific tab |
tabs to import
This function removes the contacanated data contained in the same cell. The psnu column in the Target Setting Tool contains both the psnu, psnuuid, and meta data on type - Country/SNU/DREAMS/Military. 'split_psnu' breaks out psnu and psnuuid into two columns and removes any other extraneous information.
split_psnu(df)
split_psnu(df)
df |
Target Setting Tool data frame from tameDP |
Subsets the columns of the massive Target Setting Tool tab down to only those that are needed. This depends on the type of tab that is being imported. The Prioritization tab keeps the PSNU and prioritization column.
subset_prioritization(df)
subset_prioritization(df)
df |
data frame after import |
limits to correct columns in data frame from DP tab
Other subset:
subset_psnuxim()
,
subset_standard()
Subsets the columns of the massive Target Setting Tool tab down to only those that are needed. This depends on the type of tab that is being imported. PSNUxIM keep all meta data and taget share/value columns.
subset_psnuxim(df)
subset_psnuxim(df)
df |
data frame after import |
limits to correct columns in data frame from DP tab
Other subset:
subset_prioritization()
,
subset_standard()
Subsets the columns of the massive Target Setting Tool tab down to only those that are needed. This depends on the type of tab that is being imported. Standard, non-PSNUxIM/Prioritization) keep column types specified in the Target Setting Tool as row_header, target, or past.
subset_standard(df, filepath, tab)
subset_standard(df, filepath, tab)
df |
data frame after import |
filepath |
file path to the Target Setting Tool importing, must be .xlsx |
tab |
sheet being imported |
limits to correct columns in data frame from DP tab
Other subset:
subset_prioritization()
,
subset_psnuxim()
tame_dp is the primary function of the tameDP package, reading in the Data Pack and munging in into a tidy data frame to make it more usable to interact with the data than the way it is stored in the Target Setting Tool. **Given the changes to the Target Setting Tool each year, the function only works for the current COP year: COP24.**
tame_dp(filepath, type = "ALL", map_names = FALSE, psnu_lvl = FALSE)
tame_dp(filepath, type = "ALL", map_names = FALSE, psnu_lvl = FALSE)
filepath |
file path to the Target Setting Tool importing, must be .xlsx |
type |
dataset to extract "PSNUxIM", "SUBNAT", or "ALL" [default] or a specific tab |
map_names |
import names from DATIM (OU, mechanism, partner) associated with mech_code |
psnu_lvl |
aggregate to the PSNU level instead of IM |
The main function of 'tameDP' is to bring import a COP Target Setting Tool into R and make it tidy. The function aggregates the COP targets up to the mechanism level, imports the mechanism information from DATIM, and breaks out the data elements to make the dataset more usable.
- Imports Target Setting Tool as tidy data frame - Breaks up data elements stored in the indicatorCode column into distinct columns - Cleans up the HTS variables, separating modalities out of the indicator name - Creates a statushiv column - Cleans and separates PSNU and PSNU UID into distinct columns - Adds in mechanism information from DATIM, including operatingunit, funding agency, partner and mechanism name - Removes any rows with no targets - Allows for aggregate to the PSNU level
Other primary:
tame_plhiv()
,
tame_subnat()
#DP file path path <- "../Downloads/DataPack_Jupiter_20500101.xlsx" #read in Target Setting Tool (straight from sheets, not PSNUxIM tab) df_tst <- tame_dp(path) #read in PLHIV/SUBNAT data df_tst <- tame_dp(path, type = "SUBNAT") #read in PSNUxIM data df_tst <- tame_dp(path, type = "PSNUxIM") #apply mechanism names df_tst_named <- tame_dp(path, type = "PSNUxIM", map_names = TRUE) #aggregate to the PSNU level df_tst_psnu <- tame_dp(path, type = "PSNUxIM", psnu_lvl = TRUE) #reading in multiple files and then applying mechanism names (for PSNUxIM) df_all <- map_dfr(.x = list.files("../Downloads/DataPacks", full.names = TRUE), .f = ~ tame_dp(.x, map_names = FALSE)) df_all <- get_names(df_all)
#DP file path path <- "../Downloads/DataPack_Jupiter_20500101.xlsx" #read in Target Setting Tool (straight from sheets, not PSNUxIM tab) df_tst <- tame_dp(path) #read in PLHIV/SUBNAT data df_tst <- tame_dp(path, type = "SUBNAT") #read in PSNUxIM data df_tst <- tame_dp(path, type = "PSNUxIM") #apply mechanism names df_tst_named <- tame_dp(path, type = "PSNUxIM", map_names = TRUE) #aggregate to the PSNU level df_tst_psnu <- tame_dp(path, type = "PSNUxIM", psnu_lvl = TRUE) #reading in multiple files and then applying mechanism names (for PSNUxIM) df_all <- map_dfr(.x = list.files("../Downloads/DataPacks", full.names = TRUE), .f = ~ tame_dp(.x, map_names = FALSE)) df_all <- get_names(df_all)
Join TST output with MSD output
tame_join(tst_filepath, msd_filepath, fy_as_str = TRUE, map_names = FALSE)
tame_join(tst_filepath, msd_filepath, fy_as_str = TRUE, map_names = FALSE)
tst_filepath |
file path to the Target Setting Tool importing, must be .xlsx |
msd_filepath |
filepath to the latest PSNUxIM MSD for corresponding OU |
fy_as_str |
should FY be converted to a string (2025 > FY25) for Tableau? (default = TRUE) |
map_names |
import names from DATIM (OU, mechanism, partner) associated with mech_code when working with PSNUxIM (default = FALSE) |
dataframe that combines targets from the TST with corresponding historic results/targets from MSD
#TST file path tst_path <- "../Downloads/DataPack_Jupiter_20500101.xlsx" # MSD filepath msd_path <- "../Data/MER_Structured_TRAINING_Datasets_PSNU_IM_FY59-61_20240215_v1_1.zip" #run join function df_join <- tame_join(tst_path, msd_path) #run join function without converting the fiscal year to a string (used in Tableau) df_join <- tame_join(tst_path, msd_path, fy_as_str = FALSE) #run join function with PSNUxIM & map on mechanism info to TST dataframe df_join <- tame_join(tst_path, msd_path, map_names = TRUE)
#TST file path tst_path <- "../Downloads/DataPack_Jupiter_20500101.xlsx" # MSD filepath msd_path <- "../Data/MER_Structured_TRAINING_Datasets_PSNU_IM_FY59-61_20240215_v1_1.zip" #run join function df_join <- tame_join(tst_path, msd_path) #run join function without converting the fiscal year to a string (used in Tableau) df_join <- tame_join(tst_path, msd_path, fy_as_str = FALSE) #run join function with PSNUxIM & map on mechanism info to TST dataframe df_join <- tame_join(tst_path, msd_path, map_names = TRUE)
Deprecated. Use 'tame_subnat' instead.
tame_plhiv(filepath)
tame_plhiv(filepath)
filepath |
file path to the Target Setting Tool importing, must be .xlsx |
Other primary:
tame_dp()
,
tame_subnat()
#DP file path path <- "../Downloads/DataPack_Jupiter_20200218.xlsx" #read in Target Setting Tool df_subnat <- tame_subnat(path)
#DP file path path <- "../Downloads/DataPack_Jupiter_20200218.xlsx" #read in Target Setting Tool df_subnat <- tame_subnat(path)
tame_subnat is a sister function to tame_dp, which readings in the SUBNAT and PLHIV data from the Target Setting Tool and munging in into a tidy data frame to make it more usable to interact with the data than the way it is stored in the Target Setting Tool. **Given the changes to the Target Setting Tool each year, the function only works going back to COP21.**
tame_subnat(filepath)
tame_subnat(filepath)
filepath |
file path to the Target Setting Tool importing, must be .xlsx |
Other primary:
tame_dp()
,
tame_plhiv()
#DP file path path <- "../Downloads/DataPack_Jupiter_20200218.xlsx" #read in Target Setting Tool df_subnat <- tame_subnat(path)
#DP file path path <- "../Downloads/DataPack_Jupiter_20200218.xlsx" #read in Target Setting Tool df_subnat <- tame_subnat(path)