Title: | SI Utilities Package |
---|---|
Description: | Provides a series of base functions useful to the GH OHA SI team. This includes project setup, pulling from DATIM, and key functions for working with the MSD. |
Authors: | Aaron Chafetz [aut, cre], Tim Essam [aut], Baboyma Kagniniwa [aut], Benjamin Kasdan [aut] |
Maintainer: | Aaron Chafetz <[email protected]> |
License: | MIT + file LICENSE |
Version: | 2.0.4 |
Built: | 2024-10-30 06:43:06 UTC |
Source: | https://github.com/USAID-OHA-SI/glamr |
'clean_countries' is used to adjust Natural Earth country names to match PEPFAR's Operating Unit / country names. This function can also be used to shorten OU/Country names by setting the parameter short to TRUE.
clean_countries(.data, colname = "admin", language = "en", short = TRUE)
.data | Reference Datasets |
colname | Column name to be updated |
language | language of reference, default is set to 'en'. Options are: 'fr', 'de', 'es', 'ar' |
short | If TRUE, shorten OU/Country names instead, default is TRUE |
Cleaned DataFrame
## Not run:
library(sf)
library(rnaturalearth)
library(glamr)

spdf <- ne_countries(type = "sovereignty", scale = 110, returnclass = "sf") %>%
  sf::st_drop_geometry() %>%
  dplyr::select(sovereignt, admin, name, adm0_a3) %>%
  glamr::clean_countries(colname = "admin")
## End(Not run)
This function is primarily useful for removing any apostrophe from the filename, since apostrophes are rejected by Google Drive, but it also includes features like replacing spaces with an underscore, converting to all lowercase, and adding a date prefix or suffix.
clean_filename( x, rm_apostrophe = TRUE, rp_space = FALSE, mk_lower = FALSE, add_date = NULL )
x | filepath or file name |
rm_apostrophe | remove all apostrophes, default = TRUE |
rp_space | replace spaces with underscore, default = FALSE |
mk_lower | make lowercase, default = FALSE |
add_date | add date "prefix" or "suffix" |
clean filename
## Not run:
file <- "Submission_Coted'Ivoire_data.csv"
new_file <- clean_filename(file, rm_apostrophe = TRUE, add_date = 'prefix')
## End(Not run)
Clean and connect parts of text together
connect_text(txt, connections = "[^a-zA-Z0-9]", connector = "_")
txt | String characters |
connections | Characters to be replaced by connector |
connector | Character used as connector |
cleaned text
## Not run: connect_text("THIS - is complex (very bad)") ## End(Not run)
Convert Date to FY Quarter/Period
convert_date_to_qtr(date)
date | date formatted like 2021-10-01 |
vector of FY period, eg FY22Q1
Other period: convert_datim_pd_to_qtr(), convert_fy_qtr_to_pd(), convert_qtr_to_date()
## Not run:
dates <- c("2021-10-01", "2021-11-15")
convert_date_to_qtr(dates)
## End(Not run)
Convert a period from a DATIM API into the format of FY22Q1 or FY22 (for targets/cumulative). This function is built into 'extract_datim'.
convert_datim_pd_to_qtr(df, pd_col = "Period")
df | dataframe from DATIM API |
pd_col | name of the period column, default = "Period" |
Convert periods from long CY dates to PEPFAR standard FY
[set_datim()] to store DATIM authentication; [load_secrets()] to load stored DATIM authentication
Other period: convert_date_to_qtr(), convert_fy_qtr_to_pd(), convert_qtr_to_date()
## Not run:
df <- tibble::tibble(Periods = c("October - December 2023",
                                 "January - March 2024",
                                 "October 2023 to September 2024"))
df <- convert_datim_pd_to_qtr(df, pd_col = "Periods")
## End(Not run)
Using 'gophr::reshape_msd()' often creates a preferable long dataset when working with the MSD, but may restrict the user to certain defaults during the process. Creating a clean period (eg FY22Q1) requires a number of lines of code to get right, so this function provides a stopgap when you are working with a long dataset that has a fiscal year and quarter column and desire a period variable.
convert_fy_qtr_to_pd(df, fy_ind = "fiscal_year", qtr_ind = "qtr")
df | MSD data frame reshaped long, eg via 'pivot_longer' |
fy_ind | column name in df for the fiscal year, default = "fiscal_year" |
qtr_ind | column name in df for quarters, default = "qtr" |
united period column combining and cleaning fiscal year and quarter
Other period: convert_date_to_qtr(), convert_datim_pd_to_qtr(), convert_qtr_to_date()
df_summary <- df_msd %>%
  filter(indicator == "TX_CURR",
         standardizeddisaggregate == "Total Numerator",
         operatingunit == "Jupiter") %>%
  group_by(mech_code, fiscal_year) %>%
  summarise(across(starts_with("qtr"), sum, na.rm = TRUE), .groups = "drop")

df_summary <- df_summary %>%
  pivot_longer(-c(mech_code, fiscal_year), names_to = "qtrs")

df_summary <- convert_fy_qtr_to_pd(df_summary, qtr_ind = "qtrs")
Convert a period (from 'reshape_msd()') in the format of FY22Q1 or FY22 (for targets/cumulative) into a date.
convert_qtr_to_date(period, type = "start")
period | period formatted like FY22Q1 or FY22 |
type | start or end date of quarter/period, default = "start" |
date vector
Other period: convert_date_to_qtr(), convert_datim_pd_to_qtr(), convert_fy_qtr_to_pd()
## Not run:
df <- read_msd(path)

df <- df %>%
  filter(operatingunit == "Jupiter",
         indicator == "TX_NEW",
         standardizeddisaggregate == "Total Numerator") %>%
  group_by(fiscal_year, primepartner) %>%
  summarize(across(starts_with("qtr"), sum, na.rm = TRUE)) %>%
  ungroup()

df <- df %>%
  reshape_msd() %>%
  mutate(date = convert_qtr_to_date(period), .after = period)
## End(Not run)
Get formatted current date
curr_date(fmt = "%Y-%m-%d")
fmt | Date format, default = "%Y-%m-%d" |
## Not run:
curr_date()
curr_date(fmt = "%m/%d/%Y")
## End(Not run)
To setup/store, run 'glamr::set_datim()'.
datim_pwd()
access DATIM password from keyring
Other authentication: datim_user(), get_account(), get_key(), get_keys(), get_s3key(), get_services(), load_secrets(), pano_pwd(), pano_user(), pdap_access(), pdap_bucket(), pdap_secret(), set_account(), set_datim(), set_email(), set_key(), set_pano(), set_s3keys()
## Not run:
load_secrets()

ou_table <- datim_outable(datim_user(), datim_pwd())
## End(Not run)
To setup/store, run 'glamr::set_datim()'.
datim_user()
access DATIM username from keyring
Other authentication: datim_pwd(), get_account(), get_key(), get_keys(), get_s3key(), get_services(), load_secrets(), pano_pwd(), pano_user(), pdap_access(), pdap_bucket(), pdap_secret(), set_account(), set_datim(), set_email(), set_key(), set_pano(), set_s3keys()
## Not run:
load_secrets()

ou_table <- datim_outable(datim_user(), datim_pwd())
## End(Not run)
[Experimental]
'export_drivefile()' is designed to move local files to Google Drive.
export_drivefile( filename, to_drive, to_folder = NULL, add_folder = TRUE, overwrite = TRUE, ... )
filename | Character, full name of the file to be uploaded |
to_drive | Character, Google Drive id |
to_folder | Character, Google Drive sub-folder |
add_folder | Logical. If TRUE, add sub-folders if they are not present |
overwrite | Logical. If TRUE, existing files will be overwritten |
... | Additional parameters to be passed to 'googledrive::drive_upload()' |
Google Drive file id(s)
## Not run:
library(glamr)

list.files("./Graphics", "NIGERIA", full.names = TRUE) %>%
  export_drivefile(filename = .,
                   to_drive = "<path-id>",
                   to_folder = "FY99Q4/VL Suppression",
                   add_folder = TRUE)
## End(Not run)
Extract data from excel link
extract_excel_data(src_page, link_id, file_sheet = 2, file_ext = "xlsx")
src_page | The http(s) link to the source web page |
link_id | A CSS identifier of the hyperlinked element |
file_sheet | The file sheet number or name |
file_ext | The extension of the file |
File content as a data frame
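A minimal sketch of a call; the URL, CSS id, and sheet below are placeholders, not a real data source:
## Not run:
#hypothetical page with a hyperlinked xlsx file; url and css id are placeholders
df <- extract_excel_data(src_page = "https://example.com/reports",
                         link_id = "download-link",
                         file_sheet = 2,
                         file_ext = "xlsx")
## End(Not run)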
Extract table data from web page
extract_tbl_data(src_url, tbl_id)
src_url | The http(s) link to the source web page |
tbl_id | A unique identifier of the target table |
A data frame
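A minimal sketch of a call; the URL and table id below are placeholders:
## Not run:
#hypothetical page containing an html table; url and table id are placeholders
df <- extract_tbl_data(src_url = "https://example.com/stats",
                       tbl_id = "main-table")
## End(Not run)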
Extract text
extract_text(txt, limits = "()")
txt | text containing parentheses |
limits | area to extract text from, c("()", "", "[]") |
text within limits
## Not run:
extract_text(txt = "Saint Mary Hospital (SMH)")
extract_text(txt = "TBD [Placeholder - New Mechanism]")
## End(Not run)
'folder_setup()' creates an organizational structure that is common across OHA/SI projects so every analyst knows what to expect and where when picking up a new project or one cloned from a co-worker. This function can be used as a stand-alone function but primarily serves 'si_setup()'.
folder_setup( folder_list = list("Data", "Images", "Scripts", "AI", "Dataout", "Data_public", "GIS", "Documents", "Graphics", "markdown") )
folder_list | list of folders to install |
The standard setup provides the following folders for these uses:
* Data - where any raw/input data (**xlsx/csv/rds**) specific to the project are stored
* Dataout - where any intermediary or final data (**xlsx/csv/rds**) are output as a product of your code
* Data_public - where all public data lives
* Scripts - where all the code (**R/py**) is stored (if there is a logical order, make sure to add prefixes to each script, e.g. 00_init.R, 01_data-access.R, 02_data-munging.R, ...)
* Images - any **png/jpeg** visual outputs from your code
* Graphics - any **svg/pdf** visual outputs that will be edited in a vector graphics editor, eg Adobe Illustrator or Inkscape
* AI - any **ai** files or other files from a graphics editor (exported png products will be stored in Images)
* GIS - any **shp** files or other GIS related inputs
* Documents - any **docx/xlsx/pptx/pdf** documents that relate to the process or are final outputs
* markdown - exported **md** files from a knitr report
Other project setup: setup_gitignore(), setup_readme(), si_setup()
## Not run:
#standard
folder_setup()

#specific
fldrs <- c("Data", "Tableau", "AI")
folder_setup(fldrs)
## End(Not run)
Get the id of a Google Drive folder
gdrive_folder(name, path, add = FALSE, ...)
name | Google Drive folder name |
path | Google Drive parent path id |
add | Should the folder be added if missing? default is FALSE |
... | Other arguments passed on to 'drive_mkdir' |
Google Drive folder id, or NULL for a non-existing folder
This function will create a new folder if add is set to TRUE
## Not run: library(glamr) gdrive_folder("Test-Folder", "ID-adfdfsdfdfdfs") ## End(Not run)
Google API provides extra metadata stored as a list in the dribble returned, eg modified time, permissions, owner, etc.
gdrive_metadata(df, show_details = FALSE)
df | Results from googledrive 'drive_ls' |
show_details | Show all metadata fields, default is FALSE |
Data frame with extra metadata added
## Not run:
library(googledrive)

drive_auth()

fldr <- as_id("<google-folder-id>")

drive_ls(fldr) %>% gdrive_metadata()
## End(Not run)
This function returns a unique reference id that can be used in scripts and cited in associated plots to help find the associated code on GitHub.
gen_ref_id()
A best practice would be to store the character string output as an object called 'ref_id' in the top matter of your script. If you run 'gophr::get_metadata' after this is stored as an object, it will automatically store this for use in 'metadata$caption'.
8 character string
## Not run:
library(glamr)
library(gophr)
library(ggplot2)
library(glue)

#create a reference id to include in a plot
gen_ref_id()
ref_id <- "1e64716c"

get_metadata()

#plot with ref id
ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point() +
  labs(caption = glue("Source: Edgar Anderson's Iris Data | Ref id: {ref_id}"))

#or
ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point() +
  labs(caption = metadata$caption)
## End(Not run)
Get account details
get_account(name)
name | Service name of the account |
key/value pairs as a list containing details of the account (invisible)
Inspired by 'grabr::lazy_secrets()'
Other authentication: datim_pwd(), datim_user(), get_key(), get_keys(), get_s3key(), get_services(), load_secrets(), pano_pwd(), pano_user(), pdap_access(), pdap_bucket(), pdap_secret(), set_account(), set_datim(), set_email(), set_key(), set_pano(), set_s3keys()
## Not run: get_account(name = 's3') ## End(Not run)
Get value of service key name
get_key(service, name)
service | Name of the service |
name | Name of the key |
key value
Other authentication: datim_pwd(), datim_user(), get_account(), get_keys(), get_s3key(), get_services(), load_secrets(), pano_pwd(), pano_user(), pdap_access(), pdap_bucket(), pdap_secret(), set_account(), set_datim(), set_email(), set_key(), set_pano(), set_s3keys()
## Not run: get_key(service = '<service-name>', name = '<key-name>') ## End(Not run)
Get Service Keys
get_keys(service)
service | Account service name |
list of key names for active services
Other authentication: datim_pwd(), datim_user(), get_account(), get_key(), get_s3key(), get_services(), load_secrets(), pano_pwd(), pano_user(), pdap_access(), pdap_bucket(), pdap_secret(), set_account(), set_datim(), set_email(), set_key(), set_pano(), set_s3keys()
## Not run: get_keys('<service-name>') ## End(Not run)
'get_s3key' retrieves your S3 keys using the 'keyring' package. Set 'name' to 'access' for the Access Key, or to 'secret' for the Secret Access Key.
get_s3key(name = "access")
name | S3 account key, 'access' or 'secret' |
stored key
Other authentication: datim_pwd(), datim_user(), get_account(), get_key(), get_keys(), get_services(), load_secrets(), pano_pwd(), pano_user(), pdap_access(), pdap_bucket(), pdap_secret(), set_account(), set_datim(), set_email(), set_key(), set_pano(), set_s3keys()
## Not run: get_s3key(name = "access") ## End(Not run)
Get Services
get_services()
list of active services
Other authentication: datim_pwd(), datim_user(), get_account(), get_key(), get_keys(), get_s3key(), load_secrets(), pano_pwd(), pano_user(), pdap_access(), pdap_bucket(), pdap_secret(), set_account(), set_datim(), set_email(), set_key(), set_pano(), set_s3keys()
## Not run: get_services() ## End(Not run)
'import_drivefile' is a wrapper around 'googledrive::drive_download', useful for pulling multiple files from a given Google Drive folder (with a Google id provided); by default, files download to the Data folder of a project.
import_drivefile(drive_folder, filename, folderpath = "Data", zip = TRUE)
drive_folder | Google id for the Google Drive folder |
filename | exact name of the file on Google Drive to download |
folderpath | path where you want the file stored, default = "Data" |
zip | should the file be zipped? default = TRUE |
Stores the file from Google Drive as a zipped file
## Not run:
library(googledrive)

googledrive::drive_auth()

fldr <- "Spp-y8DYsdRTrzDqUmK4fX5v"
import_drivefile(fldr, "TestFile.csv")
## End(Not run)
Test if service is stored in credential manager
is_stored(service = c("datim", "email", "pano", "s3", "pdap"))
service | account, either "datim", "email", "pano", "s3", or "pdap" |
A boolean
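A short usage sketch, guarding credential loading on availability:
## Not run:
if (is_stored("datim")) {
  load_secrets("datim")
}
## End(Not run)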
'load_secrets' should be set at the beginning of a script to store your email and DATIM user name under Options for the current session. This allows analysts to more easily share their scripts without having to manually update or remove user names.
load_secrets(service = c("email", "datim", "pano", "s3", "pdap"))
service | account, either "email", "datim", "pano", "s3", or "pdap"; by default, all are loaded if they are available |
To initially store your credentials, you will first need to run 'set_email()', 'set_datim()', 'set_pano()', and/or 'set_key()' (for s3).
'load_secrets' utilizes the 'keyring' package to access the OS credentials store. Storing in a centralized, secure location allows analysts to run other analysts' code without having to manually change user names/email addresses to access DATIM or Google Drive.
Stores Google, DATIM, PEPFAR Panorama, s3, and PDAP credentials in session
Other authentication: datim_pwd(), datim_user(), get_account(), get_key(), get_keys(), get_s3key(), get_services(), pano_pwd(), pano_user(), pdap_access(), pdap_bucket(), pdap_secret(), set_account(), set_datim(), set_email(), set_key(), set_pano(), set_s3keys()
## Not run:
load_secrets()

ou_table <- datim_outable(datim_user(), datim_pwd())
## End(Not run)
Lookup official Country name
lookup_country(country, language = "en")
country | country name |
language | language to use for lookup |
cleaned country name
cntry <- "Cote d'Ivoire"
name <- lookup_country(cntry)
name
Open directory explorer or files
open_path(path)
path | Full path of the file to be opened |
This assumes default applications are set for the various file types
## Not run:
dir_name <- "C:/Users/<username>/Downloads"
open_path(dir_name)

file_name <- "C:/Users/<username>/Downloads/test.csv"
open_path(file_name)
## End(Not run)
To setup/store, run 'glamr::set_pano()'.
pano_pwd()
access Panorama password from keyring
Other authentication: datim_pwd(), datim_user(), get_account(), get_key(), get_keys(), get_s3key(), get_services(), load_secrets(), pano_user(), pdap_access(), pdap_bucket(), pdap_secret(), set_account(), set_datim(), set_email(), set_key(), set_pano(), set_s3keys()
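A usage sketch mirroring the DATIM helpers above; assumes credentials were stored once with 'set_pano()':
## Not run:
set_pano("[email protected]")

user <- pano_user()
pwd <- pano_pwd()
## End(Not run)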
To setup/store, run 'glamr::set_pano()'.
pano_user()
access Panorama username from keyring
Other authentication: datim_pwd(), datim_user(), get_account(), get_key(), get_keys(), get_s3key(), get_services(), load_secrets(), pano_pwd(), pdap_access(), pdap_bucket(), pdap_secret(), set_account(), set_datim(), set_email(), set_key(), set_pano(), set_s3keys()
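A short sketch, checking availability before accessing the stored username:
## Not run:
if (is_stored("pano")) {
  user <- pano_user()
}
## End(Not run)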
When working with PDAP, you will need to access data from either the read or write buckets and need the credentials to do so. This function returns the Access Key associated with your account, 'Sys.getenv("AWS_ACCESS_KEY_ID")'. To use locally, the user will need to run 'set_key("pdap", "access")', which securely stores this information with 'keyring' (we can only write, not read, from a local machine).
pdap_access()
Other authentication: datim_pwd(), datim_user(), get_account(), get_key(), get_keys(), get_s3key(), get_services(), load_secrets(), pano_pwd(), pano_user(), pdap_bucket(), pdap_secret(), set_account(), set_datim(), set_email(), set_key(), set_pano(), set_s3keys()
## Not run:
library(grabr)

s3_upload(upload_file_path,
          bucket = pdap_bucket("write"),
          prefix = "usaid/",
          access_key = pdap_access(),
          secret_key = pdap_secret())

#identify path to dataset uploaded
path_wrkbnch <- s3_objects(bucket = pdap_bucket("write"),
                           prefix = "usaid/",
                           access_key = pdap_access(),
                           secret_key = pdap_secret()) %>%
  filter(str_detect(key, "Moz")) %>%
  pull(key)

#read
df_msd <- s3read_using(read_psd,
                       bucket = pdap_bucket("write"),
                       object = path_wrkbnch)
## End(Not run)
When working with PDAP, you will need to access data from either the read or write buckets. The read bucket ("S3_READ") is where PEPFAR Systems stores the MSDs, and the write bucket ("S3_WRITE") is where users can upload files (USAID users have write access to the "usaid/" sub-bucket).
pdap_bucket(type = c("read", "write"))
type | is the bucket read (default) or write? |
When accessed from PDAP Posit Workbench, the function will use the system environment variables 'Sys.getenv("S3_READ")' or 'Sys.getenv("S3_WRITE")', whereas when accessing locally, the user will need to store the read bucket location with 'set_key()', which securely stores this information with 'keyring' (we can only write, not read, from a local machine).
character string of AWS bucket location
Other authentication: datim_pwd(), datim_user(), get_account(), get_key(), get_keys(), get_s3key(), get_services(), load_secrets(), pano_pwd(), pano_user(), pdap_access(), pdap_secret(), set_account(), set_datim(), set_email(), set_key(), set_pano(), set_s3keys()
## Not run:
library(grabr)

s3_upload(upload_file_path,
          bucket = pdap_bucket("write"),
          prefix = "usaid/",
          access_key = pdap_access(),
          secret_key = pdap_secret())

#identify path to dataset uploaded
path_wrkbnch <- s3_objects(bucket = pdap_bucket("write"),
                           prefix = "usaid/",
                           access_key = pdap_access(),
                           secret_key = pdap_secret()) %>%
  filter(str_detect(key, "Moz")) %>%
  pull(key)

#read
df_msd <- s3read_using(read_psd,
                       bucket = pdap_bucket("write"),
                       object = path_wrkbnch)
## End(Not run)
When working with PDAP, you will need to access data from either the read or write buckets and need the credentials to do so. This function returns the Secret Access Key associated with your account, 'Sys.getenv("AWS_SECRET_ACCESS_KEY")'. To use locally, the user will need to run 'set_key("pdap", "secret")', which securely stores this information with 'keyring' (we can only write, not read, from a local machine).
pdap_secret()
Other authentication: datim_pwd(), datim_user(), get_account(), get_key(), get_keys(), get_s3key(), get_services(), load_secrets(), pano_pwd(), pano_user(), pdap_access(), pdap_bucket(), set_account(), set_datim(), set_email(), set_key(), set_pano(), set_s3keys()
## Not run:
library(grabr)

s3_upload(upload_file_path,
          bucket = pdap_bucket("write"),
          prefix = "usaid/",
          access_key = pdap_access(),
          secret_key = pdap_secret())

#identify path to dataset uploaded
path_wrkbnch <- s3_objects(bucket = pdap_bucket("write"),
                           prefix = "usaid/",
                           access_key = pdap_access(),
                           secret_key = pdap_secret()) %>%
  filter(str_detect(key, "Moz")) %>%
  pull(key)

#read
df_msd <- s3read_using(read_psd,
                       bucket = pdap_bucket("write"),
                       object = path_wrkbnch)
## End(Not run)
A dataset of PEPFAR Operating Units and Countries along with their ISO codes. This is a useful dataset for having a full set of PEPFAR countries or to align ISO codes with external data sources. Pulled from DATIM and the FSD.
pepfar_country_list
A data frame with 55 rows and 7 variables:
* PEPFAR Operating Unit (countries + 3 regional programs)
* PEPFAR Operating Unit ISO-3
* PEPFAR Operating Unit unique id from DATIM
* PEPFAR Country Name
* PEPFAR Country Name ISO-3
* PEPFAR Country unique id from DATIM
* Is the country prioritized in the 8+1+1 group defined by SGAC?
The list of PEPFAR acceleration countries was defined by Amb. Nkengasong during a DP's retreat for the [Zaidi 2023-06-08 re: Moving countries to green!]. These are countries "where enhanced attention and focus might help 'move the dial' on achieving and sustaining the UNAIDS 95-95-95 targets by 2025"
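A quick usage sketch; the column name below is an assumption based on the variable descriptions above, so verify with names(pepfar_country_list):
## Not run:
library(glamr)

#list the PEPFAR Operating Units
#operatingunit is an assumed column name; check names(pepfar_country_list)
pepfar_country_list %>%
  dplyr::distinct(operatingunit)
## End(Not run)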
A dataset of PEPFAR Operating Units and Countries along with their ISO codes and alternative names from other sources. This is a useful dataset designed to help with data cleaning/matching across different sources.
pepfar_country_xwalk
A data frame with 55 rows and 29 variables:
* ISO-3 Code
* Continent name
* World Region Name
* Alternative name for World Region
* US Region Name
* ISO Country name in English
* ISO Country name in French
* Regular Expression of Country name in German
* Country name in German
* Country name in English
* Regular Expression of Country name in English
* Regular Expression of Country name in French
* Country name in French
* Regular Expression of Country name in Italian
* Country name in Italian
* UN Name in Arabic
* UN Name in English
* UN Name in Spanish
* UN Name in French
* UN Name in Russian
* UN Name in Chinese
* rnaturalearth sovereign territory name
* rnaturalearth administrative unit name
* rnaturalearth country name
* PEPFAR Operating Unit (countries + 3 regional programs)
* PEPFAR Operating Unit ISO-3
* PEPFAR Operating Unit unique id from DATIM
* PEPFAR Country Name
* PEPFAR Country unique id from DATIM
Sources: https://final.datim.org/ https://www.naturalearthdata.com/ https://vincentarelbundock.github.io/countrycode/
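A sketch of the intended matching workflow, joining external data by ISO-3 code; both join column names are assumptions, so verify against names(pepfar_country_xwalk):
## Not run:
library(glamr)

#align an external dataset to PEPFAR names via ISO-3 code
#"iso_code" and "iso3c" are assumed column names; check both datasets first
df_joined <- df_external %>%
  dplyr::left_join(pepfar_country_xwalk, by = c("iso_code" = "iso3c"))
## End(Not run)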
A dataset of the dates for the release of the MER Structured Dataset.
pepfar_data_calendar
A data frame with 16 rows and 6 variables:
* fiscal year, start = October
* fiscal quarter, integer 1-4
* data release type, initial or clean
* date entry begins into DATIM
* date DATIM is closed and data are frozen
* date the MSD is released on PEPFAR Panorama
https://datim.zendesk.com/hc/en-us/articles/115001940503-PEPFAR-Data-Calendar
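A usage sketch for finding releases that have already occurred; the column name below is an assumption based on the variable descriptions above:
## Not run:
library(glamr)

#msd_release is an assumed column name; check names(pepfar_data_calendar)
pepfar_data_calendar %>%
  dplyr::filter(msd_release <= Sys.Date())
## End(Not run)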
'prinf' is a wrapper around 'print' that returns all rows rather than just the first 10 by default.
prinf(df, ...)
df | data frame |
... | Any other valid option for 'print()'. |
prints out all rows rather than the default 10
## Not run: df_geo %>% prinf() ## End(Not run)
'return_latest' checks for a pattern in a folder and provides the most recent file based on the time the file was modified.
return_latest(folderpath, pattern, quiet = FALSE, ...)
folderpath | path to folder where file(s) are located |
pattern | pattern in the file name, regex expressions allowed. If no pattern is provided, the last file in the folder will be returned |
quiet | suppresses the output message related to the file name, for use in sub functions, default = FALSE |
... | Any other valid option for 'base::list.files()'. |
a vector of the full filepath for the most recent version of a file stub
## Not run:
file_stub <- "MER_Structured_Datasets_OU_IM_FY18-20"
filepath <- return_latest("Data", file_stub)
df <- read_rds(filepath)
## End(Not run)
Create / Update account
set_account(name, keys = c("username", "password"), update = FALSE)
name | Service name of the account |
keys | List of account key names |
update | Should an existing account be overwritten? |
Other authentication: datim_pwd(), datim_user(), get_account(), get_key(), get_keys(), get_s3key(), get_services(), load_secrets(), pano_pwd(), pano_user(), pdap_access(), pdap_bucket(), pdap_secret(), set_datim(), set_email(), set_key(), set_pano(), set_s3keys()
## Not run: set_account(name = 's3', keys = c("access", "secret")) ## End(Not run)
'set_datim' stores your DATIM credentials using the 'keyring' package. This will only need to be done once. After running 'set_datim(user)', you will be prompted to enter your password through the RStudio API, which will then store the username and password in your OS credential store using 'keyring'.
set_datim(datim_username)
datim_username | DATIM account |
The 'keyring' package utilizes the OS credentials store. Storing in a centralized, secure location allows analysts to run other analysts' code without having to manually change user names/email addresses to access DATIM or Google Drive.
After 'set_datim' has been run once, an analyst can set 'load_secrets' at the beginning of a script, storing their email and DATIM username under Options for the current session.
Stores DATIM username and password using keyring
Other authentication: datim_pwd(), datim_user(), get_account(), get_key(), get_keys(), get_s3key(), get_services(), load_secrets(), pano_pwd(), pano_user(), pdap_access(), pdap_bucket(), pdap_secret(), set_account(), set_email(), set_key(), set_pano(), set_s3keys()
## Not run: set_datim("rshah") ## End(Not run)
'set_email' stores your USAID email using the 'keyring' package. This will only need to be run once.
set_email(usaid_email)
usaid_email | full USAID email address |
The 'keyring' package utilizes the OS credentials store. Storing in a centralized, secure location allows analysts to run other analysts' code without having to manually change user names/email addresses to access DATIM or Google Drive.
After 'set_email' has been run once, an analyst can set 'load_secrets' at the beginning of a script, storing their email and DATIM username under Options for the current session.
This function also stores the email locally in your .Rprofile, allowing it to be used automatically as the default for 'googledrive::drive_auth()' and 'googlesheets4::gs4_auth()'.
Stores USAID email using keyring and .Rprofile
Other authentication: datim_pwd(), datim_user(), get_account(), get_key(), get_keys(), get_s3key(), get_services(), load_secrets(), pano_pwd(), pano_user(), pdap_access(), pdap_bucket(), pdap_secret(), set_account(), set_datim(), set_key(), set_pano(), set_s3keys()
## Not run: set_email("[email protected]") ## End(Not run)
Set value for service name
set_key(service, name)
service | Name of the service |
name | Name of the key |
Other authentication: datim_pwd(), datim_user(), get_account(), get_key(), get_keys(), get_s3key(), get_services(), load_secrets(), pano_pwd(), pano_user(), pdap_access(), pdap_bucket(), pdap_secret(), set_account(), set_datim(), set_email(), set_pano(), set_s3keys()
## Not run: set_key(service = '<service-name>', name = '<key-name>') ## End(Not run)
'set_pano' stores your PEPFAR Panorama credentials using the 'keyring' package. This will only need to be done once. After running 'set_pano(user)', you will be prompted to enter your password through the RStudio API, which will then store the username and password in your OS credential store using 'keyring'.
set_pano(pano_username)
pano_username | Panorama user name (email) |
The 'keyring' package utilizes the OS credentials store. Storing in a centralized, secure location allows analysts to run other analysts' code without having to manually change user names/email addresses to access DATIM, Panorama, or Google Drive.
After 'set_pano' has been run once, an analyst can set 'load_secrets' at the beginning of a script, storing their PEPFAR Panorama credentials under Options for the current session.
Stores Panorama username and password using keyring
Other authentication: datim_pwd(), datim_user(), get_account(), get_key(), get_keys(), get_s3key(), get_services(), load_secrets(), pano_pwd(), pano_user(), pdap_access(), pdap_bucket(), pdap_secret(), set_account(), set_datim(), set_email(), set_key(), set_s3keys()
## Not run: set_pano("[email protected]") ## End(Not run)
'set_paths' stores local folder paths where larger data are stored centrally and outside of projects. Accessed through use of 'si_path()'.
set_paths( folderpath_msd, folderpath_datim, folderpath_raster, folderpath_vector, folderpath_downloads )
folderpath_msd | folderpath where the MSDs are stored |
folderpath_datim | folderpath where DATIM data are stored, eg org hierarchy, mech table |
folderpath_raster | folderpath where GIS raster data are stored |
folderpath_vector | folderpath where GIS vector data are stored |
folderpath_downloads | folderpath to local Downloads folder |
code chunk to paste into .Rprofile
Other stored paths: si_path()
## Not run: set_paths(folderpath_msd = "C:/Users/rshah/Documents/Data") ## End(Not run)
'set_s3keys' stores your s3 keys using the 'keyring' package. This will only need to be done once. After running 'set_s3keys(access, secret)', the keys will be stored in your OS credential store using 'keyring'.
set_s3keys(access, secret)
access | S3 Account Access Key |
secret | S3 Account Secret Key |
stored access keys
Other authentication: datim_pwd(), datim_user(), get_account(), get_key(), get_keys(), get_s3key(), get_services(), load_secrets(), pano_pwd(), pano_user(), pdap_access(), pdap_bucket(), pdap_secret(), set_account(), set_datim(), set_email(), set_key(), set_pano()
## Not run: set_s3keys("ABDCEDFF", "MLIZD998SD") ## End(Not run)
'setup_gitignore()' creates a .gitignore file (or appends to an existing one) with the standard file types/folders that should not be published to GitHub due to the sensitive nature of PEPFAR data. This function can be used as a stand-alone function but primarily serves 'si_setup()'.
setup_gitignore()
adds standard gitignore, plus specific ignores
Other project setup: folder_setup(), setup_readme(), si_setup()
## Not run: setup_gitignore() ## End(Not run)
'setup_readme()' establishes a README.md with the standard USAID disclaimer (or appends to one that currently exists). This function can be used as a stand-alone function but primarily serves 'si_setup()'.
setup_readme(add_disclaimer = TRUE)
add_disclaimer | should the standard disclaimer be added, default = TRUE |
adds/appends disclaimer to README
Other project setup: folder_setup(), setup_gitignore(), si_setup()
## Not run:
#standard (appends disclaimer if README exists)
setup_readme()
## End(Not run)
'si_path' accesses folder paths stored in global options to make it easier to work across analysts/machines. Analysts will first set up the paths using 'set_paths()', which stores local folder paths where larger data are kept centrally and outside of projects. This will also work on PEPFAR Workbench to return the location of the MSD, 'Sys.getenv("S3_READ")'.
si_path(type = "path_msd")
type | folderpath, eg "path_msd" (default), "path_datim", "path_raster", "path_vector", "path_downloads" |
folderpath stored in global options
Other stored paths: set_paths()
## Not run:
#old
list.files("C:/Users/rshah/Documents/Data", "OU_IM", full.names = TRUE)

#new
list.files(si_path("path_msd"), "OU_IM", full.names = TRUE)
## End(Not run)
'si_setup()' combines three functions - 'folder_setup()', 'setup_gitignore()', and 'setup_readme()' - to create the base OHA/SI project structure with the necessary folders, disclaimers, and ignored files.
si_setup()
creates folders, readme, and gitignore
Other project setup: folder_setup(), setup_gitignore(), setup_readme()
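A minimal usage sketch, run from the root of a newly created project:
## Not run:
si_setup()
## End(Not run)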
'temp_folder' creates a temporary folder in your AppData directory, which will be automatically removed after you close your RStudio session.
temp_folder(launch = FALSE, quiet = FALSE)
launch | do you want to launch the temp folder in Windows Explorer? default = FALSE |
quiet | suppresses the output message related to the folder creation and location, for use in sub functions, default = FALSE |
creates a temp directory and stores it as 'folderpath_tmp'
## Not run:
load_secrets()

temp_folder(launch = TRUE)

purrr::walk2(.x = df_googlefiles$id,
             .y = df_googlefiles$filename,
             .f = ~googledrive::drive_download(googledrive::as_id(.x),
                                               file.path(folderpath_tmp, .y)))
## End(Not run)