--- title: "Data Extraction from Panorama" author: "Baboyma Kagniniwa" date: "2021-09-23" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Data Extraction from Panorama} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", warning = FALSE, message = FALSE, eval = FALSE, fig.retina = 2 ) ``` ### Introduction This vignette provides guidance on how to identify and extract data sets output files stored in an S/FTP site. ### Datasets PEPFAR/Panorama releases, on the quarterly basis, global programs' Monitoring, Evaluation and Reporting (MER), Financial, SIMS, and Narratives. ### Pre-requisites ```{r setup, echo = T, eval = F} library(tidyverse) library(glamr) library(grabr) ``` ### Create an active session Panorama is a protected site and all user will need to authenticate in order to order to explore the dashboards. Same is true for data extraction. To create an active and valid session for all the http requests, we will use `pano_session()` ```{r echo = T, eval = F} load_secrets() user <- pano_user() pass <- pano_pwd() ``` ### Extract content from download page In order to extract the list of data items on the download page, we will need an active session and the html content of the page. This can be achieved with `pano_items()`. Under the wood, a valid session is created, html elements extracted (`pano_content()`) and parsed out (`pano_elements()`) as data frame. `pano_items()` combines `pano_content()` and `pano_elements()` into 1 function for a quick access to data items list on a specific page. ```{r echo = T, eval = F} url <- "https://pepfar-panorama.org/forms/downloads/" mer_items <- pano_items(page_url = dir_mer_path, username = user, password = pass) mer_items ``` ### Download specific items from Panorama Most data items under the download page of Panorama are listed as zipped files. To download them to a local directory, we will need to use the `pano_download()` function. This function is a wrapper for `httr::GET()` function write option set to a local directory. ```{r echo = T, eval = F} dest_path <- "../../../Temp/" url <- mer_items %>% filter(type == "file zip_file", str_detect(item, ".*_PSNU_IM_FY19-21_.*.zip$")) %>% pull(path) %>% first() url pano_download(item_url = url, session = sess, dest = dest_path) ``` ### Download mutiple items from Panorama `pano_extract()` is good for batch processing. Eg: download all MER data sets from Panorama. This function combine all the above steps into one. ```{r echo = T, eval = F} items <- pano_extract(item = "mer", version = "clean", fiscal_year = 2023, quarter = 4, username = user, password = pass, unpack = TRUE) items url_items <- items %>% filter(type == "file zip_file") %>% pull(path) %>% first() %>% # remove this to downlaod all zipped files walk(~pano_download(item_url = .x, session = sess, dest = dest_path)) ``` ### Download specific MSD / OU Specific items from Panorama `pano_extract_msd()` is designed to facilitate the download of MSD files for specific operating units and at a specific org hierarchy level. Eg: download Zambia's Site x IM data sets from Panorama. ```{r echo = T, eval = F} pano_extract_msd(operatingunit = "Zambia", version = "clean", fiscal_year = 2021, quarter = 3, level = "site", dest_path = NULL) ``` ### Download latest MSD / OU Specific items from Panorama `pano_extract_msds()` is designed to facilitate the download and management of latest MSD files for global and/or specific operating units. The function will also move existing files to an `Archive` folder before downloading current files. Users are also able to include / exclude global datasets with `add_global = TRUE`. ```{r echo = T, eval = F} pano_extract_msds(operatingunit = "Zambia", archive = TRUE, dest_path = si_path(), username = pano_user(), password = pano_pwd()) ```