Data Extraction from Panorama

Introduction

This vignette provides guidance on how to identify and extract data set output files stored on an S/FTP site.

Datasets

On a quarterly basis, PEPFAR/Panorama releases data sets for its global programs: Monitoring, Evaluation and Reporting (MER), Financial, SIMS, and Narratives.

Pre-requisites

library(tidyverse)
library(glamr)
library(grabr)

Create an active session

Panorama is a protected site, and all users need to authenticate in order to explore the dashboards. The same is true for data extraction. To create an active and valid session for all the HTTP requests, we will use pano_session().

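# Load stored credentials
load_secrets()

user <- pano_user()
pass <- pano_pwd()

# Create the session that later calls refer to as `sess`
sess <- pano_session(username = user, password = pass)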

Extract content from download page

To extract the list of data items on the download page, we need an active session and the HTML content of the page. This can be achieved with pano_items(). Under the hood, a valid session is created, the HTML elements are extracted (pano_content()), and they are parsed into a data frame (pano_elements()).

pano_items() combines pano_content() and pano_elements() into one function for quick access to the list of data items on a specific page.


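url <- "https://pepfar-panorama.org/forms/downloads/"

dirs <- pano_items(page_url = url, username = user, password = pass)

# Path to the MER directory (assumes its item name starts with "MER")
dir_mer_path <- dirs %>% 
  filter(str_detect(item, "^MER")) %>% 
  pull(path)

mer_items <- pano_items(page_url = dir_mer_path, 
                        username = user,
                        password = pass) 

mer_items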
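
The composition described above can be sketched as a small function. This is only an illustration of how the three steps fit together, not the package's actual source. The helper name items_sketch() is hypothetical, and the argument names passed to pano_session(), pano_content() and pano_elements() are assumptions based on how they are used in this vignette.

```r
# Hypothetical helper: chains the three steps that pano_items() is
# described as combining. Argument names are assumptions.
items_sketch <- function(page_url, username, password) {
  # 1. Authenticate and get a session
  sess <- grabr::pano_session(username = username, password = password)

  # 2. Fetch the HTML content of the page with that session
  page_html <- grabr::pano_content(page_url = page_url, session = sess)

  # 3. Parse the HTML elements into a data frame of items
  grabr::pano_elements(page_html)
}
```

Calling items_sketch(url, user, pass) would then be expected to return the same kind of data frame as pano_items(), provided the assumptions above hold.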

Download specific items from Panorama

Most data items on the Panorama download page are listed as zipped files. To download them to a local directory, we use the pano_download() function. This function is a wrapper around httr::GET() with its write option set to a local directory.

dest_path <- "../../../Temp/"

url <- mer_items %>% 
  filter(type == "file zip_file",
         str_detect(item, "_PSNU_IM_FY19-21_.*\\.zip$")) %>% 
  pull(path) %>% 
  first() 

url

pano_download(item_url = url, session = sess, dest = dest_path)
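
To see how the filter step narrows the listing, here is a small sketch that runs the same kind of filter on a mock data frame. The file names and URLs in mock_items are hypothetical and only mimic the shape of the pano_items() output (type, item, path). The pattern escapes the dot before zip so that only names ending in a literal .zip match.

```r
library(dplyr)
library(stringr)

# Hypothetical listing shaped like the output of pano_items()
mock_items <- tibble::tibble(
  type = c("file zip_file", "file zip_file", "file txt_file", "file zip_file"),
  item = c("MER_Structured_Datasets_PSNU_IM_FY19-21_20210618_v2_1.zip",
           "MER_Structured_Datasets_OU_IM_FY19-21_20210618_v2_1.zip",
           "MER_Structured_Datasets_PSNU_IM_FY19-21_readme.txt",
           "MER_Structured_Datasets_PSNU_IM_FY19-21_notes_zip"),
  path = paste0("https://example.org/downloads/", item)
)

# Keep zipped PSNU x IM files only; "\\.zip$" requires a literal ".zip" suffix
matches <- mock_items %>%
  filter(type == "file zip_file",
         str_detect(item, "_PSNU_IM_FY19-21_.*\\.zip$")) %>%
  pull(path)

matches
```

Only the first row survives: the second is an OU x IM file, the third is not a zip file, and the fourth ends in _zip rather than .zip. With an unescaped dot, the fourth row would also match.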

Download multiple items from Panorama

pano_extract() is well suited to batch processing. It combines all the steps above into one function.

For example, to download all MER data sets from Panorama:

items <- pano_extract(item = "mer", 
                      version = "clean", 
                      fiscal_year = 2023, 
                      quarter = 4,
                      username = user,
                      password = pass,
                      unpack = TRUE)

items

url_items <- items %>%
  filter(type == "file zip_file") %>%
  pull(path) %>%
  first() %>% # remove this to download all zipped files
  walk(~pano_download(item_url = .x, 
                      session = sess, 
                      dest = dest_path))

Download specific MSD / OU Specific items from Panorama

pano_extract_msd() is designed to facilitate the download of MSD files for a specific operating unit at a specific org hierarchy level. For example, to download Zambia’s Site x IM data sets from Panorama:

pano_extract_msd(operatingunit = "Zambia",
                 version = "clean",
                 fiscal_year = 2021,
                 quarter = 3,
                 level = "site",
                 dest_path = NULL)

Download latest MSD / OU Specific items from Panorama

pano_extract_msds() is designed to facilitate the download and management of the latest MSD files for global and/or specific operating units. The function also moves existing files to an Archive folder before downloading the current files. Users can include or exclude global datasets with the add_global argument (for example, add_global = TRUE).

pano_extract_msds(operatingunit = "Zambia",
                  archive = TRUE,
                  dest_path = si_path(),
                  username = pano_user(),
                  password = pano_pwd())
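
The archive step mentioned above can be pictured with base R file functions. The sketch below is not the code that pano_extract_msds() runs. The helper name archive_existing(), the .zip pattern, and the demo folder are assumptions made for illustration; only the idea of moving existing files into an Archive folder comes from the description above.

```r
# Hypothetical helper: move files matching `pattern` from `dir` into `dir/Archive`
archive_existing <- function(dir, pattern = "\\.zip$") {
  archive_dir <- file.path(dir, "Archive")
  dir.create(archive_dir, showWarnings = FALSE)

  # Existing files in the top level of `dir` only (not recursive)
  old_files <- list.files(dir, pattern = pattern, full.names = TRUE)

  # Move each file under the Archive folder, keeping its name
  moved <- file.rename(old_files, file.path(archive_dir, basename(old_files)))

  invisible(old_files[moved])
}

# Demo in a temporary folder with one placeholder file
demo_dir <- file.path(tempdir(), "msd_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, "old_msd.zip"))

archived <- archive_existing(demo_dir)
list.files(demo_dir, recursive = TRUE)
```

After the call, the placeholder file sits under the Archive subfolder, which leaves the top level of the folder free for the newly downloaded files.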