---
title: "Data Extraction from Panorama"
author: "Baboyma Kagniniwa"
date: "2021-09-23"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Data Extraction from Panorama}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE,
  message = FALSE,
  eval = FALSE,
  fig.retina = 2
)
```

### Introduction

This vignette provides guidance on how to identify and extract data set output files stored on an S/FTP site.

### Datasets

PEPFAR/Panorama releases, on a quarterly basis, data sets for its global programs: Monitoring, Evaluation and Reporting (MER), Financial, SIMS, and Narratives.

### Pre-requisites

```{r setup, echo = T, eval = F}
library(tidyverse)
library(glamr)
library(grabr)
```

### Create an active session 

Panorama is a protected site, and all users need to authenticate in order to explore the dashboards. The same is true for data extraction. To create an active and valid session for all the HTTP requests, we will use `pano_session()`.

```{r echo = T, eval = F}
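# Load stored credentials
load_secrets()

user <- pano_user()
pass <- pano_pwd()

# Create the session reused by the download calls below
sess <- pano_session(username = user, password = pass)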
```

### Extract content from download page

To extract the list of data items on the download page, we need an active session and the HTML content of the page. This can be achieved with `pano_items()`. Under the hood, a valid session is created, the HTML elements are extracted (`pano_content()`), and they are parsed (`pano_elements()`) into a data frame.

`pano_items()` combines `pano_content()` and `pano_elements()` into one function for quick access to the list of data items on a specific page.

```{r echo = T, eval = F}

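url <- "https://pepfar-panorama.org/forms/downloads/"

# List the items on the main download page
dir_items <- pano_items(page_url = url, username = user, password = pass)

# Path to the MER directory (assumes its item name starts with "MER")
dir_mer_path <- dir_items %>%
  filter(str_detect(item, "^MER")) %>%
  pull(path) %>%
  first()

# List the data items inside the MER directory
mer_items <- pano_items(page_url = dir_mer_path, 
                        username = user,
                        password = pass) 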

mer_items
```


### Download specific items from Panorama

Most data items on the download page of Panorama are listed as zipped files. To download them to a local directory, we use the `pano_download()` function. This function is a wrapper around `httr::GET()` with the write option set to a local directory.

```{r echo = T, eval = F}
dest_path <- "../../../Temp/"

url <- mer_items %>% 
  filter(type == "file zip_file",
         str_detect(item, ".*_PSNU_IM_FY19-21_.*\\.zip$")) %>% 
  pull(path) %>% 
  first() 

url

pano_download(item_url = url, session = sess, dest = dest_path)
```
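To make the "wrapper for `httr::GET()`" idea concrete, here is a minimal sketch of how such a download helper could be written. The names `local_file_path()` and `download_to_dir()` are hypothetical and not part of `grabr`, and passing the session through the `handle` argument is an assumption about how the session object is used; the actual `pano_download()` implementation may differ.

```r
library(httr)

# Hypothetical helper (not part of grabr): build the local path for a download
local_file_path <- function(item_url, dest) {
  file.path(dest, basename(item_url))
}

# Hypothetical helper (not part of grabr): save the body of a GET request to disk
download_to_dir <- function(item_url, dest, session = NULL) {
  out_file <- local_file_path(item_url, dest)

  # write_disk() writes the response body to a file instead of keeping it in memory
  # Assumption: the session is an httr handle that carries the login cookies
  resp <- GET(item_url, write_disk(out_file, overwrite = TRUE), handle = session)

  # Raise an R error if the server returned an HTTP error status
  stop_for_status(resp)

  invisible(out_file)
}

# The path helper can be checked without any network access
local_file_path("https://example.org/files/sample.zip", "Temp")
```

Writing the response straight to disk avoids loading large zipped files into memory, which matters for the larger data sets.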

### Download multiple items from Panorama

`pano_extract()` is suited to batch processing. It combines all the steps above into one function.

For example, to download all MER data sets from Panorama:

```{r echo = T, eval = F}
items <- pano_extract(item = "mer", 
                      version = "clean", 
                      fiscal_year = 2023, 
                      quarter = 4,
                      username = user,
                      password = pass,
                      unpack = TRUE)

items

url_items <- items %>%
  filter(type == "file zip_file") %>%
  pull(path) %>%
  first() %>% # remove this to download all zipped files
  walk(~pano_download(item_url = .x, 
                      session = sess, 
                      dest = dest_path))
```
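When downloading many files in one pass, a single failed request would normally stop the whole loop. The sketch below shows one way to keep a batch going with `purrr::possibly()`. It runs offline: `fake_download()` is a hypothetical stand-in for a real download call, and the URLs are made up for illustration.

```r
library(purrr)

# Hypothetical stand-in for a download call: fails on anything that is not a zip file
fake_download <- function(item_url) {
  if (!grepl("\\.zip$", item_url)) stop("not a zip file: ", item_url)
  basename(item_url)
}

# possibly() wraps the function so that an error returns NA instead of stopping
safe_download <- possibly(fake_download, otherwise = NA_character_)

# Made-up URLs for illustration only
urls <- c("https://example.org/a.zip",
          "https://example.org/readme.txt",
          "https://example.org/b.zip")

# Each URL is tried in turn; failures show up as NA in the result
results <- map_chr(urls, safe_download)
results
```

The `NA` entries in `results` identify the items that need a retry, without losing the files that did download.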

### Download specific MSD / OU Specific items from Panorama

`pano_extract_msd()` is designed to facilitate the download of MSD files for specific operating units at a specific level of the org hierarchy. For example, to download Zambia's Site x IM data sets from Panorama:

```{r echo = T, eval = F}
  pano_extract_msd(operatingunit = "Zambia",
                   version = "clean",
                   fiscal_year = 2021,
                   quarter = 3,
                   level = "site",
                   dest_path = NULL)
```

### Download latest MSD / OU Specific items from Panorama

`pano_extract_msds()` is designed to facilitate the download and management of the latest MSD files for global and/or specific operating units. The function also moves existing files to an `Archive` folder before downloading the current files. Users can include or exclude global data sets with the `add_global` argument (for example, `add_global = TRUE`).

```{r echo = T, eval = F}
  pano_extract_msds(operatingunit = "Zambia",
                    archive = TRUE,
                    dest_path = si_path(),
                    username = pano_user(),
                    password = pano_pwd())
```
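The archiving step described above can be illustrated with base R. This is a sketch under stated assumptions, not the code `pano_extract_msds()` actually runs: `archive_files()` is a hypothetical helper, and the file pattern and demo folder are made up for illustration.

```r
# Hypothetical helper (not part of grabr): move matching files into an Archive subfolder
archive_files <- function(dir, pattern = "\\.zip$") {
  old_files <- list.files(dir, pattern = pattern, full.names = TRUE)

  # Create the Archive folder on first use
  archive_dir <- file.path(dir, "Archive")
  if (!dir.exists(archive_dir)) dir.create(archive_dir)

  # file.rename() returns TRUE for each file that was moved
  moved <- file.rename(old_files, file.path(archive_dir, basename(old_files)))

  invisible(old_files[moved])
}

# Demo in a temporary folder with two empty placeholder files
tmp <- file.path(tempdir(), "msd_demo")
dir.create(tmp, showWarnings = FALSE)
file.create(file.path(tmp, c("old_a.zip", "old_b.zip")))

archive_files(tmp)

# The placeholder files are now listed under Archive/
list.files(tmp, recursive = TRUE)
```

Archiving before downloading keeps the working folder limited to the current release while preserving earlier files for comparison.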