---
title: "Combining TST Targets with the MSD"
output: rmarkdown::html_vignette
description: |
  Running tameDP to extract TST targets & bind onto MSD
vignette: >
  %\VignetteIndexEntry{combining-with-msd}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval=F,
  echo=T
)

options(rmarkdown.html_vignette.check_title = FALSE)
```

### Introduction

This vignette provides an overview for utilizing your `tame_dp()` output by combining them with previous targets from the MER Structured Dataset (MSD).

### Setup

If you haven't installed R or `tameDP`, you should first refer to the [Setup vignette](setup.html) or if you want to know more about the options for `tame_dp()`, see [Extracting Targets](extracting-targets.html).

In addition to loading `tameDP`, we're also going to need to install/load a couple others. We'll be working with some of the other packages we installed as dependencies for `tameDP`, but we will need to use two other packages: `here` and another custom OHA package called `gophr`. Let's install bothS now.

```{r installation, eval=F}
install.packages("here")
install.packages('gophr', repos = c('https://usaid-oha-si.r-universe.dev', 'https://cloud.r-project.org'))
```
Alright, with `gophr` installed to work with the MSD, let's get going.

```{r lib, message=FALSE, warning=FALSE}
library(tameDP)
library(gophr)
library(dplyr)
library(tidyr)
library(here)
```

The first step is reading in the from the Target Setting Tool (TST) file. We are going to work with the targets out the target tabs, assuming the PSNUxIM is not yet populated. **Remember, Windows filepaths have their slash in the wrong direction. You will need to replace all backslashes ("/\") in your filepath with forward slashes ("/").**

```{r process}
#data pack file path
tst_filepath <- here("tools/DataPack_Jupiter_20600101.xlsx")
#process the data pack to extract the targets
df_tst <- tame_dp(tst_filepath)

#glimpse the output
glimpse(df_tst)
```

We now have all the targets extracted from the TST and stored in a tidy dataframe called `df_tst`. We want to combine this with the MSD. You will need to download your country's full MSD from [PEPFAR Panorama](https://www.pepfar-panorama.org/) or Genie Extract from [DATIM](datim.org/). For our analysis, let's look across PSNUs so we will want to download the PSNUxIM dataset. Once you have it downloaded, you will want to use `gophr::read_psd()` to open the MSD.

```{r import_msd}
#msd file path
msd_filepath <- here("../../../Downloads/MER_Structured_Datasets_PSNU_IM_FY45-49_Jupiter.txt")

#read in MSD
df_msd <- read_psd(msd_filepath)
```

The PSNUxIM file has a few more columns that we need and the TST output had a processed time column that we won't need.

```{r comp}
#extra column names in the MSD that don't exist in the TST
setdiff(names(df_msd), names(df_tst))

#exta columns in the TST that don't exist in the MSD
setdiff(names(df_tst), names(df_msd))
```

It also has more indicators and disaggregates that aren't in the Data Pack. To simplify matters, we can subset, or limit, the dataset to what we have to work with. Luckily we have a table stored in this packaged called `mer_disagg_mapping` that contains all the indicators and their disaggregates that we can use to filter the MSD by.

```{r disaggs}
mer_disagg_mapping
```

Combining the names from our Data Pack data frame (`names(df_tst)`) to limit the columns and using an inner join to keep only the target disaggregates, we can then append or bind this with onto the MSD data frame. The TST will also have historic fiscal year data in addition to to the planning year that will be duplicated in the MSD so we'll need to remove that as well. 

```{r subset}
#remove extra column from the tidy TST
df_tst <- select(df_tst, -source_processed)

#remove extra columns from MSD
df_msd_lim <- select(df_msd, all_of(names(df_tst))) 

#subset MSD data to only indicators/disaggs that exist in the TST (inner_join is acting like filter)
df_msd_lim <- df_msd_lim %>% 
  inner_join(mer_disagg_mapping, by = c("indicator", "standardizeddisaggregate", "numeratordenom"))

#remove duplicative data
df_tst <- df_tst %>% 
  filter(fiscal_year > max(df_tst$fiscal_year))

#bind TST data frame onto MSD
df_msd_tst <- df_msd_lim %>% 
  bind_rows(df_tst) %>% 
  glimpse()

```

We now have a joined data frame that we can use to compare indicator trends over time. For instance, we can compare how this year's TX_NEW targets compare with previous few years' targets across ages.

```{r count}
df_tx_trend <- df_msd_tst %>% 
  filter(indicator == "TX_NEW",
         standardizeddisaggregate == "Age/Sex/HIVStatus") %>% 
  count(fiscal_year, indicator, ageasentered, wt = targets, name = "targets") 
```

```{r adj_out, echo=FALSE}
set.seed(39)
f <- runif(1)
df_tx_trend <- df_tx_trend %>% 
  mutate(targets = round(targets*f))
```

Finally, we can reshape the data to get it into an easy table to view the trends. 

```{r reshape}
df_tx_trend %>%
  filter(fiscal_year >= 2048) %>% 
  pivot_wider(names_from = fiscal_year,
              values_from = targets)
```