--- title: "Project Workflow" output: rmarkdown::html_vignette description: | Basics of starting a project vignette: > %\VignetteIndexEntry{Project Workflow} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ### Introduction This vignette provides best practices for project workflow at USAID/Office of HIV/AIDS, making it easier to get you going and to align exceptions across the team. ### Initiate Project The first thing we want to do when starting up new work is determining if it fits into an existing bucket or this is completely new. All of our R work exists on GitHub to make our work transparent, improve collaboration, and makee it easy access. You find our repositories for different projects and package under our [GitHub organization, USAID-OHA-SI](https://github.com/USAID-OHA-SI/). If the work exists, you can clone the repo (make a local copy) and start working from there. If however its new work, you'll want to start up a new project and repo. We work with [RStudio Projects](https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects) and git/GitHub > RStudio projects make it straightforward to divide your work into multiple contexts, each with their own working directory, workspace, history, and source documents. To create a RStudio project, you can navigate there from `File > New Project` or establish through `usethis` ```{r proj_setup, eval=F} library(usethis) create_project("~/Documents/project-transform") ``` This will get you set you up in RStudio with a self contained project. In order to collaborate we also use git and GitHub. For more on integrating git with RStudio, check out Jenny Bryan's [Happy Git and GitHub for the useR](https://happygitwithr.com/). You an also use a UI tool like GitHub Desktop or GitKraken. If you have git installed and integreated with R, you can again use `usethis` to set everything up. And `usethis` even has some [help info on getting started with git](https://usethis.r-lib.org/articles/articles/usethis-setup.html#configure-user-name-and-user-email) if this your first time. ```{r connect_git, eval=F} use_git() #this prompts a restart of your session use_github(organisation = "USAID-OHA-SI") ``` The last thing we'll use `usethis` for is to set up a license for the project. We use the MIT license which calls for attribution for downstream uses of your work and specifies that the software is provided as is. ```{r license, eval=F} usethis::use_mit_license("Dr. Raj Shah") #add your name ``` ### SI Workflow, Enter glamr Okay, so we go through the basics of setting up a project, now let's turn to some workflow specifics that `glamr` is going to help us with. ```{r setup} library(glamr) ``` The primary function we're going to run to get us going is `si_setup`. If we look under the hood at `si_setup`, we can see it actually contains three sub-function. ```{r si_setup} si_setup ``` The first one, `folder_setup()` initiates the main folders we use for our work: - Data - where any raw/input data (**xlsx/csv/rds**) specific to the project are stored - Dataout - where any intermediary or final data (**xlsx/csv/rds**) are output as a product of your code - Data_public - where any **public data** should be stored. This can be posted to github. - Scripts - where all the code (**R/py**) are stored (if there is a local order, make sure to add prefixes to each script, e.g. 00_init.R, 01_data-access.R, 02_data-munging.R, ...) - Images - any **png/jpeg** visual outputs from your code - Graphics - any **svg/pdf** visual outputs that will be edited in vector graphics editor, eg Adobe Illustrator or Inkscape - AI - any **ai** files or other files from a graphics editor (exported pngs products will be stored in Images) - GIS - any **shp** files or other GIS releated inputs - Documents - any **docx/xlsx/pptx/pdf** documents that relate to the process or are final outputs - markdown - exported **md** files from a knitr report If you clone from GitHub not all the folders may exist there, so we recommend running `folder_setup()` on its own to establish the same organization on your local machine. The reason we have created this into a function is to ensure uniformity across our projects and know what to expect when we pick up someone else's work (or our own work from the distant past). The next function that is run is `setup_gitignore()`, which creates a `.gitignore` file in your project. The primary purpose of this file is to ensure that our data and outputs are not being published to the web. By default after running `si_setup()` most data formats/folders will be kept from being published. Below is what the `.gitignore` file will look like. ```{r gitignore_components,eval=F} #R basics .Rproj.user .Rhistory .RData .Ruserdata #no data *.csv *.gz *.txt *.rds *.xlsx *.xls *.zip *.png *.pptx *.tfl *.twb *.twbx *.tbs *.tbm *.hyper *.sql *.parquet *.svg *.dfx *.json *.dta *.shp *.dbf #nothing from these folders AI/* GIS/* Images/* Graphics/* Data/* Dataout/* ``` And the last component from `si_setup()` is to add a standard USAID disclaimer to your project's `README.md` file (and creates this file if its missing). From a project standpoint, now you're good to go. ### Accessing Data For the most part, our SI work revolves around using the MER Structured Datasets, which are large, cumbersome files. A best practice when working on projects is to store the data you use within that project so its all self-contained. However, since most of our work revolve around a couple of massive datasets (OUxIM, PSNU, PSNUxIM, NAT_SUBNAT, FSD), it makes more sense for us to store these dataset in a central location on our machines rather than in each project. The problem now is that where I store my MSD files is going to be different than the path to yours. To solve this dilemma, we use a function called `si_paths()` which access the paths we have stored locally to where our MSDs or Downloads are for instance. This way, when you pick up a coworkers code, you don't have to change any of the file paths, it just works. Those local paths will be set once and stored in your `.Rprofile`. To do so, you will run `set_paths` to store all the relevant paths (you can ignore any that aren't relevant to you). ```{r setup_paths, eval = F} set_paths(folderpath_msd = "~/Documents/Data", folderpath_datim = "~/Documents/DATIM", folderpath_downloads = "~/Downloads") ``` Running this will open your `.Rprofile` and you'll be prompted to paste the pre-copied code into the `.Rprofile` save and restart your session. With that stored, you can use `si_path()` to return the path to your MSD folder (as the default) and then use another `glamr` function, `return_latest()`, which looks in the provide folder path for the lastest version of a file that matches the pattern you provided. In this case, we can pass in that we want the last OUxIM MSD and it will return the file path which we can pass into `readr::read_rds()` ```{r load_msd, eval = F} df <- si_path() %>% return_latest("OU_IM_FY19") %>% readr::read_rds() ```