Extract facilities location data

Introduction

This vignette provides guidance to USAID/OHA and other PEPFAR data analysts on how to extract location data for PEPFAR-supported facilities.

Facilities location datasets

PEPFAR uses DATIM to manage data for its global HIV/AIDS programs. Location data for supported health facilities are managed in the same system. Each facility record has a Universal Unique Identifier (UUID), the UIDs of all its parent organizational units, and the latitude and longitude of the site.

Below is one of the many ways to extract PEPFAR facilities location data for specific countries.

Prerequisites

library(tidyverse)     # General data management and viz library
library(glamr)         # OHA/SI utility package
library(gisr)          # OHA/SI geospatial package 
library(sf)            # Spatial data management
library(sp)            # Spatial data management
library(glue)          # String formatting

Workspace

Before jumping into data extraction, we recommend preparing a directory to host data. The best directory for this type of data is under the OHA/Vector path. Follow these steps.

# Define a folder to host data - Using FY
dir_sites <- Sys.Date() %>%
    glamr::convert_date_to_qtr() %>%
    str_sub(1, 4)

dir_data <- glamr::si_path("path_vector") %>%
  paste0("../", .) %>% # This is needed only with R Markdown files
  paste0("/OU-Sites/", dir_sites)

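# Create the folder (and any missing parent folders)
dir.create(dir_data, recursive = TRUE, showWarnings = FALSE)

# Subfolder for shapefiles
dir_geodata <- paste0(dir_data, "/SHP")

dir.create(dir_geodata, showWarnings = FALSE)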
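As a rough, dependency-free sketch of the same idea, the snippet below derives a fiscal-year label and builds the nested folders in one pass. It assumes the fiscal year starts on October 1 and is named for the calendar year in which it ends, and it uses a temporary directory as a stand-in for the OHA/Vector path. `fy_label()` and `root` are illustrative names, not part of glamr.

```r
# Hypothetical helper: fiscal-year label, assuming the FY starts on October 1
fy_label <- function(date = Sys.Date()) {
  date <- as.Date(date)
  yr <- as.integer(format(date, "%Y"))
  mo <- as.integer(format(date, "%m"))
  fy <- ifelse(mo >= 10L, yr + 1L, yr)
  sprintf("FY%02d", fy %% 100L)
}

# Stand-in for the vector data path (assumption: replace with your own folder)
root <- file.path(tempdir(), "Vector")

dir_data    <- file.path(root, "OU-Sites", fy_label())
dir_geodata <- file.path(dir_data, "SHP")

# recursive = TRUE creates missing parent folders;
# showWarnings = FALSE keeps reruns quiet when the folders already exist
dir.create(dir_geodata, recursive = TRUE, showWarnings = FALSE)
```

Creating the deepest folder with `recursive = TRUE` also creates every folder above it, so a single call is enough on a fresh machine.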

Data Extraction

Two pieces of information are needed for facility data extraction: (a) the OU/country name and (b) the organizational level for facilities.

ou <- "Nigeria"
level_fac <- get_ouorglevel(operatingunit = ou, org_type = "facility")

Now it's time to proceed with the facilities data extraction:

# extract location data
df_facs <- extract_locations(country = ou, level = level_fac)

# Get a clean version of the facilities
df_facs <- df_facs %>% extract_facilities()

# Get rid of extra columns
df_facs <- df_facs %>% select(-c(geom_type:nested))
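The last step drops a range of columns by position, which depends on the column order of the extract. The toy example below illustrates how that range selection behaves. Only `geom_type` and `nested` are taken from the code above; the remaining column names and values are made up, so the real extract may look different.

```r
library(dplyr)

# Toy stand-in for the extracted data (hypothetical columns and values)
df_toy <- tibble(
  id        = c("a1", "b2"),
  name      = c("Clinic A", "Clinic B"),
  geom_type = c("Point", "Point"),
  helper    = c(1, 2),
  nested    = c(TRUE, FALSE),
  latitude  = c(9.08, 6.52),
  longitude = c(7.40, 3.38)
)

# `geom_type:nested` covers both endpoints and every column between them
df_clean <- df_toy %>% select(-c(geom_type:nested))

names(df_clean)
```

Here `helper` is removed along with `geom_type` and `nested` because it sits between them. If the column order of the extract changes, the range may drop more or fewer columns than intended, so checking `names()` before and after is a cheap safeguard.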

Save the extracted data as a .csv file in the predefined location:

 write_csv(x = df_facs,
           file = paste0(dir_data, "/", ou,
                         " - facilities_locations_extract_",
                         format(Sys.Date(), "%Y-%m-%d"), ".csv"),
           na = "")
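To see what the `na = ""` argument does in practice, here is a small round-trip sketch with made-up data. It writes to a temporary folder in place of `dir_data`, and the `df_toy` values are placeholders.

```r
library(readr)

ou <- "Nigeria"
out_dir <- tempdir()  # stand-in for dir_data (assumption)

# Toy data with one facility missing its coordinates
df_toy <- data.frame(
  name      = c("Clinic A", "Clinic B"),
  latitude  = c(9.08, NA),
  longitude = c(7.40, NA)
)

out_file <- file.path(
  out_dir,
  paste0(ou, " - facilities_locations_extract_",
         format(Sys.Date(), "%Y-%m-%d"), ".csv")
)

# Missing values are written as empty cells rather than the text "NA"
write_csv(df_toy, file = out_file, na = "")

# Read everything back as text to inspect what was written
df_back <- read_csv(out_file, col_types = cols(.default = col_character()))
```

Empty cells are read back as missing values, so facilities without coordinates remain easy to filter out later.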

Convert extracted location data into a shapefile

CSV files are good for most data analyses, but there are times when you need the location data in shapefile format. Below is how to generate a shapefile from the extracted data.

# By default, location information is stored in these 2 columns
# Order matters for st_as_sf(): x (longitude) first, then y (latitude)
loc_cols <- c("longitude", "latitude")

# Prepare the data: convert coordinates to numeric,
# then drop rows with no valid lat/long
spdf <- df_facs %>%
  mutate(across(all_of(loc_cols), ~ as.numeric(.x))) %>%
  filter(if_all(all_of(loc_cols), ~ !is.na(.x)))

# Create the spatial data frame with WGS 84 (EPSG:4326)
# as the Coordinate Reference System
spdf <- spdf %>% st_as_sf(coords = loc_cols, crs = st_crs(4326))

# Shapefile column names are limited to 10 characters, so shorten the long ones
spdf <- spdf %>%
  rename(ou_iso = operatingunit_iso,
         ou = operatingunit,
         cntry_iso = countryname_iso,
         cntry = countryname)
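Because `st_as_sf()` reads the `coords` columns as x then y, it can be worth confirming that longitudes ended up on the x axis. The sketch below does that with two illustrative points (approximate city centres, not facility records) and also checks the length of the column names renamed above. `df_pts` and `old_names` are names introduced here for illustration.

```r
library(sf)

# Two illustrative points (approximate values, not facility records)
df_pts <- data.frame(
  name      = c("Abuja", "Lagos"),
  latitude  = c(9.08, 6.52),
  longitude = c(7.40, 3.38)
)

# coords takes the x column first, then y: longitude, then latitude
pts <- st_as_sf(df_pts, coords = c("longitude", "latitude"), crs = 4326)

# st_coordinates() returns a matrix with columns X and Y
xy <- st_coordinates(pts)
lon_is_x <- all(xy[, "X"] == df_pts$longitude)
lat_is_y <- all(xy[, "Y"] == df_pts$latitude)

# Original column names from the rename step, all longer than 10 characters
old_names <- c("operatingunit_iso", "operatingunit",
               "countryname_iso", "countryname")
too_long <- nchar(old_names) > 10
```

If `lon_is_x` or `lat_is_y` came back `FALSE`, the points would plot in the wrong place (mirrored across the diagonal), which is an easy mistake to miss for countries near the equator and prime meridian.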

Save the shapefile in the predefined directory:

export_spdf(spdf = spdf,
            name = paste0(dir_geodata, "/", ou,
                          " - facilities_locations_",
                          format(Sys.Date(), "%Y-%m-%d")))
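If gisr is not available, a similar result can probably be achieved with `sf::st_write()` directly. The sketch below is an alternative under that assumption, not a description of what `export_spdf()` does internally. It uses toy points and a temporary folder, and `shp_file` is a placeholder name.

```r
library(sf)

# Toy points standing in for the facilities layer
pts <- st_as_sf(
  data.frame(ou = "Nigeria",
             name = c("Abuja", "Lagos"),
             latitude = c(9.08, 6.52),
             longitude = c(7.40, 3.38)),
  coords = c("longitude", "latitude"),
  crs = 4326
)

shp_file <- file.path(tempdir(), "toy_points.shp")

# delete_dsn = TRUE overwrites an existing file of the same name
st_write(pts, dsn = shp_file, delete_dsn = TRUE, quiet = TRUE)

# Read the file back to confirm that the features were written
pts_back <- st_read(shp_file, quiet = TRUE)
```

A shapefile is a set of sibling files (.shp, .shx, .dbf and usually .prj), so all of them need to travel together when the output is shared.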

Give it a try and let us know how it went.

Enjoy!