Overview

An example of using R to investigate the clustering of food stamp usage, here for the Detroit metro area.

This was originally done for a quantitative spatial data analysis lab, hence the tedium of having to wade through my code. On the other hand, you ought to be able to replicate my results.

Setup

Loading libraries:

options(scipen = 999) # turn off scientific notation
library(pacman) # package-loading manager
p_load(sf,
       spdep,
       tmap,
       tidycensus,
       ggplot2,
       dplyr)

I’ll use tidycensus to read the ACS data for Oakland, Macomb, and Wayne counties in Michigan to look at the Detroit MSA:

detroit <- get_acs( 
  geography = 'tract', # designate desired geography
  year = 2019, # and year
  state = 'MI', # then state
  county = c('Macomb','Wayne','Oakland'), # and counties
  variables = c("households_" = 'B22010_001E', "foodstamps_" = 'B22010_002E'), # followed by the column(s) we're looking for
  geometry = TRUE, # we want some coordinates
  output = 'wide'
) %>% 
  st_transform(3857) %>% # Project to something with better units (meters, in this case)
  filter(!sf::st_is_empty(.)) # Make sure we only have rows with geometries

Check out our object:

head(detroit)
## Simple feature collection with 6 features and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -9276830 ymin: 5212783 xmax: -9245961 ymax: 5263564
## Projected CRS: WGS 84 / Pseudo-Mercator
##         GEOID                                        NAME households_
## 1 26125193000 Census Tract 1930, Oakland County, Michigan        1247
## 2 26163555600   Census Tract 5556, Wayne County, Michigan        1107
## 3 26163520300   Census Tract 5203, Wayne County, Michigan        1234
## 4 26125161500 Census Tract 1615, Oakland County, Michigan         702
## 5 26163557100   Census Tract 5571, Wayne County, Michigan         959
## 6 26163543600   Census Tract 5436, Wayne County, Michigan         446
##   B22010_001M foodstamps_ B22010_002M                       geometry
## 1          24          23          21 MULTIPOLYGON (((-9261118 52...
## 2          40          80          45 MULTIPOLYGON (((-9274380 52...
## 3         137         167          73 MULTIPOLYGON (((-9246816 52...
## 4          36          39          33 MULTIPOLYGON (((-9266371 52...
## 5          57          28          26 MULTIPOLYGON (((-9276830 52...
## 6          52         234          58 MULTIPOLYGON (((-9268657 52...

Replace NAs with 0s:

# Replace NA values with 0
detroit$households_ <- ifelse(is.na(detroit$households_), 0 ,detroit$households_) 

Now we’re ready to roll, but first let’s take a look at the data by drawing a map showing percentage of households using foodstamps in the Detroit area:

detroit <- mutate(detroit, foods_pct = round((foodstamps_ / households_ * 100), 2)) # creates a new attribute with percent households using food assistance
detroit$foods_pct <- ifelse(is.na(detroit$foods_pct), 0, detroit$foods_pct) # take care of NA again

tmap_mode('view') # set mode to web map
# orig_basemaps <- c("Esri.WorldGrayCanvas","OpenStreetMap","Esri.WorldTopoMap")

# draw the map
tmap_options(basemaps = "Esri.WorldGrayCanvas")
tm_shape(detroit) + # use ACS census tract data
  tm_polygons(
    id = "NAME", # hover-over shows the tract ID and county
    title = "% Households using foodstamps",
    col = c("Foodstamp usage:"= "foods_pct"), # Use color to symbolize foodstamp usage
    palette = "YlGnBu", # Set color ramp
    alpha = 0.7, # Transparency
    border.col = "white", # Line/border color
    lwd = 0.6, # Line/border width
    style='jenks', # Use the Jenks algorithm to define natural breaks between classes
    contrast = c(0,1),
    popup.vars = c( # some info for the popup window if you click on a tract
      "Total Households" = "households_",
      "Households Using Foodstamps" = "foodstamps_", 
      "% Using Foodstamps" = "foods_pct")) +
  tm_tiles("Stamen.TerrainLabels", zindex = 500)
 ██▓███   ▄▄▄       ██▀███  ▄▄▄█████▓    ▒█████   ███▄    █ ▓█████ 
▓██░  ██▒▒████▄    ▓██ ▒ ██▒▓  ██▒ ▓▒   ▒██▒  ██▒ ██ ▀█   █ ▓█   ▀ 
▓██░ ██▓▒▒██  ▀█▄  ▓██ ░▄█ ▒▒ ▓██░ ▒░   ▒██░  ██▒▓██  ▀█ ██▒▒███   
▒██▄█▓▒ ▒░██▄▄▄▄██ ▒██▀▀█▄  ░ ▓██▓ ░    ▒██   ██░▓██▒  ▐▌██▒▒▓█  ▄ 
▒██▒ ░  ░ ▓█   ▓██▒░██▓ ▒██▒  ▒██▒ ░    ░ ████▓▒░▒██░   ▓██░░▒████▒
▒▓▒░ ░  ░ ▒▒   ▓▒█░░ ▒▓ ░▒▓░  ▒ ░░      ░ ▒░▒░▒░ ░ ▒░   ▒ ▒ ░░ ▒░ ░
░▒ ░       ▒   ▒▒ ░  ░▒ ░ ▒░    ░         ░ ▒ ▒░ ░ ░░   ░ ▒░ ░ ░  ░
░░         ░   ▒     ░░   ░   ░         ░ ░ ░ ▒     ░   ░ ░    ░   
               ░  ░   ░                     ░ ░           ░    ░  ░

Local Getis-Ord G

Part one will compute Local Getis-Ord G to assess spatial autocorrelation in the data.

The first step is to create a ‘neighborhood’ from the detroit object we got reading in ACS data via tidycensus:

detroit_nb <- poly2nb(pl = detroit, queen = TRUE) %>% # Using Queen Contiguity
  include.self() # include the local polygon for Gi* 
detroit_nb 
## Neighbour list object:
## Number of regions: 1165 
## Number of nonzero links: 8851 
## Percentage nonzero weights: 0.6521395 
## Average number of links: 7.597425

Looks like there are no ‘island’ tracts, great! We’ll use this neighborhood definition to create a spatial weights matrix:

detroit_W <- nb2listw(
  neighbours = detroit_nb, 
  style = "W", # Leave this as a default
  zero.policy = TRUE) # This will allow for our NA values
print(detroit_W,zero.policy = T) 
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 1165 
## Number of nonzero links: 8851 
## Percentage nonzero weights: 0.6521395 
## Average number of links: 7.597425 
## 
## Weights style: W 
## Weights constants summary:
##      n      nn   S0       S1       S2
## W 1165 1357225 1165 316.8902 4709.393

Just for kicks, let’s also generate a weights matrix using distance to define neighbors:

detroit_dnn <- dnearneigh(x = st_centroid(detroit), d1 = 0, d2 = 5000) %>% # use 5km
  include.self() # again we want to generate Gi*
detroit_dnn_W <- nb2listw(neighbours = detroit_dnn, style = "W", zero.policy = T)
print(detroit_dnn_W, zero.policy = TRUE)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 1165 
## Number of nonzero links: 23127 
## Percentage nonzero weights: 1.703992 
## Average number of links: 19.8515 
## 
## Weights style: W 
## Weights constants summary:
##      n      nn   S0       S1       S2
## W 1165 1357225 1165 243.0442 4676.924

Now to compute spatial lag for use of food stamps:

detroit$lag_foods_pct <- lag.listw(x = detroit_W, var = detroit$foods_pct, zero.policy = TRUE)
detroit$lag_dist_foods_pct <- lag.listw(x = detroit_dnn_W, var = detroit$foods_pct, zero.policy = TRUE)

Visually comparing the two lags based on different neighborhood definition methods:

tmap_mode('plot')
## tmap mode set to plotting
tm_shape(detroit) + 
  tm_polygons(
    col = c('lag_foods_pct', 'lag_dist_foods_pct'),
    title = c('% with lag\n(contiguity)','% with lag\n(5km dist)'),
    palette = "YlGnBu",
    border.col = 'white',
    lwd = 0.5,
    contrast = c(0.2,1)
  ) + 
  tm_layout(
    main.title = "Detroit Area Food Stamp Usage by Census Tract",
    legend.position = c("right","bottom")
  )

Cool! Now to assess local spatial autocorrelation. We’ll use the lag generated via Queen’s contiguity:

detroit_local_g <- localG( # run the Local Getis-Ord G tool
  x = detroit$lag_foods_pct,
  listw = detroit_W, # contiguity-based weights matrix
  zero.policy = TRUE) %>% 
  as.vector()

# cool function Jackson built to bin localG values for easy mapping
make_g_bin <- function(z){
  case_when(
    z > -1.96 & z < 1.96 ~ "Insignificant",
    z >= 1.96 & z < 2.58 ~ "Hotspot (>1.96)",
    z >= 2.58 ~ "Hotspot (>2.58)",
    z < -1.96 & z > -2.58 ~ 'Coldspot (<-1.96)',
    z < -2.58 ~ 'Coldspot (<-2.58)'
    # TRUE ~ 'Unknown'
  ) %>% as.factor() %>% 
    ordered(c(
      #"Unknown",
      "Coldspot (<-2.58)",
      "Coldspot (<-1.96)",
      "Insignificant",
      "Hotspot (>1.96)",
      "Hotspot (>2.58)"))
}

# set the palette, including the sublime 'cornsilk'
g_palette <- c('navy','cornflowerblue','cornsilk','brown1','firebrick')

# the deliverable is supposed to be a web map
tmap_mode('view')
## tmap mode set to interactive viewing
# draw our map "on the fly" using Jackson's cool function
detroit %>% mutate(G = make_g_bin(detroit_local_g)) %>% # adds a new attribute for binned localG values
  tm_shape() +
  tm_polygons(
    id = "NAME", # display on hover
    col = "G",
    palette = g_palette,
    border.col = 'grey80',
    lwd=0.4,
    title = "Local Getis-Ord G",
    popup.vars = c( # popup info
      "Total Households" = "households_",
      "Households Using Foodstamps" = "foodstamps_", 
      "% Using Foodstamps" = "foods_pct")) +
  tm_scale_bar() +
  tm_tiles("Stamen.TerrainLabels", zindex = 500) +
  tm_layout(main.title = "Detroit Area Food Stamp Usage")

Discussion

The clustering patterns evident in the map above reflect a host of spatio-demographic factors. Because eligibility for food stamp assistance is tied to household income, it’s reasonable to conclude that Detroit’s “inner city” is far poorer than other parts of the tri-county area, with the curious exception of the city of Pontiac, the “island” of hotspots to the northwest. Household median income, household size, average rent, and so forth could be helpful data to help shore up this conclusion. Meanwhile, there are coldspots surrounding the inner city at what looks to be a consistent distance, which probably reflect the legacy of postwar infrastructure and housing development. If I remember correctly Detroit was a poster child for what is now referred to as “white flight” in the 1950s and, along with (though moreso than) cities like St Louis, Cleveland, Buffalo, and Philadelphia has suffered from globalization-driven deindustrialization. Detroit’s inner city is a wonder of urban decay, with skyscrapers literally crumbling to the ground. In the time I’ve spent there (it’s been a few years now) the only real gentrification I noticed was surrounding the Tiger’s new stadium, the so-called Entertainment district, which indeed is part of the little pocket of statistically insignificant census tracts close to the river adjacent to Windsor, Ontaria.

 ██▓███   ▄▄▄       ██▀███  ▄▄▄█████▓   ▄▄▄█████▓ █     █░ ▒█████  
▓██░  ██▒▒████▄    ▓██ ▒ ██▒▓  ██▒ ▓▒   ▓  ██▒ ▓▒▓█░ █ ░█░▒██▒  ██▒
▓██░ ██▓▒▒██  ▀█▄  ▓██ ░▄█ ▒▒ ▓██░ ▒░   ▒ ▓██░ ▒░▒█░ █ ░█ ▒██░  ██▒
▒██▄█▓▒ ▒░██▄▄▄▄██ ▒██▀▀█▄  ░ ▓██▓ ░    ░ ▓██▓ ░ ░█░ █ ░█ ▒██   ██░
▒██▒ ░  ░ ▓█   ▓██▒░██▓ ▒██▒  ▒██▒ ░      ▒██▒ ░ ░░██▒██▓ ░ ████▓▒░
▒▓▒░ ░  ░ ▒▒   ▓▒█░░ ▒▓ ░▒▓░  ▒ ░░        ▒ ░░   ░ ▓░▒ ▒  ░ ▒░▒░▒░ 
░▒ ░       ▒   ▒▒ ░  ░▒ ░ ▒░    ░           ░      ▒ ░ ░    ░ ▒ ▒░ 
░░         ░   ▒     ░░   ░   ░           ░        ░   ░  ░ ░ ░ ▒  
               ░  ░   ░                              ░        ░ ░   

Local Moran’s I

For part two, we’ll use the same census data to run Local Moran’s I

We have to recreate the neighborhood and weights matrix because we don’t want include.self() for this operation:

# Recreate--can't have "include.self()" here
detroit_nb <- poly2nb(detroit, queen=T)
detroit_W <- nb2listw(detroit_nb, 
                    style="W",
                    zero.policy = FALSE)

# compute Local Moran's I and return it as a table
detroit_local_i <- localmoran(
  x = detroit$foods_pct, 
  listw = detroit_W) %>% 
  as_tibble()

# print the table
detroit_local_i
## # A tibble: 1,165 × 5
##    Ii         E.Ii           Var.Ii     Z.Ii       `Pr(z != E(Ii))`
##    <localmrn> <localmrn>     <localmrn> <localmrn> <localmrn>      
##  1  0.7924660 -0.00076438638 0.14766731  2.0642257 0.0389963261602 
##  2  0.1920420 -0.00034235056 0.04953776  0.8643738 0.3873826060426 
##  3 -0.3112308 -0.00006105806 0.01417673 -2.6134208 0.0089640849551 
##  4  0.4004440 -0.00045522853 0.10565532  1.2333590 0.2174418651969 
##  5  0.5958790 -0.00066642357 0.09639952  1.9213481 0.0546878362575 
##  6  3.4055515 -0.00339337689 0.65382178  4.2158999 0.0000248784151 
##  7  3.1004821 -0.00227849749 0.37639116  5.0574147 0.0000004249785 
##  8  0.1326258 -0.00031890814 0.04614675  0.6188714 0.5360010561304 
##  9  0.2789279 -0.00017673235 0.02925640  1.6317616 0.1027297274668 
## 10  0.2669281 -0.00037315030 0.06175942  1.0755963 0.2821078532675 
## # ℹ 1,155 more rows

We need to extract those LISA values from detroit_local_i before mapping. We’ll shunt them into our detroit object:

# adjust p-values to be more conservative using false discovery rate (FDR)
detroit$lisa_I <- detroit_local_i$Ii
detroit$lisa_EI <- detroit_local_i$E.Ii
detroit$lisa_Z <- detroit_local_i$Z.Ii
detroit$lisa_p <- p.adjust(detroit_local_i$`Pr(z != E(Ii))`,
                         method = "fdr")

Now make a scatterplot:

detroit$lag_foods_pct <- lag.listw(x = detroit_W, var = detroit$foods_pct) # make a fresh lag
detroit$lag_foods_pct <- ifelse(is.na(detroit$lag_foods_pct), 0, detroit$lag_foods_pct) # replace NA if necessary with 0

# our old friend, the moran_plot function
moran_plot = function(var, w_var, xlab, ylab, main_title = NULL, normalize = T){
  if(missing(main_title)){
    main_title = "Moran's I Plot"
  }
  if(normalize){
    var = scale(var) %>% as.vector()
    w_var = scale(w_var) %>% as.vector()
  }
  
  # now with color!
  plot_dat = data.frame(var,w_var,stringsAsFactors = F) %>% 
    mutate(quad = case_when( # new column for binning like so:
        var > 0 & w_var > 0 ~ "HH",
        var > 0 & w_var < 0 ~ "HL",
        var < 0 & w_var < 0 ~ "LL",
        var < 0 & w_var > 0 ~ "LH",
        TRUE ~ 'Unknown'))
  plot_dat %>% ggplot() +
    geom_vline(xintercept = 0) +
    geom_hline(yintercept = 0) +
    geom_point(aes(x=var,y=w_var,color=quad)) +
    xlab(xlab) +
    ylab(ylab) +
    geom_smooth(
      mapping = aes(x=var,y=w_var),
      formula = 'y ~ x',
      method = 'lm',
      color="black",
      size=0.1,
      se=F) +
    scale_color_manual(
      name = "", 
      values = c('Unknown' = 'grey','HH'="firebrick",'HL'="brown1",'LL'="navy",'LH'="cornflowerblue")) +
    ggtitle(main_title)
}

# draw the scatter plot
moran_plot(
  var = detroit$foods_pct, 
  w_var = detroit$lag_foods_pct, 
  xlab = "Households using food stamps (normalized %)", 
  ylab = "Lagged Households using food stamps (normalized %)",
  main_title = ("Local Moran's I plot for food stamp usage in the Detroit area"))

Very cool! but I wanted my scatter plot to also communicate statistical significance, if possible:

v <- detroit$lisa_p # we'll use this to assess significance
x <- scale(detroit$foods_pct) %>% as.vector() # for binning significant HH, LH, and LL tracts
y <- scale(detroit$lag_foods_pct) %>% as.vector() # same

detroit <- detroit %>% mutate(new = # this is stolen from the cool function above, but here I add a bin for insignificant tracts
  case_when(
    v > 0.05 ~ "Insignificant",
    x > 0 & y > 0 ~ "HH",
    # x > 0 & y < 0 ~ "HL", # there were no HL tracts, excised for greater legend control
    x < 0 & y > 0 ~ 'LH',
    x < 0 & y < 0 ~ 'LL'
    # TRUE ~ 'Unknown' # there were no Unknown tracts
  ) %>% as.factor() %>% 
    ordered(c(
      "Insignificant",
      "HH",
      # "HL", # see above
      "LH",
      "LL")))
      # "Unknown"))) # see above

lm_palette <- c('cornsilk','firebrick','cornflowerblue','navy') # just 4 colors then

ggplot(detroit) +
  geom_vline(xintercept = 0) +
  geom_hline(yintercept = 0) +
  xlab("Households using food stamps (normalized %)") + 
  ylab("Lagged Households using food stamps (normalized %)") +
  ggtitle("Local Moran's I plot for food stamp usage in the Detroit area") +
  geom_point(aes(x=x,y=y,color=new,size=lisa_p,alpha = 0.33)) + # use size to represent p-values and add some alpha because there's a lot of overlap
  scale_color_manual(
    name = "", 
    values = lm_palette) +
  scale_radius(range = c(3,0.1)) +
  theme_dark()

This is interesting! P-values get lower moving away from zero on the y-axis. Now let’s see how it looks as a choropleth:

# draw a web map
tm_shape(detroit) +
  tm_polygons(
    id = "NAME", # hover-over label
    col = "new",
    palette = lm_palette,
    border.col = 'grey60',
    lwd=0.4,
    popup.vars = c( # popup info
      "Total Households" = "households_",
      "Households Using Foodstamps" = "foodstamps_", 
      "% Using Foodstamps" = "foods_pct"),
    title = "Local Moran's I") +
  tm_tiles("Stamen.TerrainLabels", zindex = 500) +
  tm_scale_bar() + 
  tm_layout(title = "Detroit Area Food Stamp Usage")

Discussion

As with the Local Getis-Ord G map previously, here we see extreme clustering of high food stamp usage in Detroit’s inner city, plus the little island of HH values in Pontiac, which no doubt has to do with Pontiac’s very high percentage of POC (over 75%) and decline as a (minor) manufacturing center in the region. The big difference between this map and the one generated from the Getis-Ord Gi* is the rarity of significant LL clustering, which occur to the northwest of the study area but in relative isolation. In addition, there are no significant HL tracts according to Local Moran’s I, though there are 10 LH tracts, all adjacent to the big HH inner city cluster. Again, demographic and income data would be helpful to make better sense of the patterns here. Statistics on crime, development, access to social services, age of building stock, tax delinquency and so forth might also help to paint a clearer picture of what’s going on with the clustering of food stamp usage. In my experience Detroit–along with smaller places like Birmingham, Alabama or Gary, Indiana–is unusual among American cities in that its core hasn’t been redeveloped much over the last several decades despite depressed real estate values. Low household income, along with a large amount of decrepit housing and concomitant low rents, are likely the most relevant influences on the patterns seen in the map.