Choropleth Maps with Leaflet | Ben Cunningham

Choropleth Maps with Leaflet

Written by Ben Cunningham · on February 4, 2016

Like any loyal Hadleyverse user, I love ggplot2. Paired with ggmap, visualizing spatial data is just as much a breeze as traditional plotting with the package. Wrapping all this with a library like animation adds another neat layer of perspective. But short of developing a Shiny project, there’s not a great way to make things interactive.

Fortunately, JavaScript’s Leaflet library lays the framework for an increasingly popular alternative to this static mapping. By way of RStudio’s leaflet, the library has even branched into the R userspace.

However, while I appreciate Leaflet (and used it just a few weeks ago in this post), it hasn’t always been easy to track down documentation on mastering its many features. So if nothing else, hopefully today’s writeup will serve as a useful reference point regarding one said functionality that eluded me most: wrangling polygons.

Handling the Raw Data

In the aftermath of the Iowa caucuses, I decided I’d try my hand at adding to the endless collection of choropleth maps of my home state. Rather than rehash yet another political poll, however, I grabbed a dataset of school district math proficiencies and a ZIP code reference table, with the end goal of summarizing these observations by county.

library(dplyr)
library(stringr)

zips <- read.csv('data/zips.csv', stringsAsFactors = FALSE)

prof <-
  read.csv('data/proficiencies.csv', stringsAsFactors = FALSE) %>%
  filter(School.Year == 2014, Topic == 'Math', Grade %in% c(3, 6, 10),
         Proficient.Category != 'Not Reportable') %>%
  mutate(
    zip = District.Location %>%
      str_extract('(?<=Iowa )\\d{5,9}(?=\n)') %>%
      substr(1, 5) %>%
      as.numeric()
  ) %>%
  left_join(zips, by = 'zip') %>%
  group_by(county) %>%
  summarize(.,
            prof    = sum(Proficient) / sum(Total),
            prof.3  = sum(Proficient[Grade == 3]) / sum(Total[Grade == 3]),
            prof.6  = sum(Proficient[Grade == 6]) / sum(Total[Grade == 6]),
            prof.10 = sum(Proficient[Grade == 10]) / sum(Total[Grade == 10])
  )

The ZIP code table comes into play merely as an element mapping between school districts and their county of origin. More important is the summarization at the end of the data pipeline; the prof columns represent the math proficiencies in each district for grades 3, 6, and 10 that will be merged into our shapefile data. In the final map, these values will control a county’s shading and popup text.

Wrangling the Geospatial Data

To plot administrative outlines, I grabbed the U.S. Census Bureau’s cartographic boundary shapefiles of counties from its TIGER database. If you haven’t run across spatial data like this before, don’t be mislead by the term shapefile — what we’re really dealing with is a collection of files. Some form the actual geometries while others store metadata, but for our needs, it’s just important to keep in mind that we ought to keep all of them together in a subfolder of our project directory.

In the context of this project, here’s what that directory looks like for me:

proj/
|-- data/
|   |-- counties/
|   |   |-- cb_2014_us_county_500k.cpg
|   |   |-- cb_2014_us_county_500k.dbf
|   |   |-- cb_2014_us_county_500k.shp
|   |   |-- ...
|   |-- proficiencies.csv
|   |-- zips.csv
|-- doc/
|-- analysis.R &

With these files in place, we can use rgdal to read in the geospatial vectors and their associated metadata (which isn’t much, in the case of the U.S. Census Bureau files). After filtering out everything that doesn’t pertain to Iowa (STATEFP == 19), we can merge our prof table into the geospatial metadata.

library(rgdal)

counties <-
  readOGR(dsn = 'data/counties', layer = 'cb_2014_us_county_500k') %>%
  .[.$STATEFP == 19, ]
## OGR data source with driver: ESRI Shapefile 
## Source: "data/counties", layer: "cb_2014_us_county_500k"
## with 3233 features
## It has 9 fields
[email protected] <-
  [email protected] %>%
  mutate(county = NAME) %>%
  left_join(prof, by = 'county')

Tying Together the Map

Now that our data is set, we can introduce the geospatial polygons to Leaflet with relative ease. In fact, after compacting our raw sources into such a tightly-wound object, controlling the map’s aesthetic based on our math proficiency data is quite intuitive.

library(leaflet)

popup <- paste0(
  '<i>', counties$NAME, ' County</i><br><br>',
  '<b>3rd Grade: </b>', round(counties$prof.3 * 100, 2), '%<br>',
  '<b>6th Grade: </b>', round(counties$prof.6 * 100, 2), '%<br>',
  '<b>10th Grade: </b>', round(counties$prof.10 * 100, 2), '%'
)

palette <- colorQuantile('RdBu', NULL, n = 7)

leaflet(data = counties) %>%
  addProviderTiles('Esri.WorldGrayCanvas') %>%
  addPolygons(fillColor = ~ palette(prof),
              fillOpacity = 0.8,
              color = '#BBBBBB',
              weight = 1,
              popup = popup)