Download notebook (.ipynb)

Mapping US household income.#

In this demo we will explore, how Lets-Plot Geocoding API and spatial plots are used to create choropleth maps without a use of any additional shapefiles or GeoJSON files.

Data#

The dataset contains household mean income as well as name of City, County and State.

Note that not all US counties are present. This dataset is incomplete but this doesn’t really matter for the purpose of this demo.

Tasks completed in this notebook:#

  • Load and clean the data.

  • Use state geocoder to fetch US states boundaries.

  • Use 1-key form of the map_join parameter to create a choropleth map of US states.

  • Use county geocoder to fetch US counties boundaries.

  • Use more 2-key form of the map_join parameter to create a choropleth map of US counties.

  • Add the geom_livemap layer to display the choropleth on an interactive map.

To learn more about geocoding in Lets-Plot see: Geocoding API.

import pandas as pd

from lets_plot import *
from lets_plot.geo_data import *
The geodata is provided by © OpenStreetMap contributors and is made available here under the Open Database License (ODbL).
LetsPlot.setup_html()
import lets_plot
lets_plot.__version__
'4.8.2'

Data#

In order to keep plots simple we will remove the states of Alaska, Hawaii and Puerto Rico from our dataset.

These 3 states are located far away from the rest 48 contiguous states (aka CONUS) and need to be shown on separate plots.

income_dat = pd.read_csv("https://raw.githubusercontent.com/JetBrains/lets-plot-docs/master/data/US_household_income_2017.csv", encoding='latin-1')
income_dat.head(3)
id State_Code State_Name State_ab County City Place Type Primary Zip_Code Area_Code ALand AWater Lat Lon Mean Median Stdev sum_w
0 1011000 1 Alabama AL Mobile County Chickasaw Chickasaw city City place 36611 251 10894952 909156 30.771450 -88.079697 38773 30506 33101 1638.260513
1 1011010 1 Alabama AL Barbour County Louisville Clio city City place 36048 334 26070325 23254 31.708516 -85.611039 37725 19528 43789 258.017685
2 1011020 1 Alabama AL Shelby County Columbiana Columbiana city City place 35051 205 44835274 261034 33.191452 -86.615618 54606 31930 57348 926.031000
income_dat = income_dat[~income_dat["State_Name"].isin(["Alaska", "Hawaii", "Puerto Rico"])]
income_dat = income_dat[income_dat["Mean"] > 0]
mean_US = income_dat["Mean"].describe()["mean"]
mean_US
67719.94988293362

Map of the US states#

# Create geocoder for the 48 contiguous states.
state_gcoder = geocode_states("US-48")
state_gcoder.get_geocodes().head(3)
id state found name centroid position limit
0 60759 Vermont Vermont [-72.772353529363, 43.8718488067389] [-73.4377402067184, 42.7269606292248, -71.4653... [-73.4377402067184, 42.7269606292248, -71.4653...
1 61315 Massachusetts Massachusetts [-72.0964509339039, 42.1913791447878] [-73.5082098841667, 41.4945808053017, -69.9292... [-73.5082098841667, 41.2393619120121, -69.9292...
2 61320 New York New York [-76.0912327538441, 42.8993669897318] [-79.7619438171387, 40.7823456823826, -73.2414... [-79.7619438171387, 40.4960802197456, -71.8561...

Simple blank map#

ggplot() + geom_map(map=state_gcoder)

Choropleth map - states#

# Compute mean income by the US state.
mean_income_state = income_dat.groupby("State_Name", as_index=False)["Mean"].mean()
mean_income_state.head(3)
State_Name Mean
0 Alabama 54023.752874
1 Arizona 63400.114943
2 Arkansas 52213.932153
# Define some setting to use on plots later on:
#
# - A gradient color palette. We will borrow color codes from the Brewer's 'PiYG' palette:
#   https://colorbrewer2.org/#type=diverging&scheme=PiYG&n=11
#   We will be using the US mean income as a `midpoint` for the color scale.
map_fill_colors = scale_fill_gradient2(low="#8e0152",mid="#f7f7f7",high="#276419", midpoint=mean_US)

# - Remove axis.
# - Define plot coordinate system and size.
map_settings = (theme_classic() + theme(axis='blank') + 
                map_fill_colors + 
                coord_map() +
                ggsize(900, 400))
# Use `geom_polygon` to create choropleth.
# - pass state geocoder to the `map` parameter.
# - specify the "State_Name" variable (from the dataset) as a single key in the `map_join` parameter.
(ggplot(mean_income_state) + 
 geom_polygon(aes(fill="Mean"), map=state_gcoder, map_join="State_Name") + 
 map_settings)

Adjusting geocoder resolution.#

Plot of this size looks too pixelated with the resolution used by default.

To create a better looking choropleth use the inc_res() function.

We will also configure a better looking tooltips.

tooltip_state=(layer_tooltips()
          .format('Mean', '.2~s')
          .line('@State_Name')
          .line('Mean income|$@Mean'))

(ggplot(mean_income_state) + 
 geom_polygon(aes(fill="Mean"), 
              map=state_gcoder.inc_res(), 
              map_join="State_Name", 
              tooltips=tooltip_state) + 
 map_settings)

Choropleth map - counties#

# Compute mean income by the US county.
# Note: the resulting dataframe two key variables: "County" and "State_Name".
#       Later we will use these two variables to 'join' this dataframe with counties geocoder data.
mean_income_county = income_dat.groupby(["State_Name","County"], as_index=False)["Mean"].mean()
mean_income_county.head(3)
State_Name County Mean
0 Alabama Autauga County 54086.006522
1 Alabama Barbour County 37725.000000
2 Alabama Blount County 55127.000000
# Create geocoder for the US counties.
# Note: in addition to county names we are using here the `states()` function.
#       The `states()` allows us to tell geocoder to use names of states as parent qualifiers for county names.
#       This is necessary because names counties in the US are not unique, i.e. different states can easy have 
#       counties with identical names.
county_gcoder = (geocode_counties(mean_income_county["County"])
    .states(mean_income_county["State_Name"])
    .ignore_all_errors())
county_gcoder.get_geocodes().head(3)
id county found name state centroid position limit
0 1848758 Autauga County Autauga County Alabama [-86.6511697368103, 32.5077094137669] [-86.9211257994175, 32.3075589537621, -86.4111... [-86.9211257994175, 32.3075589537621, -86.4111...
1 1850797 Barbour County Barbour County Alabama [-85.3935062819253, 31.8834118545055] [-85.7483513653278, 31.6182379424572, -85.0488... [-85.7483513653278, 31.6182379424572, -85.0488...
2 1848761 Blount County Blount County Alabama [-86.5330419226251, 34.0133339166641] [-86.9634309411049, 33.7653543055058, -86.3030... [-86.9634309411049, 33.7653543055058, -86.3030...
# Configure tooltip.
tooltip_county=(layer_tooltips()
          .format('Mean', '.2~s')
          .line('@County, @State_Name')
          .line('Mean income|$@Mean'))
# Again, use `geom_polygon` to create choropleth.
# - pass county geocoder to the `map` parameter.
# - specify the "County" and "State_Name" variables as a hieratchical key in the `map_join` parameter.
#   Note: the order of keys in hierarchical key is important.
#         Lets-Plot expects the same order as it is in a US street address, i.e.: city, state, country. 
(ggplot(mean_income_county) + 
 geom_polygon(aes(fill="Mean"), 
              map=county_gcoder, 
              map_join=[["County", "State_Name"]], 
              tooltips=tooltip_county, 
              size=0.1) + 
 map_settings)

Interactive map#

Finally, let’s add the geom_livemap() layer to create an interactive map which you can zoom in and out, and pan.

Note: nbviewer won’t load base-map tiles due to its CSP.

(ggplot(mean_income_county) + 
 geom_livemap() +
 geom_polygon(aes(fill="Mean"), 
              map=county_gcoder, 
              map_join=[["County", "State_Name"]], 
              tooltips=tooltip_county, size=0.1) + 
 map_settings)