Maps with PostgreSQL and PostGIS
Learn how to stream geodata from PostGIS to your browser
This blog post explains to you which tools to use to serve geospatial data from a database system (PostgreSQL) to your web browser. All you need is a database server for the data, a web map application for the frontend and a small service in between to transfer user requests. I will also show you how these components can run on top of Kubernetes in a highly available cloud native fashion.
PostGIS - a spatial database
As a first step the dataset in your database you want to put on a map must include a geospatial representation: Two coordinates or an address. For Zalando it might be interesting to know the demand hotspots across Europe e.g. by joining the zip codes of shipments with administrative boundaries which are often available as Open Data. The database must support geo data types and indexes to answer spatial queries. At Zalando, the open source database system PostgreSQL is used by many teams and it offers a geospatial component called PostGIS. It is used for example to allow our customers to select the nearest pickup and return points. Over the years, PostGIS has grown a strong community and is widely accepted in the industry as the de facto standard to manage geospatial data. There are many different tools and interfaces available to import data in various formats into PostGIS and access it from your favorite data science environment - be it Jupyter, R or Tableau.
Bring the map to your browser
Creating a web mapping app is simple with tools like Leaflet.js. For the basemap we can use OpenStreetMap, the wiki-style free alternative to commercial map providers. Adding extra layers with e.g. over 100,000 polygons on top of it would slow down map navigation a lot. Splitting the data into a grid of tiles and loading only the ones of the area you are currently looking at on your screen is what makes a browser map fast and responsive. Until recently, a middleware was usually required to produce these tile structures. That middleware had to consider not only the grid creation, but also take care of different zoom levels. When you zoom out the geometry of streets, rivers, forests etc. should be coarser and styled differently - some details should be even left out at a smaller scale for the sake of readability.
The good news is, these days PostGIS can take over most of the middleware’s job and produce map tiles for you. You only need a lightweight server between the frontend that takes in requests from the map and sends queries to your spatial database to produce the tiles you want. pg_tileserv is such a solution. You configure the table name that contains the spatial data and that’s it. If you want to learn more about vector tiles I can recommend this talk by Paul Ramsey, one of the PostGIS authors.
Running it on Kubernetes
The Postgres Operator, created by my team at Zalando, provides you with an easy creation and update path for PostgreSQL servers running on top of Kubernetes. Engineers only have to write a short YAML manifest which can look like this:
apiVersion: acid.zalan.do/v1
kind: Postgresql
metadata:
name: acid-geo
spec:
numberOfInstances: 2
postgresql:
version: "14"
volume:
size: 10Gi
teamId: acid
preparedDatabases:
map_db:
defaultUsers: true
extensions:
postgis: geo
schemas:
geo: {}
The operator will notice the new manifest and create all the necessary resources in Kubernetes - a stateful set with 2 database pods, services to connect to the database, secrets for authentication etc.. With specifying preparedDatabases the operator will create a new database with schemas as well as a set of database roles (reader, writer, owner) with default access privileges assigned. Plus, you can list extensions to be created in a certain schema. The Postgres cluster is based on the Spilo docker image which includes the PostGIS extension.
To import arbitrary geodata formats I can recommend GDAL’s ogr2ogr command-line tool. In my case I’ve imported the latest European NUTS polygons of 2021 and the 1km² population grid of 2018 by Geostat.
To roll out pg_tileserv
on Kubernetes I’m using a deployment resource. To run it within the Zalando infrastructure I had to move the tileserver base path behind our oauth2 proxy with a dedicated /tileserver
base path which required me to overwrite pg_tileserv
’s default configuration. Configuration of pg_tileserv
happens via toml files so I’ve put that into a config map and mounted it into the container. Here you can see the manifest (leaving out the resources section in this example):
apiVersion: v1
kind: ConfigMap
metadata:
name: acid-geo-tileserver-config
data:
pg_tileserv.toml: |
BasePath = "/tileserver/"
Debug = true
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: acid-geo-tileserver
spec:
replicas: 1
selector:
matchLabels:
application: acid-geo-tileserver
template:
metadata:
labels:
application: acid-geo-tileserver
spec:
containers:
- name: acid-geo-tileserver
image: pramsey/pg_tileserv:latest
ports:
- containerPort: 7800
protocol: "TCP"
volumeMounts:
- name: configs
mountPath: /config
env:
- name: "DATABASE_URL"
value: postgresql://map_db_reader_user@acid-geo:5432/map_db
- name: "PGPASSWORD"
valueFrom:
secretKeyRef:
name: map_db_reader_user.acid-geo.credentials
key: password
volumes:
- name: configs
configMap:
name: acid-geo-tileserver-config
Another deployment is needed serving our Leaflet application, e.g. using a simple Ubuntu docker image with nginx running.
Dynamic mapping layers
The web map requests tiles from pg_tileserv
which sends back protobuf files. In our case, a request looks like this - with geo.boundaries_europe
being the schema qualified table name:
${BASE_URL}/tileserver/geo.boundaries_europe/{z}/{x}/{y}.pbf
Z is the zoom level and X and Y are the coordinates of the mouse cursor. Leaflet’s VectorGrid class can be used to display the vector tiles returned from PostGIS. For the boundaries the result can look like in the first picture above. The vector tile format must not consist solely of the geometry. Multiple thematic attributes can be included making it possible to change the style on the fly without sending another request to the database. pg_tileserv
will take information from all columns it finds in a spatial table.
Alternatively, it allows me to serve vector tiles not only from a table but also from an SQL function using a query with PostGIS’ vector tile creator function ST_AsMVT. pg_tileserv
’s README on GitHub provides some cool examples for such function layers. For example PostGIS allows you to create a grid of squares or hexagons within a defined extent, e.g. the envelope of a single tile. The grid can be intersected with another spatial data set to produce a heatmap. The following example is inspired from pg_tileserv
's example of Advanced Function Layers.
CREATE OR REPLACE FUNCTION geodata.population_hexagons(
z integer, x integer, y integer,
step integer default 4)
RETURNS bytea AS
$$
WITH bounds AS (
-- get web mercator tile bounds to given coordinate
SELECT ST_TileEnvelope(z, x, y) AS geom
), hexes AS (
-- generate hexgrid within bounds and join with population grid
SELECT row_number() OVER () AS grid_id,
h.geom, h.i, h.j,
sum(p.popcount) * 0.5 AS popcount -- oversimplified, of course
FROM bounds b
JOIN LATERAL ST_HexagonGrid( -- 1. hex size, 2. boundary
(ST_XMax(b.geom) - ST_XMin(b.geom)) / pow(2, step), b.geom
) h ON (true)
-- do spatial join between our artificial grid and the Geostat grid
-- the hex grid is in web mercator coordinate reference system (CRS)
-- it must be tranformed into the same CRS of the population grid (WGS84 - 4326)
JOIN geodata.population p
ON p.geom && ST_Transform(h.geom, 4326)
GROUP BY h.geom, h.i, h.j
), mvt AS (
-- processing geometry for vector tiles
SELECT ST_AsMVTGeom(h.geom, b.geom) AS geom,
(h.i::text || h.j::text || h.grid_id::text)::int AS grid_id,
H.popcount
FROM hexes h, bounds b
)
-- baking mvt geom, grid_id and popcount into MVT encoding
SELECT ST_AsMVT(mvt, 'geodata.population_hexagons') FROM mvt;
$$
LANGUAGE 'sql' STABLE STRICT PARALLEL SAFE;
Your function must take the Z, X and Y parameters as arguments and return Postgres' bytea
type, which is just a BLOB for the PBFs returned from ST_AsMVT. In the first part of the query we need to get the tile envelope for the given input. Within this square we generate the grid and join it against the Geostat population grid. For each hexagon we sum up the population of every intersecting Geostat grid cell. This is quite coarse, indeed. It would be more precise to join the generated grid against a point data set, e.g. one could generate centroids for each data polygon.
Because this is all based on database queries triggered from user interactions with the map, such a heatmap can be dynamic and change while zooming in and out. As the vector tile grid gets smaller on a larger scale the heatmap becomes more fine-grained. In the map legend you can see that values adapt to the zoom level and hexagon size. This is much better for the perception by not overwhelming the observer when the full picture is shown and providing better guidance to points of interests.
We're hiring! Do you like working in an ever evolving organization such as Zalando? Consider joining our teams as a Data Engineer!