Category: Information

The genius of using ST_Subdivide to speed up PostGIS intersection comparisons

You should use ST_Subdivide to break up large shapes into smaller ones

This infographic visually compares the difference between running a PostGIS comparison query like ST_Intersects on a large shape versus a subdivided version of that large shape. Click to embiggen.

Hundreds of GIS intersection comparisons are completed every hour on Chicago Cityscape.*

People are looking at, say, a map of the South Shore community area. That “Place” page then grabs all of the building permits, building violations, business licenses, and other “feature layers” that are stored as points.

A classic “point in polygon” comparison is made using the ST_Intersects(place_geometry, permits_geometry) function.

This has worked very well for several years.

The problem

But as Chicago Cityscape handles larger shapes – they come from users drawing their own, large shapes, and from large shapes like the downtown Chicago area – this query doesn’t cut it.

Setting indexes on the geometry is imperative, but it’s not the end of the to optimize performance. That’s because the index of the geometry is a rectangular bounding box (which is also called an “envelope” in GIS) that contains the entire shape of the South Shore community area.

The downtown Chicago area, however, is not even the largest shape I have. That belongs to the new Place, “Neighborhood Opportunity Fund investment zones” (NOF). Combined, they cover 75 square miles of Chicago. Downtown is only 7.7 square miles.

After I added the NOF map and tested its Place page, it “crashed” my server, metaphorically speaking. The query to just count the number of building permits in the area would take about five minutes.

There had to be a better way; in the meantime, however, I divided the NOF map into the West and South sections. This hardly improved the counting time.

The solution

Thankfully, today, I saw a tweet from Paul Ramsey linking to his blog that linked to his slides from a recent presentation about the use of PostgreSQL to store and manipulate GIS data.

In it he explained how the ST_Subdividefunction worked. I’m going to demonstrate it using graphics from my own maps.

A normal intersection comparison, using ST_Intersects(place_geometry, permits_geometry) in a query creates a bounding box (envelope) around each geometry and quickly determines whether the two envelopes overlap. If they do, then it checks again to see if the actual geometries overlap. If they do, that data is returned as a response to your query.

When your two datasets are massive, like the NOF zones, which collectively cover 1/3rd of Chicago, and the building permits, which are found across the entire city…well, that led to the five minutes counting time.

Enter ST_Subdivide. To use it properly you would run it against your existing geometry and store the much smaller shapes, derived from the big shape, in a new table. I applied the function to all the 22,203 maps that Chicago Cityscape has and stored their unique IDs and subdivided geometries in a new “lookup” table.

Now, any time I want to compare the building permits against the NOF, the building permits are instead compared to the small shapes that were subdivided.

The query

Chicago Cityscape uses a single table (created as a materialized view) to combine all 22,203 maps. Each map is stored in a source table (for example, there’s a table to hold the 77 community areas) and the materialized view runs once a day to combine all of the maps in the source tables. This ensures our data is managed well: different source tables can hold different information, and the single table holds only the name, type, and geometry of the source tables, for faster comparison. Each entry in the single table also has a “slug”, its unique identifier.

Thus, the materialized view of the subdivided maps is created from the aforementioned single table, using this query:

create materialized view view_places_subdivided as
select gid || '_' || random() as gid, slug, st_subdivide(geom) as geom
from view_places;

The “gid” is designed to create a new unique ID field, as the slug field will be repeated for every subdivided of each map. A unique ID field is necessary if you want to refresh the materialized view concurrently (to allow for other queries to access the materialized view while it’s being refreshed).

* The results are cached for a few hours, because the feature layers change 1-2 times per day and at different times each day, so the limited duration cache accommodates that. Ideally I would code a way to invalidate the cache when the feature layer data is updated.

Update 12/31/19: ST_Subdivide will fail if your geometries have any or certain geometry errors (I don’t know if it’s any kind of error, or certain kinds of errors that make the function fail). Chicago Cityscape has over 37,000 features that ST_Subdivide is attempting to process, and there is a lot of room for error in managing that many features from dozens of sources.

Upzone the 606

Map of the single family-only zoning around the Bloomingdale Trail

The area in green only allows single-family houses to be built.

Something’s gotta give.

This is all of the land area within two blocks of the Bloomingdale Trail that allows only single-family housing to be built (view full map). This isn’t to say that multi-family housing doesn’t exist here; it definitely does, and there’s probably a handful of two-flats on a majority of the blogs.

All of the five parks of the 606 are within this two block radius, and 49.6 percent of the land allows only single-family housing to be built.

But why build a transportation corridor, a park, a new, expensive, public amenity, and not change the kind of housing – which often determines the kind of family and makeup of a household – that can afford to buy a home near here.

It’s already been shown that detached single-family housing prices have grown intensely the closer you get to the trail. That price growth has meant displacement for some, and “no chance to buy or build a house here” for many others.

There are still plenty of vacant lots within the mapped area; lots that should have a 2-4 unit building built on them, but where only a 1-unit building is allowed.

This map was made possible by the new Zoning Assessment tool on Chicago Cityscape. Read about it or use it now.

How much of the land within walking distance of a CTA station is zoned to allow multi-family housing?

I recently created the Zoning Assessment tool on Chicago Cityscape, which shows a map of aggregated zoning districts in a given community area, ward, or near a CTA or Metra station. Per Paul Angelone’s suggestion, you can now show the walk shed – the area within walking distance to the station, following the streets.

The maps in this post show where one can build apartments (including a simple and common two-flat) within a 15 minute walk to the Logan Square Blue Line station, which has 24-hour service. Try it yourself.

Thirty-one percent of the walk shed allows multi-family housing.

In a second version of the same map, I’ve marked in red the gaps in the zoning map. These are areas that are zoned to allow only single-family housing. That doesn’t make sense: The land near rapid transit stations should be much denser than the land away from the stations.

Sixty-four percent of the walk shed allows only single-family housing. The remaining five percent are planned developments (at least the Mega Mall is going to have a couple hundred dwelling units), manufacturing, and parks.

And if most of the block is already zoned to allow multi-family housing, why are these parcels skipped?

This is the same map as the one above, but with areas that allow only single-family housing marked in red (however, I skipped some areas to save myself time).

How it works
The walk shed boundaries are generated by Mapzen’s isochrone service. The Zoning Assessment map asks Mapzen for the polygon of a specified walk shed (walk or bike, 10 or 15 minutes), receives the polygon and sends that polygon to a custom API on the Cityscape server, which compares that to the server’s copy of Chicago’s latest zoning map. The comparison is then returned to the browser and replaces the default Zoning Assessment map.

Download GIS data from ArcGIS servers using a simple command

A lot of geospatial data (GIS) in the world is stored on ArcGIS MapServers, a part of the Esri “stack” of products that municipalities and other governments use to manage and publish GIS data.

A lot of people want that data, and not every government has enabled the ArcGIS data hub features that make it easier to get. For an example data hub, check out Cook County, Illinois.

If you have ArcGIS software on your Windows computer, then it can be pretty easy to plug in the map server URL and manipulate and extract the data.

For the rest of us who don’t have a very expensive license for that software, you can use a “command line” tool (written in Python) on any computer to download any layer of GIS data hosted on the ArcGIS MapServer and automatically convert it to GeoJSON.

Tutorial

You’ll need to install the Python package pyesridump, from the OpenAddresses GitHub repository, created by Ian Dees and other contributors.

Installing pyesridump is easy if you have pip installed, using the command pip install esridump.

The next thing you’ll need is the URL to a layer in a MapServer, and these are not easy to find.

Finding data to download

I can almost guarantee the county where you live has one. Before you continue, check to see if your county (or other jurisdiction) has the “open data portal” add-on to their ArcGIS stack.

A handful of suggestions:

  • Here are links to the open data portals enabled by Esri for Lake County, Illinois, and Broomfield County, Colorado.
  • A reader suggested looking for MapServers for jurisdictions around the world by looking through Esri’s portal of open data called ArcGIS Hub.
  • Use a search engine to search for the URL pattern that’s common to ArcGIS servers: paste “inurl:arcgis/rest/services” into a search engine. (You can also limit this search to a particular domain name, say, “chicago.gov”, by adding “site:chicago.gov” to the search query, making the full search query “inurl:arcgis/rest/services site:chicago.gov”.

Once you locate a dataset you want, you can find the MapServer URL under About>Data Source on the right side of the page.

I normally find them by looking at the HTML source code of a MapServer I already know about.

For this example I’ll use one of the GIS layers in the Cook County, Illinois, election service MapServer – here’s the layer for the Cook County commissioners districts.

Fetch the data

Once you have the URL the command is simple:

esri2geojson http://cookviewer1.cookcountyil.gov/ArcGIS/rest/services/cookElectnSrvc/MapServer/11 cookcounty_commissioners.geojson

  • The first term, esri2geojson tells your computer which program to load.
  • The second term is the URL of the MapServer URL.
  • The third term is the filename and location where you want to store the file. I prefer running the command “inside” the folder where I want the file to be stored. You can also specify a full path of the file. On a Mac this would look like ~/Users/username/Documents/GIS/projectname/cookcounty_commissioners.geojson

After you enter the command into your computer’s terminal, press enter. esri2geojson will report back once, after it finds and understands the MapServer URL you gave it. When it’s done, the command will “close” and your computer’s terminal will wait for the next command.

Do you have questions, or need some help? Leave a comment below.

Fun with stats: Building permits by street name and number edition

John Hancock Center

The John Hancock Center. Photo by Kevin Dickert.

 

On which street are the most building permits issued?

Michigan Avenue!

But where on Michigan Avenue are the most building permits issued?

Take a guess!

First, can you answer: Are most building permits issued to North Michigan Avenue (between Madison Street, 0 north/south, and Oak Street, 1000 north), or South Michigan Avenue (between Madison Street, 0 north/south, and um, somewhere south of 130th Street, 13000 south)?

Here’s the answer…

Even though South Michigan Avenue is at least 13x longer than North Michigan Avenue, South Michigan Avenue has 39 percent fewer building permits!

From 2006 to yesterday (Saturday), there were 7,828 building permits issued to projects on North Michigan Avenue and 4,714 building permits issued on South Michigan Avenue.

The most common address on North Michigan Avenue to receive building permits was 875 N Michigan Avenue. It’s also the most common address to receive building permits on all Chicago streets.

What’s there? The John Hancock Center (tower)!

The average building address number on North Michigan Avenue is 540.6. That means that building permits on North Michigan Avenue concentrate around Grand Avenue, which is near the city’s biggest Marriott hotel, and is where the Under Armor flagship store is.

The next most common street – after South Michigan Avenue – is North Clark Street, which extends from Madison Street (0 north/south) to the northern edge of the city at Howard Street, which is 7600 north, about 7.6 times longer than North Michigan Avenue.

S. Clark Street Signs

Businesses in the 400 block of South Clark Street, as of when the photo was taken in November 2008. I believe the hotel is still there. This is the busiest block of South Clark Street, for building permits. Photo by Bruce Laker.

South Clark Street doesn’t register in the top 10 or even the top 100. It comes it at number 162, with 772 building permits. This is surprising to me because South Clark Street runs from Madison Street (0 north/south) in downtown and goes to 2200 south, and has a lot of downtown office buildings.

South LaSalle Street (3,613 building permits), South Wabash Avenue (2,916), and South Dearborn (1,611) are all in the top 50. The data could be wrong somehow.