Tag: open data

Chicago Police responds to my FOIA request about bicycle theft

A Chicago Police (CPD) officer called me this morning to discuss my FOIA request for bike theft data. It was very revealing.

The first problem is that I forgot to ask for a time frame. No big deal, I can tell him over the phone that I want the last three full calendar years.

The second problem is that there’s not a separate code for recording bicycle thefts. It’s recorded under “Simple Theft” and as being under $300 or over $300.

Third problem is that the database front end (the graphical interface that allows officers to search the reporting database) doesn’t allow him to search all of the report narratives for “bike” or “bicycle” and limit the search to “Simple Theft” in a specific time frame. Some report codes allow narrative searching, and some don’t. He said it would be impractical to search all narratives for the words “bike” or “bicycle” because a lot of reports not about theft would appear in the results.

In my last blog post about the Chicago Police Department’s FOIA response (for my request about bike crashes), they explained that they don’t have to create records that don’t already exist (like a list of bike thefts). This response is identical, but this time they called and gave me a better explanation. The officer also said they don’t have the staff resources to spend on collating their records for bicycle theft reports. I understand this.

He also explained that reporting standards at the CPD are guided by the Federal Bureau of Investigation (FBI) in the department’s “Incident Reporting Standards.” In the FBI’s reporting standards, there exists a line item for “bicycle theft” but it’s the same code as “simple theft.” There are separate codes for “credit card theft” and “motor vehicle theft.”

It seems the solution to the problem of obtaining records on bike theft in Chicago is to update the Incident Reporting Standards and include a new code for bike theft reports. At the end of the call, I understood that I was not going to get a list of bike theft dates and times from the police.

For now, Chicagoans should also report their bicycle theft to the Stolen Bike Registry so there’s a publicly available record of theft locations.

A Chicagoan rides his bike north on Halsted Street through University Village. If his bike is stolen, we can’t expect the Chicago Police Department to keep an easily findable report of it.

Reminder about open data and Obama’s Open Government Directive

Soon after taking office, President Obama issued a memorandum about open government and opening government data. Then came the Open Government Directive* which said:

To the extent practicable and subject to valid restrictions, agencies should publish information online in an open format that can be retrieved, downloaded, indexed, and searched by commonly used web search applications. An open format is one that is platform independent, machine readable, and made available to the public without restrictions that would impede the re-use of that information.

Essentially, the executive branch (er, Obama Administration) adopts the presumption of openness: distributing public data is the default position and action to take.

Don’t squat on the data. Don’t fret over how people will view or manipulate the data – this is not your concern. Don’t delay its release. If you do this, you are a frigid dataist and I will remember this.

Photo of visual note taking at an open data seminar by Karen Quinn.

*The Directive has a little more backbone than the original memorandum: “This memorandum requires executive departments and agencies to take the following steps toward the goal of creating a more open government.”

Thank you to Tech President.

Free online GIS tools: An introduction to GeoCommons

Read my tutorial on how I created the pedestrian map with GeoCommons. Read on for an introduction to GeoCommons and online GIS tools.

GeoCommons, like Google My Maps and Earth, is part of the “poor man’s GIS package.” It’s another tool that provides a few of the functions that desktop GIS software offers. But it excels at making simple and somewhat complex maps.

I first used GeoCommons over a year ago. I started using it because it would convert whatever data you uploaded into another format that was probably more useful. I mentioned it in this article about converting files. For example, if you have a KML file, you can upload it and export it as a shapefile for GIS programs, or a CSV file to load into a table editor or spreadsheet application.
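To show what a conversion like KML to CSV involves, here is a minimal sketch in Python using only the standard library. It is not how GeoCommons does it; the function name `kml_points_to_csv` and the output column names are my own assumptions, and it only handles point placemarks in the KML 2.2 namespace.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace used by KML 2.2 documents.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}


def kml_points_to_csv(kml_text: str) -> str:
    """Return CSV text (name, longitude, latitude) for each point Placemark."""
    root = ET.fromstring(kml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "longitude", "latitude"])
    for placemark in root.iter("{http://www.opengis.net/kml/2.2}Placemark"):
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        coords = placemark.findtext(
            "kml:Point/kml:coordinates", default="", namespaces=KML_NS
        )
        if not coords.strip():
            continue  # skip lines, polygons, and placemarks without a point
        # KML stores coordinates as longitude,latitude[,altitude].
        lon, lat = coords.strip().split(",")[:2]
        writer.writerow([name.strip(), lon.strip(), lat.strip()])
    return out.getvalue()
```

The resulting text can be saved with a .csv extension and opened in a spreadsheet application. Lines and polygons would need a different output layout, which is one reason a hosted converter is convenient.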

After creating the Chicago bike crash maps using Google Fusion Tables, I wanted to try out another map-making web application, one that provided more customization and prettier maps.

I found that web application and created a version of the bike crash maps, with several other data layers, in GeoCommons. I overlaid bike counts and bikeways so you can observe some relationships between each visual dataset. My latest map (screenshot below), created Wednesday, shows pedestrian counts in downtown Chicago overlaid with CTA and downtown Metra stations, as well as the 48 intersections with the most pedestrian collisions (from this UNC study, PDF).

Screenshot of pedestrian count map described above.

How these online GIS tools can be useful to you

I bet there’s a way you can use Google Fusion Tables and GeoCommons for your job or project. They’re extremely simple to use: they can take in data from the spreadsheets you’re already working on and turn them into themed reference maps. With mapping, you can do simple, visual analysis that doesn’t require statistical software or knowledge.

Imagine plotting your client list on a map and grouping them by age to see if perhaps your younger clients tend to live in the same neighborhoods of town, or if they’re more diverse (should you do this, keep the map private, something that you can’t do in GeoCommons – yet).

You may also find it useful if you want to create a route for your salespeople or for visiting church members at their homes. Plot all the addresses on a map, then manually filter them into different groups based on the clusters you see. With Google Fusion Tables, you can easily add a new column with the GROUP information and apply a numbered or lettered group and then re-sort.
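If you keep the address list in a spreadsheet export instead, the same "add a GROUP column and re-sort" step can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `address` column name, the `add_group_column` function, and the "unassigned" default are all hypothetical, and the group labels still come from you looking at the clusters on the map.

```python
def add_group_column(rows, group_by_address, default="unassigned"):
    """Attach a GROUP label to each row, then sort by group and address.

    rows: list of dicts that each have an "address" key (assumed name).
    group_by_address: dict mapping an address to the label you chose by eye.
    """
    labeled = [
        dict(row, GROUP=group_by_address.get(row["address"], default))
        for row in rows
    ]
    return sorted(labeled, key=lambda row: (row["GROUP"], row["address"]))
```

Rows you haven't assigned yet sort together under the default label, so it's easy to see what is left to place.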

Other things you can do in GeoCommons

  • Merge tables with geography – I uploaded two datasets: a table containing census tract IDs and demographic information for Cook County I downloaded from the American FactFinder 2; and a shapefile containing Cook County census tracts boundary information. After merging them, I could download a NEW shapefile that contained both datasets.
  • Make multi-layer maps
  • Symbolize based on frequency/rate
  • Convert data – This is by far the most useful feature. It imports “shapefiles (SHP), comma separated values (CSV), Keyhole Markup Language (KML), and GeoRSS” and exports “Shapefile, CSV, KML, GeoRSS Atom, Spatialite, and JSON” (from the GeoCommons user manual).
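The "merge tables with geography" item is an attribute join: each boundary record is matched to a table row that shares the same tract ID. Here is a rough Python sketch of that idea, assuming the boundaries have already been converted to GeoJSON-style feature dictionaries (reading a shapefile directly needs a third-party library). The function and parameter names are hypothetical, and this is not a description of how GeoCommons implements its merge.

```python
def join_attributes(features, table_rows, feature_key, table_key):
    """Copy table columns onto features that share the same ID.

    Returns (joined_features, unmatched_ids). IDs are compared as stripped
    strings so that leading zeros in tract IDs are not lost.
    """
    lookup = {str(row[table_key]).strip(): row for row in table_rows}
    joined, unmatched = [], []
    for feature in features:
        key = str(feature["properties"][feature_key]).strip()
        row = lookup.get(key)
        if row is None:
            unmatched.append(key)  # boundary with no matching table row
            continue
        merged = dict(feature["properties"])
        merged.update({k: v for k, v in row.items() if k != table_key})
        joined.append({**feature, "properties": merged})
    return joined, unmatched
```

Checking the unmatched list is worthwhile: a long list usually means the ID columns were formatted differently in the two datasets.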

Read my tutorial on how I created the pedestrian map with GeoCommons.

How to create a map in GeoCommons

GeoCommons (GC) is like Google My Maps but more powerful. Read my introduction to GC.

Tips before starting

  • With GC, I’m still figuring out what I must decide before I choose to add or amend something and what I can edit after I’ve made a change.
  • You cannot edit the data table directly.
  • You CAN replace data – click “reupload” – but the columns must match between original and replacement data.
  • Click Save often when making the map. You never know when Adobe Flash is going to quit on you.

One of the busiest locations in Chicago, for people walking, or riding buses and trains. Also a lot of taxi traffic and medium bike traffic. At Adams Street and Riverside Plaza (er, the Chicago River).

Tutorial

  1. Prepare your data. “We support Spreadsheets (as CSVs), Shapefiles, KML, RSS, ATOM and GeoRSS. We also support WMS and Tile services!” GeoCommons has instructions on how to prepare your spreadsheets for geocoding (if not already geocoded; GC will also work with predefined XY coordinates or street addresses). Ensure fields holding numbers have their type set as numeric in the GIS or spreadsheet program or you may run into roadblocks later on when trying to analyze these fields.
  2. If uploading a shapefile, GC requires the SHX and DBF files as well. The PRJ file will also help GC know how to reproject your data on the fly. GC base layer maps are projected in WGS84, just like Google Maps. Without the PRJ file, your data may not show. [Can the user set projection?]
  3. Upload data.
  4. You need to turn your newly uploaded data from a “pending dataset” into a completed dataset. In this process you will tell GC a little more about your data, including which columns hold the XY coordinates (even though it guesses this). You can also change the attribute names and describe the content of those attributes (you can also change this later).
  5. So click “Next Step” to start this process.
  6. In the “Review Your Geodata” step, you may see that GC has found some additional columns in your dataset. I’m not sure why this is. Delete these columns by selecting the header and clicking Delete Column. Then click Save Changes. You can select multiple columns at a time by holding the Command (Mac) or Control (Windows) keys.
  7. Add metadata; edit attribute names and add descriptions.
  8. You’re done. GC will present you a page with statistics and options to download your data in different formats.
  9. If you want to make a map with more data, follow the process again starting at Step 1. If not, continue.
  10. Make a map! Click “Map Data” or the “Make a Map” button in toolbar.
  11. A map of the world will load. When GC has finished loading your “new layer,” the map will zoom in.
  12. For the pedestrian map, I want to symbolize the data with a single color but change the size of the circle based on how many people were counted there (your data must have this attribute in numeric form – if it doesn’t you may have to reupload your data). Click “Add Data” and then, in the Map Brewer box that appears:
    1. Click on Visual Theme. Click next.
    2. Select the NUMERIC attribute. In the pedestrian data, this is “count.”
    3. Then select whether you want colors or sizes. You cannot change this later; you would have to delete the layer and add it again (using your already uploaded dataset).
    4. Select what type of classification you want. This is entirely up to you and how you want the map to look and based on what data you have. You can change this later.
    5. Choose your shape and color.
  13. Add more data by clicking Add Data button. I think my map would be more useful and interesting if it also showed where the train stations are, a major destination category for people who walk downtown on weekdays. I will symbolize by a solid color. Instead of visual theme, which I chose for the ped counts, I will just choose Points, Lines & Areas. At this time, GC doesn’t allow custom icons.
  14. Re-order layers by dragging them up and down in the layers box. Click on the boxy “handle” to the left of the layer.
  15. Change the layer names by single clicking on the layer name. Press Enter when you’re done.
  16. Change the map name by single clicking on it. Press Enter when you’re done.
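Two of the steps above depend on the “count” attribute really being numeric, so it can save a reupload to check the spreadsheet before Step 1. The Python sketch below is my own illustration, not part of GeoCommons: `non_numeric_values` lists the rows that would not parse as numbers, and `equal_interval_breaks` shows one common classification (equal intervals) that a sized-circle theme might use. GC’s own classification options may work differently.

```python
def non_numeric_values(rows, field):
    """Return (line_number, value) pairs whose field cannot be read as a number.

    rows: dicts such as those produced by csv.DictReader.
    Line numbers assume the header is on line 1 of the file.
    """
    bad = []
    for line_no, row in enumerate(rows, start=2):
        try:
            float(row[field])
        except (KeyError, TypeError, ValueError):
            bad.append((line_no, row.get(field)))
    return bad


def equal_interval_breaks(values, classes):
    """Return the class boundaries that split the value range into equal widths."""
    low, high = min(values), max(values)
    step = (high - low) / classes
    return [low + step * i for i in range(1, classes)]
```

Values with thousands separators (like "1,200") and blank cells are typical culprits, and they are easier to fix in the spreadsheet than after the upload.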

After creating my pedestrian map, I had some suggestions for GeoCommons, the people who collected the pedestrian count data, and my own map.

  • GeoCommons should add a map preview image for better sharing on Facebook and other websites that look for this.
  • GeoCommons should allow maps to be private after creation – I think after you click save, they are added to a gallery (I could be wrong).
  • The data collectors should add more locations, particularly around Union Station and the two Clinton CTA stations (also between CTA and Metra stations).
  • The data collectors should add “date collected” to the data table
  • The data collectors should extend survey hours to better match commuting patterns. A majority of the collections end at 5:45 PM while Metra’s rush hour ends just before 7 PM (this is when train departure frequency drops).
  • I should add ridership data to the train stations so we can see which CTA and Metra stations are most used.

How to convert GTFS to GIS shapefiles and KML

This tutorial will teach you how to convert any transit agency’s General Transit Feed Specification (GTFS) data into ESRI ArcGIS-compatible shapefiles (.shp), KML, or XML. This is simple to do because GTFS data is essentially a collection of CSV (comma separated values) text files (really, really large text files).
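Because a GTFS feed is just a zip of CSV text files, you can take a quick inventory of one before opening any GIS software. This is a minimal Python sketch using the standard library; the function name `gtfs_table_sizes` is my own, and it assumes the feed’s .txt files are UTF-8 encoded.

```python
import csv
import io
import zipfile


def gtfs_table_sizes(feed):
    """Map each .txt table in a GTFS zip to (column_names, row_count).

    feed: a path to the zip file, or an open binary file-like object.
    """
    sizes = {}
    with zipfile.ZipFile(feed) as archive:
        for name in archive.namelist():
            if not name.endswith(".txt"):
                continue
            with archive.open(name) as raw:
                # utf-8-sig also strips a byte order mark if one is present.
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                reader = csv.reader(text)
                header = next(reader, [])
                sizes[name] = (header, sum(1 for _ in reader))
    return sizes
```

The row counts give a sense of which tables will be slow to import; shapes.txt is usually among the largest.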

Note: I don’t know how to do the reverse, converting shapefiles or other geodata into GTFS data. I’m not sure if this is possible and I’m still investigating it. If you have tips, let me know.

Converting GTFS to GIS shapefiles

Instructions require the use of ArcGIS (Windows only) and a free plugin called ET GeoWizards GIS for any version of ArcGIS. I do not have instructions for Mac users at this time.

I wrote these instructions while converting the Chicago Transit Authority’s GTFS files into shapefiles based on a reader’s request. “Field names” are quoted and layer names are italicized.

  1. Download the GTFS data you want. Find data from agencies around the world (although not many from Europe) on GTFS Data Exchange.
  2. Import into ArcGIS the shapes.txt file using Tools>Add XY Data. Specify Y=lat and X=lon
  3. Using ET GeoWizards GIS tools, in the Convert tab, convert the points shapefile to polyline.
  4. Select the shapes layer in the wizard, then create a destination file. Click Next.
  5. Select the “shape_id” field
  6. Click the checkbox next to Order and select the field “shape_pt_sequence” and click Finish.
  7. Depending on the number of records (the CTA has 466,000 shapes), it may take a while.
  8. The new shapefile will be added to your Table of Contents and appear in your map.
  9. Import the trips.txt and routes.txt files. Inspect them for any NULL values in the “route_id” field. You will be using this field to join the routes and trips tables. It may be the case that ArcGIS imported them incorrectly; the text files will show the correct data. This happens because ArcGIS inspects some of the data, decides the field holds integers, and ignores any text values, even though some route IDs contain text. If NULL values appear, follow steps 10 and 11 and continue. If not, follow steps 10 and 12 and continue.
  10. Export the text files as DBF files so that ArcGIS operates on them better. Then remove the text files from the Table of Contents.
  11. (Only if NULL values appear) Go into editing mode and fix the NULL values you noticed in step 9. You may have to make a new column with a more forgiving data type (string) and then copy the “route_id” column into the new column. Then continue to step 12.
  12. Join routes and trips based on the field “route_id” – export as trips_routes.dbf
  13. Add a new column to shapes.shp called “shape_id2”, with data type double 18, 11. This is so we can perform step 14. Use the field calculator to copy the values from “shape_id” (also known as ET_ID) to “shape_id2”
  14. Join trips_routes with shapes into routes_poly based on the field “shape_id” (and “shape_id2”)
  15. Dissolve routes_poly on “route_id.” Make sure all selections are cleared. Use statistics/summary fields: “route_long,” “route_url.” Save as routes_diss.shp
  16. Inspect the new shapefile to ensure it was created correctly. You may notice that some bus routes don’t have names. Since these routes are well documented on the CTA website, I’m not going to fill in their names.
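For readers without ArcGIS, the core of the steps above (ordering the points into polylines, then linking shapes to routes) can be sketched in plain Python. This is not the ET GeoWizards method, just an illustration of the same logic; the function names are hypothetical, while the column names (“shape_id”, “shape_pt_lat”, “shape_pt_lon”, “shape_pt_sequence”, “route_id”) are the standard GTFS field names. It does not write a shapefile.

```python
from collections import defaultdict


def build_polylines(shape_rows):
    """Group shapes.txt rows by shape_id and order them by shape_pt_sequence.

    Returns {shape_id: [(lon, lat), ...]}.
    """
    points = defaultdict(list)
    for row in shape_rows:
        points[row["shape_id"]].append(
            (
                int(row["shape_pt_sequence"]),  # sort numerically, not as text
                float(row["shape_pt_lon"]),
                float(row["shape_pt_lat"]),
            )
        )
    return {
        shape_id: [(lon, lat) for _, lon, lat in sorted(pts)]
        for shape_id, pts in points.items()
    }


def shapes_by_route(trip_rows):
    """Collect the distinct shape_ids used by each route_id in trips.txt."""
    routes = defaultdict(set)
    for trip in trip_rows:
        if trip.get("shape_id"):
            routes[trip["route_id"]].add(trip["shape_id"])
    return routes
```

The rows can come from csv.DictReader, which reads every value as text. That sidesteps the NULL problem from step 9, because route IDs that mix letters and digits are never forced into an integer type.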

Click on the screenshot to see various steps in the tutorials.

Converting GTFS to KML

After you have it in shapefile form, converting to KML is easy – follow these instructions for using QGIS. Or if you want to skip the shapefile-creation process (quite involved!), you can use KMLWriter, a Python script. Also, I think the latest version of ArcGIS has built-in KML exporting.
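To give a sense of what a KML export produces, here is a small Python sketch that writes line features as KML using only the standard library. It is not KMLWriter or the QGIS exporter; the function name `polylines_to_kml` and the input layout (a dictionary of names to lists of longitude/latitude pairs) are assumptions of mine.

```python
import xml.etree.ElementTree as ET


def polylines_to_kml(polylines):
    """Return a KML document string with one LineString Placemark per polyline.

    polylines: {name: [(lon, lat), ...]}
    """
    kml = ET.Element("kml", xmlns="http://www.opengis.net/kml/2.2")
    document = ET.SubElement(kml, "Document")
    for name, coords in polylines.items():
        placemark = ET.SubElement(document, "Placemark")
        ET.SubElement(placemark, "name").text = str(name)
        line = ET.SubElement(placemark, "LineString")
        # KML expects longitude,latitude pairs separated by spaces.
        ET.SubElement(line, "coordinates").text = " ".join(
            f"{lon},{lat}" for lon, lat in coords
        )
    return ET.tostring(kml, encoding="unicode")
```

Saving the returned string to a .kml file should give something Google Earth can open, though styling (line color, width) is left out of this sketch.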

Converting GTFS to XML

If you want to convert the GTFS data (which are essentially comma-separated value – CSV – files) to XML, that’s easier and you can avoid using GIS programs.

  • First try Mr. Data Converter (very user friendly).
  • If that doesn’t work, try this website form on Creativyst. I tested it by converting the CTA’s smallest GTFS table, frequencies.txt, and it worked properly. However, it has a data size limit. (User friendly.)
  • Next try csv2xml, a command line tool. (Not user friendly.)
  • You can also use Microsoft Excel, but read these tips and caveats first. (I haven’t found a Microsoft application I like or think is user friendly.)
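If none of those tools fit, the conversion itself is small enough to sketch in Python with the standard library. This is my own illustration, not how any of the tools above work; the function name and the default “table” and “row” element names are assumptions. It relies on the column names being valid XML element names, which holds for GTFS headers such as “trip_id”.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text, root_tag="table", row_tag="row"):
    """Convert CSV text into an XML string with one element per column value."""
    root = ET.Element(root_tag)
    for record in csv.DictReader(io.StringIO(csv_text)):
        row = ET.SubElement(root, row_tag)
        for field, value in record.items():
            # Each column becomes a child element named after the header.
            ET.SubElement(row, field).text = value
    return ET.tostring(root, encoding="unicode")
```

For a large table like shapes.txt this builds the whole document in memory, so it suits the smaller GTFS tables best.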