Tag: open data

Converting shapefiles to GeoJSON, and other format conversions

To develop the Chicago Bike Map app, I had a problem I thought would be simple to solve: load train lines into a Leaflet-powered map. I had the train lines stored as a polyline shapefile but Leaflet can only read the GeoJSON format or a string of geographic coordinates representing lines.
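To make the target format concrete, here is a minimal sketch of what a line looks like as GeoJSON, built with only Python's standard library. The coordinates and the "name" property are made up for illustration. One detail worth knowing: GeoJSON stores each position as [longitude, latitude], while Leaflet's own coordinate arrays are [latitude, longitude].

```python
import json


def line_feature(latlng_pairs, properties=None):
    """Build a GeoJSON Feature for one line from (lat, lng) pairs."""
    # GeoJSON positions are [longitude, latitude], the reverse of
    # the [lat, lng] order Leaflet uses for its own coordinate arrays.
    coordinates = [[lng, lat] for lat, lng in latlng_pairs]
    return {
        "type": "Feature",
        "geometry": {"type": "LineString", "coordinates": coordinates},
        "properties": properties or {},
    }


# Hypothetical coordinates, roughly in downtown Chicago.
sample = [(41.8857, -87.6309), (41.8768, -87.6310)]
feature = line_feature(sample, {"name": "example line"})
collection = {"type": "FeatureCollection", "features": [feature]}
print(json.dumps(collection, indent=2))
```

A FeatureCollection like this is the kind of object a GeoJSON layer in Leaflet expects.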

I eventually found a solution (I can’t remember how) and I need to share it with you. The converter can do more than ESRI shapefiles to GeoJSON. It can reproject the data in the conversion. It can convert from several formats to several other formats.

The site is called MyGeodata Converter. You upload a ZIP file of geographic files – .shp and its companion files (.prj, .dbf, .shx), .kml, and .gpx. Let’s take the Chicago Transit Authority train lines shapefile straight from the City of Chicago’s open data portal. It downloads as a zipped collection of a shapefile and its buddies, and we can take this file straight to the Converter and upload it. The Converter will unzip it and read the data; it will even identify the projection system (for Chicago-based geographic data, it’s common to use NAD83 Illinois StatePlane East FIPS 1201 Feet, SRID 102671, the same as SRID 3435).

The Converter will convert to one of the following formats, keeping the same projection or applying a new one. It also accepts SQL statements to extract a subset of the data:

  • ESRI shapefile
  • GML
  • KML, KMZ
  • GeoJSON
  • Microstation DGN
  • MapInfo File
  • GPX
  • CSV
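If you would rather subset the data after converting, the same idea can be sketched in a few lines of Python on the GeoJSON output. This is not the Converter's SQL feature, just a rough equivalent of a query like SELECT * FROM layer WHERE LINES = 'Blue'. The attribute name "LINES" and its values are hypothetical.

```python
def select_features(collection, predicate):
    """Return a new FeatureCollection keeping features whose properties pass predicate."""
    kept = [f for f in collection["features"] if predicate(f["properties"])]
    return {"type": "FeatureCollection", "features": kept}


# Hypothetical data: the "LINES" attribute and its values are made up,
# and the geometries are left null to keep the example short.
train_lines = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None, "properties": {"LINES": "Blue"}},
        {"type": "Feature", "geometry": None, "properties": {"LINES": "Red"}},
        {"type": "Feature", "geometry": None, "properties": {"LINES": "Blue"}},
    ],
}

blue_only = select_features(train_lines, lambda props: props["LINES"] == "Blue")
print(len(blue_only["features"]), "of", len(train_lines["features"]), "features kept")
```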

How to split a bike lane in two and copy features with QGIS

A screenshot of the splash image seen by users with iPad retina displays in landscape mode.

To make the Chicago Offline Bike Map, I needed bikeways data. I got this from the City of Chicago’s data portal, in GIS shapefile format. It has a good attribute table listing the name of the street the bikeway is on and the bikeway’s class (see below). After several bike lanes had been installed, I asked the City’s data portal operators for an updated shapefile. I got it a month later and found that it wasn’t up-to-date. I probably could have received a shapefile with the current bikeway installations marked, but I didn’t have time to wait: every day delayed was one more day I couldn’t promote my app; I make 70 cents per sale.

Since the bikeway lines were already there, I could simply reclassify the sections that had been changed to an upgraded form of bikeway (for example, Wabash Avenue went from a door zone-style bike lane to a buffered bike lane in 2011). I tried to do this but ran into trouble when the line segment was longer than the bikeway segment that needed to be reclassified (for example, Elston Avenue has varying classifications from Milwaukee Avenue to North Avenue that didn’t match the line segments for that street). I had to divide the bikeway into shorter segments and reclassify them individually.

Enter the Split Features tool. QGIS is short on documentation and I had trouble using this feature. I eventually found the trick after a search that took more time than I expected. Here’s how to cut a line:

  1. Select the line using one of the selection tools. I prefer the default one, Select Features, where you have to click on the feature one-by-one. (It’s not required that you select the line, but doing so will ensure you only cut the selected line. If you don’t select the line, you can cut many lines in one go.)
  2. Toggle editing on the layer that contains the line you want to cut.
  3. Click Edit>Split Features to activate that tool, or find its icon in one of the toolbars (which may or may not be shown).
  4. Click once near where you want to split the line.
  5. Move the cursor across the line you want to split, in the desired split location.
  6. When the red line indicating your split is where you desire, press the right-click mouse button.

Your line segment has now been split. A new entry has been added to the attribute table. There are now two entries with duplicate attributes that together make up the original line segment as it was before you split it.

This screenshot shows a red line across a road. The red line indicates where the road will be split. Press the right-click mouse button to tell QGIS to “split now”.

After splitting, open the attribute table to see that you now have two features with identical attributes. 
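For anyone who wants to do the same thing outside QGIS, here is a minimal sketch of the split-and-duplicate behavior in plain Python. It is not how QGIS implements Split Features: instead of drawing a cutting line, it takes a single point and cuts the line at the closest spot. It assumes 2D planar coordinates and a split point that does not fall on either end of the line. The street name and TYPE values are only examples.

```python
def split_line(feature, point):
    """Split a LineString feature at the spot on the line closest to point.

    Returns two features that each carry a copy of the original attributes.
    """
    coords = feature["geometry"]["coordinates"]
    best = None  # (squared distance, segment index, closest point on that segment)
    for i in range(len(coords) - 1):
        (x1, y1), (x2, y2) = coords[i], coords[i + 1]
        dx, dy = x2 - x1, y2 - y1
        length_sq = dx * dx + dy * dy
        # Fraction along the segment of the closest point, clamped to the segment.
        if length_sq == 0:
            t = 0.0
        else:
            t = ((point[0] - x1) * dx + (point[1] - y1) * dy) / length_sq
        t = max(0.0, min(1.0, t))
        px, py = x1 + t * dx, y1 + t * dy
        dist_sq = (point[0] - px) ** 2 + (point[1] - py) ** 2
        if best is None or dist_sq < best[0]:
            best = (dist_sq, i, [px, py])
    _, i, cut = best

    def part(part_coords):
        return {
            "type": "Feature",
            "geometry": {"type": "LineString", "coordinates": part_coords},
            # Copy the attributes so each half can be reclassified on its own.
            "properties": dict(feature["properties"]),
        }

    return part(coords[: i + 1] + [cut]), part([cut] + coords[i + 1 :])


# Example values only: a straight three-vertex line in arbitrary planar units.
bikeway = {
    "type": "Feature",
    "geometry": {"type": "LineString", "coordinates": [[0, 0], [10, 0], [20, 0]]},
    "properties": {"STREET": "Elston Ave", "TYPE": 1},
}
first_half, second_half = split_line(bikeway, (5, 3))
second_half["properties"]["TYPE"] = 9  # reclassify one half only
print(first_half["geometry"]["coordinates"], first_half["properties"])
print(second_half["geometry"]["coordinates"], second_half["properties"])
```

Copying the attribute dictionary for each half is the design choice that matters here: it is what lets you change the class of one piece without touching the other.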

Copying features in QGIS

A second issue I had when creating new bikeways data was when a bikeway didn’t exist and I couldn’t reclassify it. This was the case on Franklin Boulevard: no bikeway had ever been installed there. I solved this problem by copying the relevant street segments from the Transportation (roads) shapefile and pasting them into the bikeways shapefile. New entries were created in the attribute table but with blank attributes. It was simple to fill in the street name, class, and extents.
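The copy-and-paste step can be sketched the same way in plain Python: the geometry comes across, the attributes start blank, and you fill them in afterward. The field names ("ROAD_NAME", "STREET", "TYPE") and the coordinates are hypothetical, not taken from the City's shapefiles.

```python
import copy


def copy_features(source_features, target_layer, target_fields):
    """Append copies of source_features to target_layer with blank attributes."""
    for feature in source_features:
        target_layer["features"].append({
            "type": "Feature",
            "geometry": copy.deepcopy(feature["geometry"]),
            # Attributes start blank, as they did after pasting in QGIS.
            "properties": {field: None for field in target_fields},
        })
    return target_layer


# Hypothetical field names and coordinates, for illustration only.
roads = [{
    "type": "Feature",
    "geometry": {"type": "LineString", "coordinates": [[0, 0], [1, 0]]},
    "properties": {"ROAD_NAME": "FRANKLIN BLVD"},
}]
bikeways = {"type": "FeatureCollection", "features": []}

copy_features(roads, bikeways, ["STREET", "TYPE"])
new_bikeway = bikeways["features"][-1]
new_bikeway["properties"]["STREET"] = "Franklin Boulevard"  # fill in by hand
print(new_bikeway)
```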

Chicago bikeways GIS description

Bikeway classes (TYPE in the dataset) in the City of Chicago data portal are:

  1. Existing bike lane
  2. Existing marked shared lane
  3. Proposed on-street bikeway
  4. Recommended bike route
  5. Existing trail
  6. Proposed off-street trail
  7. Access path (to existing trail)
  8. Existing cycle track (also known as protected bike lane)
  9. Existing buffered bike lane
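If you work with this data in a script, a lookup table makes the classes easier to read. This sketch assumes the TYPE field stores the integers 1 through 9 in the order listed above; that is an assumption, so check the actual values in the attribute table before relying on it.

```python
# Assumption: TYPE holds integer codes 1-9 in the same order as the list above.
BIKEWAY_CLASSES = {
    1: "Existing bike lane",
    2: "Existing marked shared lane",
    3: "Proposed on-street bikeway",
    4: "Recommended bike route",
    5: "Existing trail",
    6: "Proposed off-street trail",
    7: "Access path (to existing trail)",
    8: "Existing cycle track (also known as protected bike lane)",
    9: "Existing buffered bike lane",
}


def describe_bikeway(type_code):
    """Return the class description for a TYPE code, or a fallback for unknown codes."""
    return BIKEWAY_CLASSES.get(type_code, "Unknown bikeway class")


print(describe_bikeway(9))
print(describe_bikeway(42))
```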

It remains to be seen if the City will identify the “enhanced marked shared lane” on Wells Street between Wacker Drive and Van Buren Street differently than “existing marked shared lane” in the data.

Dumke fighting the open data fight for Chicagoans

Dan O’Neil mails a FOIA request to Chicago’s 311 service in 2007. Now, you can email most places (or fax!). 

I like to say that for every dataset a government agency proactively publishes, there’s one fewer FOIA* request it has to respond to.

City officials say they get so many FOIA requests that responding to them all has become a serious resource drain. But this is one of the reasons why—we don’t have any other way to get information about our government.

As a result, I will be adding to their workload and submitting another FOIA request. I don’t mind saying this publicly since it won’t be a secret anyway. That’s because the Emanuel administration has resumed Daley’s old habit of posting FOIA requests online. It’s also kept up Daley’s habit of not posting any information showing how responsive the city is.

That’s Chicago Reader author Mick Dumke talking about his troubles obtaining some data from the Chicago Department of Human Resources. Read the entire article, where he also gives a pretty good description of the “Chicago FOIA way”, the process for getting information in Mayor Emanuel’s transparent administration.

Note: I submit a FOIA request to some agency at least once a month. My most frequent FOIA requests go to the Chicago Transit Authority (CTA) and the Chicago Department of Transportation (CDOT). I also query the Chicago Police Department, and the Department of Administrative Hearings. Derek Eder has a story on how he and his colleagues worked with some Chicago staff to add new data about lobbying to the Chicago Data Portal.

*Freedom of Information Act. In New York, it’s called FOIL, or Freedom of Information Law.

Rambling about automobile crash data and cellphone distraction

How often are bicyclists involved in crashes because of cellphone distraction? See the table below. And how many crashes are caused by the bicyclist being distracted by a cellphone? We won’t and don’t know.

The Chicago City Council will vote tomorrow on ordinance 02011-7146 to add a new section in Chapter 9 of the Municipal Code of Chicago: “9-52-110 Use of communication devices while operating a bicycle.”

In a Chicago Sun-Times article today, Matthew Tobias, the Chicago Police Department’s deputy chief of Area 3 patrol, reported on the number of citations that the department has issued to drivers in violation of the cellphone ban: “from 2,577 administrative violations in 2008 to 10,920 in 2009 and 19,701 last year” (known as “citations issued” in the table below).

I looked at the crash data to see how many crashes were coded as having been caused by “Distraction – operating an electronic communication device (cell phone, texting, etc)”.

Out of 274,488 recorded crashes in 2008, 2009, and 2010, there were 331 crashes which had a Cause 1 or Cause 2 of “Distraction – operating an electronic communication device (cell phone, texting, etc)”. The table below compares the rates of crashes to the rates of citations issued and the number of crashes that the police noted were caused by cellphone distraction. It also shows the number of these “cellphone distraction” crashes that involved bicyclists and pedestrians.

Year | Citations issued | Automobile crashes | Cellphone distraction crashes | Cellphone distraction crashes (% of all crashes) | Involved bicyclists | Involved pedestrians | National VMT (billions)*
2008 | 2,577 | 111,701 | 91 | 0.081 | 3 | 10 | 2973.47
2009 | 10,920 | 81,982 | 130 | 0.159 | 1 | 7 | 2979.39
2010 | 19,701 | 80,805 | 110 | 0.136 | 6 | 8 | 2999.97
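The percentage column is just cellphone distraction crashes divided by all automobile crashes for that year. Here is a short sketch that recomputes it from the figures in the table, along with the three-year totals quoted above. The variable names are my own.

```python
# (year, citations issued, automobile crashes, cellphone distraction crashes)
rows = [
    (2008, 2577, 111701, 91),
    (2009, 10920, 81982, 130),
    (2010, 19701, 80805, 110),
]

# Share of each year's crashes coded as cellphone distraction, in percent.
shares = {}
for year, citations, crashes, distracted in rows:
    shares[year] = distracted / crashes * 100
    print(f"{year}: {shares[year]:.3f}% of {crashes:,} crashes; {citations:,} citations issued")

total_crashes = sum(crashes for _, _, crashes, _ in rows)
total_distracted = sum(distracted for _, _, _, distracted in rows)
print(f"All three years: {total_distracted} of {total_crashes:,} crashes")
```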

Maybe this data shows that the increased enforcement is leading to fewer crashes?
However, data on cyclists’ cellphone use in crashes WON’T BE recorded unless there’s a rule change: the cause is only recorded for the vehicles involved in the crash, and bicycles are devices, not vehicles.

None of these cellphone distraction crashes involved fatalities.

*Yep, that’s 2 thousand billion. Read it like this, 2 trillion 973 billion and 470 million. VMT data from Bureau of Transportation Statistics.

TransportationCamp: Real-Time Pedestrian and Bike Location, Session Two

Real-Time Pedestrian and Bike Location: How can we get it? What can we do with it? How can it not be creepy?
By Eric Fischer.

My summary of the discussion

There are many existing data sources that are published or have APIs that could stand as reasonable proxies for tracking people who are walking, biking, or just ambling around the city – some of this information is given away (via Foursquare) by those who are traveling, and other information is collected in real time (buses and taxis) and after the trip (travel surveys and Flickr photos). I don’t think the group agreed on any good use for this data (knowing where people are in the city right now), nor did the group come up with ways to ensure this collection is not “creepy.”

Eric’s original question involved the location of people bicycling, but the discussion spent more time talking about pedestrians. However, some techniques in tracking and data gathering could be applied to both modes.

See final paragraph for links on “further reading” that I find relevant to this discussion.

Schedule board at TransportationCamp West on Saturday in San Francisco at Public Works SF, 161 Erie Street.

[Ideas and statements are credited where I could keep track of who said what, and if I could see your name badge.]

Eric, starting us off:
We have a lot of information about where motor vehicles (MV) are in cities.
A lot of experience of city is not about being in a MV, though.

How many bikers going through intersection that are NOT getting hurt.
Finding places where people walk and where people don’t.

Where do people go on foot and on bikes?
As far as I know this isn’t available

Foursquare has benefits (awards) so people are willing to give the data, but we don’t want another Please Rob Me.

In SF, there are flash mobs, sudden protests, Critical Mass

Data sources:
-buses – boarding and deboarding – you can get a flow map from this. Someone said that Seattle has this data open.
-CTPP (Census Transportation Planning Package)
-city ped count
-Eric: Where people get on/off taxis.

“CycleTracks” – sampling bias, people with iPhones
-70% of handheld devices are feature phones, not smart phones. So there’s another sampling bias.

Opt-in factor
How do you sample?

SF Planning Dept. had a little program or project ask people to plot on a map your three most common walking routes.
What is your favorite street, and where do you not like to walk?

Eric: My collection tool is Flickr. Geotags and timestamps.
flickr.com/walkingsf

Magdalena Palugh: Are there incentives for commuting by bike? There are incentives for people who vanpool.
If there is incentive, I would gladly give up my data.
Michael Schwartz (SFCTA, sp?) What is difference <> SFCTA/MTA?

-If part of this is to get at where the trouble spots are, could you have people contribute where the good/bad parts are? “This overpass really sucks.”

Tom: Can you get peds from aerial images?
-Yes, but there’re too many limitations, like shade, and tree cover. Also, aerial images may be taken at wrong time (for a while the image of Market/Castro was during festival).

Brandon Martin-Anderson: What strategies have you tried so far?
-aerial images
-Flickr/Picasa location
-Street View face blur (a lot of false positives)
Anything you plot looks kind of the same.

People like to walk where other people are. For safety reasons. -Good point on real-time basis.
Eric: Not a lobbying group for peds.
Eric: Find interesting places to go.
Richard: We need exposure data.

Paris bike sharing report showed that “Cycling is faster on Wednesdays.”
Europeans more open to sharing their private details – possibly because of stricter regulation on what agencies can do with the collected data. (There was a little disagreement on this, I personally heard the opposite).

Andrew: Can we use something like Xbox Kinect to track these people?

National Bike/Ped Documentation Project – same format
Seattle – 4 different groups that do annual bike counts. UW bike planning studio.

Who pays for this?
-Transportation planners pay for this.
-Private development projects (from contractor).
-Universities, NSF, Google
-Community groups –

Further reading

People

Mike Fleisher – DS Solutions
Andrew – @ondrae – urbanmapping.com

Notes to self

Is Census question about commuting about time or distance of “most traveled” mode?
Splunk – data analysis tool
What is difference <> SFCTA/SFMTA?