Translating from GTFS

Liz Sanderson

FME Version

  • FME 2017.x

Introduction

What is GTFS? GTFS stands for General Transit Feed Specification. It is a common format that transit agencies use to distribute transportation schedules along with their accompanying geographic information (e.g. stops, routes, time between stops). It was created to address a need to incorporate transit information within Google Maps. Today, GTFS is used by many different web maps and mobile apps around the world. A great hub for finding and looking at GTFS data is http://transitfeeds.com/, where you can search for data by city, or by city and transit agency. One of the more common translations that users would like to see supported by FME is GTFS to SHP, which we will cover in this tutorial.

 

GTFS Data Structure

The building blocks of a GTFS dataset are relatively simple: a collection of CSV files saved as text (.txt) files and zipped into a single archive. At minimum, a GTFS dataset must include a specific set of these files. From Google’s GTFS reference, the required files are:

  • agency.txt : One or more transit agencies that provide the data in this feed.
  • stops.txt : Individual locations where vehicles pick up or drop off passengers.
  • routes.txt : Transit routes. A route is a group of trips that are displayed to riders as a single service.
  • trips.txt : Trips for each route. A trip is a sequence of two or more stops that occurs at a specific time.
  • stop_times.txt : Times that a vehicle arrives at and departs from individual stops for each trip.
  • calendar.txt : Dates for service IDs using a weekly schedule. Specify when service starts and ends, as well as days of the week where service is available.
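
Before pointing FME at a feed, it can be handy to confirm that the archive actually contains the files listed above. The following is a minimal sketch in Python, not part of FME. It assumes the text files sit at the root of the zip archive, and the helper names (missing_gtfs_files, build_sample_feed) and the in-memory sample archive are hypothetical, used only for illustration.

```python
import io
import zipfile

# Files the list above names as the minimum for a GTFS dataset.
REQUIRED_FILES = [
    "agency.txt", "stops.txt", "routes.txt",
    "trips.txt", "stop_times.txt", "calendar.txt",
]


def missing_gtfs_files(zip_source):
    """Return the required GTFS file names that are absent from the archive.

    Assumption: the files are stored at the root of the zip, not in a subfolder.
    """
    with zipfile.ZipFile(zip_source) as archive:
        present = set(archive.namelist())
    return [name for name in REQUIRED_FILES if name not in present]


def build_sample_feed(names):
    """Build a tiny in-memory zip (hypothetical data) containing the given files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in names:
            archive.writestr(name, "placeholder\n")
    buffer.seek(0)
    return buffer


sample = build_sample_feed(["agency.txt", "stops.txt", "routes.txt", "trips.txt"])
print(missing_gtfs_files(sample))  # ['stop_times.txt', 'calendar.txt']
```

To check a real download, pass the path to the .zip file instead of the in-memory sample.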

A GTFS dataset may also include optional text files beyond the required files above. For a list and detailed description of these additional files, please navigate to the Google Transit API page regarding GTFS.

To understand the relationships between each table of data, let’s take a look at the chart of table relationships from Google’s Transit API page:

1-relationships.png

 

Relationships between tables from Google’s GTFS Reference

Viewing the above chart reveals unsurprising relationships between tables. For example, the ‘trip’ table is dependent on routes, calendar, stoptime and frequency.

For this tutorial, we will be using GTFS data provided by Translink, a transit agency based in Vancouver, BC.

 

Source Data

What does this data look like in FME? The first thing to note is that this data is in LL84 (latitude and longitude on WGS84). For example, if we open the ‘stops.txt’ file in Notepad, we will see latitude and longitude values in the stop_lat and stop_lon columns:

2-coordinates-notepad.png
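
To make the column layout concrete, here is a small Python sketch (outside FME) that reads stops.txt content and turns each row into an x/y pair, with stop_lon as x and stop_lat as y. The sample rows and the function name read_stop_points are made up for illustration; only the column names come from GTFS.

```python
import csv
import io

# Hypothetical sample rows in the layout of stops.txt.
SAMPLE_STOPS = """stop_id,stop_name,stop_lat,stop_lon
1,Example Stop A,49.2800,-123.1200
2,Example Stop B,49.2850,-123.1100
"""


def read_stop_points(text):
    """Parse stops.txt content into (stop_id, x, y) tuples.

    Longitude is the x coordinate and latitude is the y coordinate.
    """
    points = []
    for row in csv.DictReader(io.StringIO(text)):
        points.append(
            (row["stop_id"], float(row["stop_lon"]), float(row["stop_lat"]))
        )
    return points


for stop_id, x, y in read_stop_points(SAMPLE_STOPS):
    print(stop_id, x, y)
```

Note that the file stores latitude before longitude, so the two columns have to be swapped to get conventional x/y order.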

 

Step-by-step Instructions

Inspecting GTFS

1. Add General Transit Feed Specification (GTFS) Reader

We will need to specify the coordinate system when we read in the data, as the CSV files do not store one. You can set the coordinate system either when you add the GTFS reader (click Readers > Add Reader, or press Ctrl+Alt+R):

3-addreader.png

 

Or, it can be set in the Navigator pane after the reader has been added to the workspace:

4-navigator.png

 

2. Inspect the data in Data Inspector

Now that we have set the coordinate system of the data, let’s try viewing it in Data Inspector to take a look at the bus stops (stops.txt), by adding an Inspector to the reader, and running the workspace:

5-inspect.png

 

 

The output looks like this:

 

Creating Points for a Shapefile of Stops

That’s a lot of stops! Note that we did not have to use a VertexCreator to display the coordinates from the stops.txt file as points. The GTFS reader automatically interprets the stop_lat and stop_lon columns as y and x coordinates respectively. This makes it very easy to convert these features into a point shapefile. Only a few steps are required to accomplish this:

1. Add the GTFS reader and select the .zip folder (or selected text file) containing your GTFS data

2. Select the feature types you would like to read (in this case ‘Stops’)

3. Add a Shapefile writer, specifying the output location and filename. Please select ‘Automatic’ for the Shapefile definition:

7-addwriter.png

 

4. Once the writer has been added, in the writer parameters, specify the format attribute fme_feature_type as the Shapefile Name. This will ensure your output shapefile name matches the input GTFS feature type:

8-writerfeattypeproperties.png

 

5. Some attribute names on the Shapefile feature type have been changed; however, FME maps them automatically. For more stable attribute name linkages, right-click on the feature flow and select Replace Link with AttributeManager.

6. Now run the workspace, and take a look at how many features are being passed through:

9-workspace.png

 

That’s it! Now we have a shapefile containing all of our transit stops serviced by Translink.

 

Creating Line Features for Transit Routes

How do we write out the route lines to a shapefile?

The answer to this relies on understanding how the GTFS tables are related. The routes.txt file contains information about each route, but does not contain any geographic information. The file shapes.txt contains the draw order and rules for creating line shapes for each route, but if you inspect the table for shapes.txt in FME, there is only one column: shape_id.
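
To illustrate what "draw order" means, here is a rough Python sketch of how the rows of shapes.txt can be assembled into lines: group the points by shape_id, then order them by shape_pt_sequence. This is an illustration of the idea, not a description of how the FME reader is implemented. The sample rows and the function name build_lines are hypothetical; the column names are the standard GTFS ones.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows in the layout of shapes.txt (deliberately out of order).
SAMPLE_SHAPES = """shape_id,shape_pt_lat,shape_pt_lon,shape_pt_sequence
A,49.20,-123.10,2
A,49.10,-123.00,1
A,49.30,-123.20,3
B,49.50,-123.50,1
B,49.60,-123.60,2
"""


def build_lines(text):
    """Return {shape_id: [(x, y), ...]} with vertices in draw order."""
    vertices = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        vertices[row["shape_id"]].append(
            (
                int(row["shape_pt_sequence"]),
                float(row["shape_pt_lon"]),
                float(row["shape_pt_lat"]),
            )
        )
    # Sorting the tuples orders each shape's points by sequence number.
    return {
        shape_id: [(x, y) for _, x, y in sorted(points)]
        for shape_id, points in vertices.items()
    }


for shape_id, line in build_lines(SAMPLE_SHAPES).items():
    print(shape_id, line)
```

Each resulting line carries only its shape_id, which is why a join to other tables is needed to find out which route it belongs to.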

How do we relate the lines created by shapes.txt to the routes.txt table? To join this data together we need trips.txt. Remember that trips.txt is a vital link to the rest of the GTFS tables: it has a relationship to nearly every mandatory table in the GTFS dataset. A trip belongs to a specific route (route_id), and the length and shape of a trip are defined by the shape_id field in trips.txt, which links to shapes.txt. This relationship is highlighted in red below:

10-relationshipshighlighted.png

Let’s try creating line features for each route in our dataset.

1. First, we add a GTFS reader and select 3 feature types: routes, shapes and trips

11-selectfeaturetypes.png

 

2. We know that shapes and trips are linked through shape_id, as we saw in the relationship diagram earlier. What we want is to join the geometry information from shapes.txt to the attribute information from trips.txt. To accomplish this we can use a FeatureMerger. In this case, let’s use ‘trips’ as the Supplier and ‘shapes’ as the Requestor (shapes is ‘requesting’ attributes from trips):

12-featuremergerone.png

 

In the parameters of the FeatureMerger we can specify the attributes to join on (shape_id), and whether we want to join Attributes only, or Attributes and Geometry. We want both attributes and geometry. Note that ‘Process Duplicate Suppliers’ is also set to ‘Yes’.

13-featuremergerparameters.png

 

In FME 2017 and later, there is a workspace parameter that defines what action to take when FME encounters a rejected feature:

14-navigatorrejectedfeature.png

 

If ‘Process Duplicate Suppliers’ is set to ‘No’, duplicate suppliers will be treated as rejected features, and therefore the translation will terminate if ‘Rejected Feature Handling’ is set to ‘Terminate Translation’. We can either change this workspace parameter to ‘Continue Translation’, or we can set ‘Process Duplicate Suppliers’ to ‘Yes’.

We are expecting many duplicate suppliers, as there are many trips per shape. Since we only want the shapes and 1 occurrence of the route information per shape, we will choose the second option, of setting ‘Process Duplicate Suppliers’ to ‘Yes’.
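
As a rough analogy outside FME, the join can be pictured as a lookup keyed on shape_id in which each shape ends up with the attributes of a single trip, even though many trips reference it. The Python sketch below arbitrarily keeps the first trip it sees for each shape; that choice is an assumption made for illustration and is not a statement about which supplier FME uses. The sample data and the function name merge_trips_onto_shapes are hypothetical.

```python
# Hypothetical shapes (requestors): shape_id -> line vertices as (x, y).
SHAPES = {
    "S1": [(-123.0, 49.1), (-123.1, 49.2)],
    "S2": [(-123.5, 49.5), (-123.6, 49.6)],
    "S3": [(-123.7, 49.7), (-123.8, 49.8)],  # no trip references this shape
}

# Hypothetical trips (suppliers): several trips can share one shape_id.
TRIPS = [
    {"trip_id": "T1", "route_id": "R1", "shape_id": "S1", "service_id": "WEEKDAY"},
    {"trip_id": "T2", "route_id": "R1", "shape_id": "S1", "service_id": "WEEKDAY"},
    {"trip_id": "T3", "route_id": "R2", "shape_id": "S2", "service_id": "WEEKEND"},
]


def merge_trips_onto_shapes(shapes, trips):
    """Attach one trip's attributes to each shape, joining on shape_id."""
    first_trip = {}
    for trip in trips:
        # Assumption: keep the first trip per shape and ignore later duplicates.
        first_trip.setdefault(trip["shape_id"], trip)

    merged = []
    for shape_id, line in shapes.items():
        trip = first_trip.get(shape_id)
        if trip is None:
            continue  # shape with no matching trip is left out of the merged result
        merged.append({**trip, "geometry": line})
    return merged


for feature in merge_trips_onto_shapes(SHAPES, TRIPS):
    print(feature["shape_id"], feature["route_id"], feature["trip_id"])
```

With the sample data, three trips and three shapes produce two merged features: one per shape that has at least one trip.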

Let’s inspect the output from the FeatureMerger by adding an Inspector transformer to the Merged port (alternatively, run the workspace with Full Inspection enabled and double-click on the dark grey bubble with the feature count coming from the ‘Merged’ port):

15-datainspector.png

In the image above, I have highlighted the #2 route heading towards Burrard Station. The points seen on the map are vertices for the line segments, not our stops. What this shows us is that lines have been successfully generated with trip segment information, but we are still missing some route information that is only contained in our routes.txt table. Therefore we need to perform a second join using another FeatureMerger to merge routes.txt to the output of our first FeatureMerger.

 

3. This time, we will only be merging attributes, and the link will be based on route_id:

16-featuremergertwo.png
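
Conceptually, this second join only copies route attributes onto features that already have geometry. A minimal Python sketch of that idea follows, assuming each feature already carries a route_id from the earlier trips join. The sample records and the function name add_route_attributes are hypothetical; route_short_name and route_long_name are standard routes.txt columns.

```python
# Hypothetical output of the first join: line features that carry a route_id.
FEATURES = [
    {"shape_id": "S1", "route_id": "R1", "geometry": [(-123.0, 49.1), (-123.1, 49.2)]},
    {"shape_id": "S2", "route_id": "R2", "geometry": [(-123.5, 49.5), (-123.6, 49.6)]},
]

# Hypothetical routes.txt records keyed by route_id (attributes only, no geometry).
ROUTES = {
    "R1": {"route_short_name": "001", "route_long_name": "Example Route One"},
    "R2": {"route_short_name": "002", "route_long_name": "Example Route Two"},
}


def add_route_attributes(features, routes):
    """Copy route attributes onto each feature, joining on route_id."""
    merged = []
    for feature in features:
        route = routes.get(feature["route_id"])
        if route is not None:
            # The geometry is untouched; only attributes are added.
            merged.append({**feature, **route})
    return merged


for feature in add_route_attributes(FEATURES, ROUTES):
    print(feature["shape_id"], feature["route_short_name"], feature["route_long_name"])
```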

 

Note that in this image I have made use of the new Parameter Editor window.

4. Add a writer, in this case Esri Shapefile, selecting an output dataset location and setting Shapefile Definition to Automatic. When the Feature Type dialog appears, set Shapefile Name to the attribute fme_feature_type, then connect the writer feature type to the Merged port of FeatureMerger_2. Some attribute names on the Shapefile feature type have been changed; however, FME maps them automatically. For more stable attribute name linkages, right-click on the feature flow and select Replace Link with AttributeManager.

Let’s run this translation and inspect the output to see if we have achieved the desired result:

17-lineoutputdatainspector.png

Excellent! We now have our route and trip information attached to the shapes representing the routes. Note that in some cases, you will see some duplicate route entries. This is due to routes having more than 1 service_id value, which is associated with an entry in calendar.txt. For example, a given route may have a service_id value associated with weekend service, and another id for weekday service, and yet another for holidays.
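
One way to check whether this explains duplicates in your own data is to list the distinct service_id values used by each route in trips.txt. The Python sketch below does that on made-up records; the function name service_ids_per_route and the sample values are hypothetical.

```python
from collections import defaultdict

# Hypothetical trips.txt records: one route running under several service calendars.
TRIPS = [
    {"trip_id": "T1", "route_id": "R1", "service_id": "WEEKDAY"},
    {"trip_id": "T2", "route_id": "R1", "service_id": "WEEKEND"},
    {"trip_id": "T3", "route_id": "R1", "service_id": "HOLIDAY"},
    {"trip_id": "T4", "route_id": "R2", "service_id": "WEEKDAY"},
]


def service_ids_per_route(trips):
    """Return {route_id: sorted list of distinct service_id values}."""
    services = defaultdict(set)
    for trip in trips:
        services[trip["route_id"]].add(trip["service_id"])
    return {route_id: sorted(ids) for route_id, ids in services.items()}


for route_id, ids in service_ids_per_route(TRIPS).items():
    print(route_id, ids)
```

A route that lists more than one service_id here is a candidate for appearing more than once in the output.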

GTFS data is surprisingly complex when you begin to weave each of the tables together to generate new information. This article touched upon creating simple shapefiles based on the stored geometry in the GTFS data, but there are many other ways you can join other pieces of data to these shapes to perform more advanced analysis.
