Introduction
GTFS stands for General Transit Feed Specification. It is a common format for transit agencies to use when distributing transportation schedules along with their accompanying geographic information (e.g., stops, routes, time between stops). It was created to address a need to incorporate transit information within Google Maps. Today, GTFS is utilized by numerous web maps and mobile applications worldwide. A great hub for finding and looking at GTFS data is the Mobility Database, where you can search for data by city or by city and transit agency. One of the more common translations that users would like to see supported by FME is GTFS to SHP, which we will cover in this tutorial.
GTFS Data Structure
The building blocks of a GTFS dataset are relatively simple: a collection of CSV-formatted files saved with a .txt extension and zipped into a single archive. At a minimum, a GTFS dataset must include several of these files. From Google’s GTFS reference, the minimums include:
- agency.txt : One or more transit agencies that provide the data in this feed.
- stops.txt : Individual locations where vehicles pick up or drop off passengers.
- routes.txt : Transit routes. A route is a group of trips that are displayed to riders as a single service.
- trips.txt : Trips for each route. A trip is a sequence of two or more stops that occurs at a specific time.
- stop_times.txt : Times that a vehicle arrives at and departs from individual stops for each trip.
- calendar.txt : Dates for service IDs using a weekly schedule. Specify when the service starts and ends, as well as the days of the week on which the service is available.
There are additional text files that may be included in a GTFS dataset in addition to the required files above. For a list and detailed description of these additional files, please navigate to the Google Transit API page regarding GTFS.
To understand the relationships between each table of data, let’s take a look at the chart of table relationships from Google’s Transit API page:
Viewing the above chart reveals unsurprising relationships between tables - for example, the ‘trip’ table is dependent on routes, calendar, stop time, and frequency.
For this tutorial, we will use GTFS data provided by TransLink, a transit agency based in Vancouver, BC.
Source Data
What does this data look like in FME? The first thing to note is that this data is in LL-84 (latitude and longitude on the WGS84 datum). For example, if we look at the ‘stops.txt’ file in Notepad, we will see latitude and longitude values in our stop_lat and stop_lon columns:
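To make the column layout concrete, here is a small sketch that parses stop coordinates with the Python standard library. The sample rows are invented for illustration and are not taken from the TransLink feed; only the column names (stop_id, stop_name, stop_lat, stop_lon) come from the GTFS reference. The range check is a simple guard that the values look like latitude/longitude rather than projected coordinates.

```python
import csv
import io

# Invented sample rows; a real stops.txt usually has more columns.
SAMPLE_STOPS = """stop_id,stop_name,stop_lat,stop_lon
1,Example Stop A,49.2827,-123.1207
2,Example Stop B,49.2600,-123.1000
"""


def read_stop_points(text):
    """Return (stop_id, lon, lat) tuples from the text of a stops.txt file."""
    points = []
    for row in csv.DictReader(io.StringIO(text)):
        lat = float(row["stop_lat"])
        lon = float(row["stop_lon"])
        # Reject values that cannot be degrees of latitude/longitude.
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            raise ValueError(f"stop {row['stop_id']} is outside lat/long range")
        # Store x (longitude) before y (latitude), as most GIS tools expect.
        points.append((row["stop_id"], lon, lat))
    return points
```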
Step-by-step Instructions
Inspecting GTFS
1. Add General Transit Feed Specification (GTFS) Reader
We will need to specify the coordinate system when reading in the data, as the CSV files do not contain one. You can set the coordinate system when you add the GTFS Reader (click Readers > Add Reader, or press Ctrl+Alt+R):
As of FME 2025.2, the Coordinate System parameter is now configured within the Parameters dialog of each reader/writer format. For more information, including details about the change and affected transformers, please see Coordinate System Parameter Location Change.
Or, it can be set in the Navigator pane after the reader has been added to the workspace:
2. Inspect the data in Data Inspector
Now that we have set the coordinate system of the data, let’s view the bus stops (stops.txt) in Data Inspector by adding an Inspector to the reader and running the workspace:
The output looks like this:
Creating Points for a Shapefile of Stops
That’s a lot of stops! Note that we did not have to use a VertexCreator to display the coordinates from the stops.txt file as points. The GTFS reader automatically interprets the stop_lon and stop_lat columns as x and y coordinates. This makes it very easy to convert these features into a point shapefile. Only a few steps are required to accomplish this!
1. Add the GTFS reader and select the .zip file (or an individual text file) containing your GTFS data.
2. Select the feature types you would like to read (in this case, ‘Stops’).
3. Add a Shapefile writer, specifying the output location and filename. Please select ‘Automatic’ for the Shapefile definition:
4. Once the writer has been added, in the writer parameters, specify the format attribute fme_feature_type as the Shapefile Name. This will ensure your output shapefile name matches the input GTFS feature type:
5. Some attribute names on the Shapefile feature type have been changed; however, FME automatically maps them. For more stable linkages of attribute names, right-click on the feature flow and select Replace Link with AttributeManager.
6. Now run the workspace, and take a look at how many features are being passed through:
That’s it! We now have a shapefile containing all our transit stops serviced by TransLink.
Creating Line Features for Transit Routes
How do we write out the route lines to a shapefile?
The answer to this relies on understanding how the GTFS tables are related. The routes.txt file contains information about each route but does not include any geographic data. The file shapes.txt contains the draw order and rules for creating line shapes for each route, but if you inspect the shapes table as read by FME, there is only one attribute column: shape_id.
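To illustrate what “draw order” means here: per the GTFS reference, the raw shapes.txt file stores one row per vertex, with the columns shape_id, shape_pt_lat, shape_pt_lon, and shape_pt_sequence. A line is obtained by grouping rows on shape_id and ordering them by sequence, which would explain why only shape_id is left as an attribute once the geometry has been built. The sketch below shows that grouping in plain Python; the sample rows are invented, and this is not a description of how the FME reader is implemented.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows, deliberately out of order.
SAMPLE_SHAPES = """shape_id,shape_pt_lat,shape_pt_lon,shape_pt_sequence
A,49.20,-123.10,2
A,49.10,-123.00,1
B,49.30,-123.20,1
A,49.30,-123.20,3
B,49.40,-123.30,2
"""


def build_lines(text):
    """Return {shape_id: [(lon, lat), ...]} with vertices in draw order."""
    vertices = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        vertices[row["shape_id"]].append((
            int(row["shape_pt_sequence"]),
            float(row["shape_pt_lon"]),
            float(row["shape_pt_lat"]),
        ))
    # Sorting the tuples orders each shape's points by sequence number.
    return {
        shape_id: [(lon, lat) for _, lon, lat in sorted(points)]
        for shape_id, points in vertices.items()
    }
```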
How do we relate the lines created by shapes.txt to the routes.txt table? To join this data together, we need the trips.txt file. Remember that trips.txt is a vital link to the rest of the GTFS tables, as it has a relationship with nearly every mandatory table in the GTFS dataset. A trip belongs to a specific route (route_id), and the length/shape of a trip is defined by the link in the shape_id field in the trips.txt file to the shapes.txt file. This relationship is highlighted in red below:
Let’s try creating line features for each route in our dataset.
1. First, we add a GTFS reader and select 3 feature types: routes, shapes, and trips
2. We know that shapes and trips are linked through shape_id, as we saw in the relationship diagram earlier. What we want to do is join the geometry information from shapes.txt with the attribute information from trips.txt. To accomplish this, we can use a FeatureMerger. In this case, let’s use ‘trips’ as the Supplier and ‘shapes’ as the Requestor (shapes is ‘requesting’ attributes from trips):
In the parameters of the FeatureMerger, we can specify the attributes to join on (shape_id), and whether we want to join Attributes only, or Attributes and Geometry. We want both attributes and geometry. Note that ‘Process Duplicate Suppliers’ is also set to ‘Yes’.
Note there is a workspace parameter that defines what action to take when FME encounters a rejected feature:
If ‘Process Duplicate Suppliers’ is set to ‘No’, duplicate suppliers will be treated as rejected features, and therefore, the translation will terminate if ‘Rejected Feature Handling’ is set to ‘Terminate Translation’. We can either change this workspace parameter to ‘Continue Translation’, or we can set ‘Process Duplicate Suppliers’ to ‘Yes’.
We expect to encounter many duplicate suppliers, as there are multiple trips per shape. Since we only want the shapes and one occurrence of the route information per shape, we will choose the second option and set ‘Process Duplicate Suppliers’ to ‘Yes’.
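The join logic can be pictured with a short sketch in plain Python. This is an analogy for a key-based join that tolerates duplicate suppliers by keeping the first match per key. It is not a description of the FeatureMerger’s internals, and the function name and sample records are invented for illustration.

```python
def merge_first_supplier(requestors, suppliers, key):
    """Copy attributes from the first supplier matching each requestor's key.

    Returns (merged, unmerged). Later suppliers sharing a key are ignored.
    """
    first_by_key = {}
    for supplier in suppliers:
        # setdefault keeps the first supplier seen for each key value.
        first_by_key.setdefault(supplier[key], supplier)

    merged, unmerged = [], []
    for requestor in requestors:
        supplier = first_by_key.get(requestor[key])
        if supplier is None:
            unmerged.append(requestor)
        else:
            # Requestor values win on name clashes (an illustrative choice).
            merged.append({**supplier, **requestor})
    return merged, unmerged


# Invented records: two trips share shape "A", and shape "C" has no trip.
shapes = [{"shape_id": "A"}, {"shape_id": "B"}, {"shape_id": "C"}]
trips = [
    {"trip_id": "t1", "route_id": "r1", "shape_id": "A"},
    {"trip_id": "t2", "route_id": "r1", "shape_id": "A"},
    {"trip_id": "t3", "route_id": "r2", "shape_id": "B"},
]
merged, unmerged = merge_first_supplier(shapes, trips, "shape_id")
```

Here each shape ends up with the attributes of exactly one trip, and the shape without a matching trip is set aside rather than stopping the run.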
Let’s inspect the output from the FeatureMerger by adding an Inspector transformer to the Merged port (alternatively, run the workspace with Full Inspection enabled and double-click the dark grey bubble showing the feature count on the ‘Merged’ port):
In the image above, I have highlighted the #2 route heading towards Burrard Station. The points seen on the map are vertices for the line segments, not our stops. This indicates that lines have been successfully generated with trip segment information; however, we are still missing some route information that is only contained in our routes.txt table. Therefore, we need to perform a second join using another FeatureMerger to merge routes.txt to the output of our first FeatureMerger.
3. This time, we will only be merging attributes, and the link will be based on route_id:
4. Add a writer, in this case Esri Shapefile, selecting an output dataset location and setting Shapefile Definition to Automatic. When the Feature Type dialog appears, set Shapefile Name to the attribute fme_feature_type, then connect it to the FeatureMerger_2 Merged port. Some attribute names on the Shapefile feature type have been changed; however, FME automatically maps them. For more stable linkages of attribute names, right-click on the feature flow and select Replace Link with AttributeManager.
Let’s run this translation and inspect the output to see if we have achieved the desired result:
Excellent! We now have our route and trip information attached to the shapes representing the routes. Note that in some cases, you will see some duplicate route entries. This is due to routes having more than 1 service_id value, which is associated with an entry in calendar.txt. For example, a given route may have a service_id value associated with weekend service, and another id for weekday service, and yet another for holidays.
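To see where such duplicates come from, it can help to group the trips by route and shape and list the service_id values behind each pair. The sketch below does this in plain Python with invented trip records; the field names route_id, shape_id, and service_id are those of trips.txt, while the function name and the service_id values are made up for illustration.

```python
from collections import defaultdict


def services_per_route_shape(trips):
    """Return {(route_id, shape_id): sorted list of distinct service_ids}."""
    grouped = defaultdict(set)
    for trip in trips:
        grouped[(trip["route_id"], trip["shape_id"])].add(trip["service_id"])
    return {pair: sorted(ids) for pair, ids in grouped.items()}


# Invented records: route r1 runs the same shape on two service calendars.
sample_trips = [
    {"route_id": "r1", "shape_id": "A", "service_id": "weekday"},
    {"route_id": "r1", "shape_id": "A", "service_id": "weekend"},
    {"route_id": "r1", "shape_id": "A", "service_id": "weekday"},
    {"route_id": "r2", "shape_id": "B", "service_id": "weekday"},
]
summary = services_per_route_shape(sample_trips)
```

A pair with more than one service_id is a candidate for the duplicate route entries described above.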
GTFS data is surprisingly complex when you begin to weave each of the tables together to generate new information. This article discussed creating simple shapefiles based on the stored geometry in the GTFS data, but there are many other ways to join other pieces of data to these shapes to perform more advanced analysis.
Data Attribution
The data used here originates from data made available by TransLink.