Data Integration for Transportation with FME

Liz Sanderson

FME Version

  • FME 2023.0


Geographic Information Systems (GIS) play a crucial role within state government Departments of Transportation (DOTs). However, data accessibility and integration between different systems can pose significant challenges. In this series of articles, we present a range of use cases where DOT staff can use FME Flow and FME Form to optimize their data collection and publication workflows. With these tools, DOTs can streamline their processes, reduce the risk of errors associated with manual data entry, and manage their data more efficiently. The core technologies and use cases introduced in this series are described below.


Why Use FME to Extract Text and Tabular Data from PDF?

FME’s Adobe Geospatial PDF Reader offers extensive capabilities for extracting information from PDF documents. It can extract various types of data, including imagery, rasters, vector data, text, spatial information, and attributes.

However, extracting information from PDF documents can be complex. PDFs are a document format that can contain a wide range of information spread across multiple pages, including text, tables, and maps. In this tutorial, we will build upon the concepts covered in the Getting Started with PDF Reading tutorial and explore advanced techniques for extracting text and tables from PDFs.
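Conceptually, pulling a table out of PDF text often comes down to splitting each line of text into columns. The Python sketch below is not how FME does this internally; it is a minimal illustration that assumes the extracted text keeps columns separated by two or more spaces. The function `split_table_rows`, the column names, and the sample line are hypothetical.

```python
import re

# Two or more consecutive whitespace characters are treated as a column break;
# single spaces are kept so multi-word cell values (e.g. street names) stay
# together. This layout is an assumption about the extracted text.
COLUMN_BREAK = re.compile(r"\s{2,}")


def split_table_rows(lines, header):
    """Turn whitespace-aligned text lines into dictionaries keyed by `header`."""
    rows = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines between table rows
        cells = COLUMN_BREAK.split(stripped)
        if len(cells) != len(header):
            continue  # skip lines that do not have one value per column
        rows.append(dict(zip(header, cells)))
    return rows
```

For example, `split_table_rows(["Main St    12/01/2022    4,512", "Note: page 1"], ["location", "date", "adt"])` keeps only the first line, because the note does not split into three columns.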

The focus of this tutorial will be on extracting Average Daily Traffic (ADT) data, as well as important details such as collection dates and locations. ADT data is widely used in transportation planning, roadway design and construction, and the overall management of a city's road network. The sample PDF files used in this tutorial are Traffic Speed Reports provided by the City of San Jose's Department of Transportation.
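For reference, ADT is commonly defined as the total traffic volume counted over a period of whole days divided by the number of days in that period, where $V_d$ is the vehicle count on day $d$ and $N$ is the number of days counted:

```latex
\mathrm{ADT} = \frac{1}{N} \sum_{d=1}^{N} V_d
```

As a hypothetical example, a three-day count of 4,000, 4,200, and 4,400 vehicles gives an ADT of 12,600 / 3 = 4,200 vehicles per day.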

How Does Linear Referencing Apply to Mapping Point Data without Geographic Coordinates?

Linear referencing is a technique employed in geographic information systems (GIS) and transportation engineering to locate and analyze features along a linear element, such as a road, pipeline, or river. It entails measuring positions and distances along the linear feature, typically using a linear reference system (LRS). In linear referencing, distances are measured from a known starting point, often referred to as a reference point or zero milepost. These distances are typically expressed in linear units, such as meters or miles. By associating attributes or events with specific positions or distances along the linear feature, linear referencing enables the analysis and management of information in relation to the linear feature.
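The core operation of linear referencing, turning a measure (distance from the starting point) into a location on the line, can be sketched in a few lines of Python. This is a minimal planar sketch, assuming the line is a list of projected (x, y) vertices ordered from the zero-measure end and that measures use the same units as the coordinates; `locate_along` is a hypothetical name, not an FME or LRS API.

```python
import math


def locate_along(polyline, measure):
    """Return the (x, y) point lying `measure` units along `polyline`.

    `polyline` is a list of (x, y) vertices ordered from the zero-measure
    starting point; distances are planar, in coordinate units.
    """
    if measure < 0:
        raise ValueError("measure must be non-negative")
    travelled = 0.0
    for (x1, y1), (x2, y2) in zip(polyline, polyline[1:]):
        seg_len = math.hypot(x2 - x1, y2 - y1)
        if seg_len > 0 and travelled + seg_len >= measure:
            # Interpolate within the segment that contains the measure.
            t = (measure - travelled) / seg_len
            return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
        travelled += seg_len
    raise ValueError("measure exceeds the length of the polyline")
```

For example, with a road digitized from (0, 0) east to (100, 0) and then north to (100, 50), a measure of 120 falls 20 units into the second segment, at (100, 20).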

Linear referencing has applications in various domains, including mapping, route analysis, asset management, and transportation planning. For instance, it can help determine the location of road signs or utility poles along a road, measure the distance between two points on a pipeline, or analyze the distribution of accidents along a highway segment. Linear referencing is an effective method for mapping transportation events without knowing their exact geographic coordinates (latitude and longitude).

Linear referencing principles also apply to mapping the ADT data after extraction, because the Traffic Speed Reports do not specify the exact location of the traffic count collection points. The second tutorial therefore focuses on mapping all the ADT points and their associated data without longitude and latitude coordinates. The HorizontalAngleCalculator custom transformer, along with the CoordinateExtractor and Offsetter transformers, is essential to this task.
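One way these pieces could fit together: take the coordinates of two vertices on a road, compute the road's horizontal angle from them, and shift a known reference point along that angle by a given distance. The Python sketch below illustrates that geometry only. It assumes projected coordinates, an angle measured counterclockwise from east (the +x axis), and a distance in coordinate units; the actual HorizontalAngleCalculator may use a different angle convention, and the function names are hypothetical.

```python
import math


def horizontal_angle(start, end):
    """Angle of the segment start->end in degrees, counterclockwise from +x (east)."""
    dx = end[0] - start[0]
    dy = end[1] - start[1]
    return math.degrees(math.atan2(dy, dx)) % 360.0


def offset_point(point, angle_deg, distance):
    """Shift `point` by `distance` units in the direction `angle_deg`."""
    rad = math.radians(angle_deg)
    # Convert the polar offset (distance, angle) into x/y deltas.
    return (point[0] + distance * math.cos(rad),
            point[1] + distance * math.sin(rad))
```

For a road running due north from (0, 0) to (0, 100), `horizontal_angle` returns 90 degrees, and offsetting (0, 0) by 30 units along that angle gives approximately (0, 30).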


How to Automate PDF Data Extraction for ArcGIS Online (AGOL)?

Automations allow you to quickly and easily create workflows that run on a schedule or whenever an event happens. Automations are designed to be user-friendly: users who are unfamiliar with scripting or web integration can specify tasks to be performed at any chosen time while monitoring for system failures.

Building on the Run a Workspace when New Data Arrives in a Directory tutorial, the final tutorial of this series starts with a subsequent workspace that aggregates any overlapping data from the first workspace and publishes the final result to ArcGIS Online using the Esri AGOL Feature Service Writer. Next, it explains how to create an FME Flow Automation that monitors a directory for incoming PDFs, runs the PDF extraction and AGOL publication workspaces, and finally sends a notification email to DOT staff. The tutorial also introduces techniques for publishing workspaces and their associated custom transformers to FME Flow so that the automation runs properly.
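In FME Flow this trigger-and-action chain is configured in the Automations interface rather than written as code. Purely to illustrate the pattern (watch a directory, process each new PDF once, then publish and notify), here is a minimal Python sketch. It assumes simple polling, which may differ from how FME Flow detects new files, and `extract`, `publish`, and `notify` are hypothetical callables standing in for the two workspaces and the email action.

```python
import time
from pathlib import Path


def find_new_pdfs(directory, seen):
    """Return PDFs in `directory` whose names are not in `seen`, sorted by path.

    `seen` is a set of already-processed file names; it is updated in place
    so each PDF is reported only once.
    """
    new_files = sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf" and p.name not in seen
    )
    seen.update(p.name for p in new_files)
    return new_files


def process(pdf_path, extract, publish, notify):
    """Run the hypothetical extract, publish, notify chain for one PDF."""
    features = extract(pdf_path)
    publish(features)
    notify(f"Processed {pdf_path.name}: {len(features)} features published")


def poll(directory, handle, interval_s=30.0, max_polls=None):
    """Check `directory` every `interval_s` seconds and pass each new PDF to `handle`.

    `max_polls=None` polls forever; a number stops after that many checks.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for pdf in find_new_pdfs(directory, seen):
            handle(pdf)
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval_s)
```

Keeping a set of processed names is what makes the watch idempotent: a PDF that stays in the directory is not extracted and published again on the next check.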



Extracting Text and Tabular Data from PDF
In this tutorial, you will learn how to use the Adobe Geospatial PDF Reader and other FME Transformers that help split and extract text and tabular data from the input PDF. 

How to Map Point Data without Latitude and Longitude: Leveraging Linear Referencing Principles
In this tutorial, you will learn how to map point data in the absence of geographical coordinates, leveraging linear referencing principles to perform such a task. 

Automating PDF Data Extraction for ArcGIS Online
In this tutorial, you will learn how to create a subsequent FME workspace that aggregates overlapping point data from additional PDF extractions and then publishes the final output to ArcGIS Online. You will also learn how to create a Resources or Network Directory Automation that monitors a directory for incoming PDFs, runs the extraction and publication workspaces, and finally sends out notification emails.

Additional Resources

Getting Started with PDF Reading
Run a Workspace when New Data Arrives in a Directory
Getting Started with List Attributes


Data Attribution

The data used is made available by the City of San Jose’s Department of Transportation.

