Automating PDF Data Extraction for ArcGIS Online

Liz Sanderson

FME Version

  • FME 2023.0

Introduction

In a previous tutorial, we explored the capability of FME transformers in mapping point data in the absence of longitude and latitude. We introduced essential transformers such as the CoordinateExtractor, Orientor, HorizontalAngleCalculator, and Offsetter, which allowed us to manipulate and offset intersection points based on specific criteria. By the end of the tutorial, we successfully extracted important attributes from a PDF file, joined the data to the newly generated Average Daily Traffic (ADT) points, and stored it in an Esri file geodatabase. Building upon the previous tutorial, we will now create another workspace that focuses on aggregating new incoming point data from another PDF, updating their attributes, and writing the final output to ArcGIS Online. 

 

Scenario

For the majority of ADT collection points, two separate Traffic Speed Reports are generated to gather data for each opposite travel direction. The sample report we used in the previous tutorial only contains data for one direction (ADTOne). Therefore, we need to extract data for the opposite direction (ADTTwo) from a different report. However, if we run the workspace we previously created using this additional report as input, there will be two overlapping collection points in the output file geodatabase. Thus, our main objective for this new workspace is to aggregate any overlapping points created by the additional reports and append ADT data for the opposite direction to the same collection points we created earlier. Finally, we will output the new and updated file geodatabase to a feature service on ArcGIS Online using the AGOL Feature Service Writer. 
 

Step-by-step Instructions  

Part 1: Create a New Workspace to Aggregate Overlapping ADT Points and Publish to ArcGIS Online

1. Open the PDFreader_ADTmapping workspace and change the AdobeGeospatialPDF Reader input
At the end of the previous tutorial, you may not see any overlapping ADT points as we have not read in the Traffic Speed Report for the same location with opposite travel directions. To test this, open the PDFreader_ADTmapping workspace and click on the AdobeGeospatialPDF Reader. 

When the reader is highlighted, click the Edit Parameters button, then change the Source PDF file to 1ST ST N OF BASSETT ST SB.PDF. This report is for the same collection point, but the travel direction is southbound instead of northbound. Make sure that the Non-Spatial box is checked, then click OK and run the workspace. 


image18.png

Confirm that there is one output feature written to the output geodatabase. Click the View Written Data button on the Esri Geodatabase Writer and check the output table. One new feature has been added. Select this feature, move your mouse over the Graphics window, then right-click and choose Zoom to Selected Feature.

image17.png

This new ADT point has the same location as the previous one we processed. However, the attribute values are slightly different. The TravelDirection is now Southbound, and the ADT count is correspondingly written to the ADTTwo field instead of ADTOne. 

This highlights the need to create a separate workspace to aggregate the overlapping points. 

2. Create a New FME Workspace and Add an Esri Geodatabase (File Geodb Open API) 
Close the workspace without saving it, or save it with a new name. Click on File and select New (or press Ctrl + N) to create a new FME workspace. Click the Add Reader button on the toolbar, type Esri Geodatabase in the Format box, and select the Esri Geodatabase (File Geodb Open API) reader.

For the input dataset, click on the ellipsis button and browse to the OutputGDB folder inside the training directory, then select the AverageDailyTraffic.gdb. Next, click on the Parameters button to open up the Parameters window, then click the ellipsis button next to the Tables text box and select ADT_points_mapped. Click OK then Run the workspace. Confirm that 3 features are read in. 

image10.png
3. Add an Aggregator to Remove Overlapping ADT Points
Add an Aggregator to the canvas, then connect it with the Esri Geodatabase reader feature type. When prompted, set up the parameters as follows: 

  • Check the Group Processing Box 
  • Group by: click the ellipsis button, select CollectionPoint and IntersectionID
image5.png
  • Expand the Attribute Accumulation dropdown 
    • Accumulation Mode: Merge Incoming Attributes
    • Attribute to Sum: click the ellipsis button and select ADTOne, ADTTwo
image28.png


Use the following screenshot to double-check your parameters before clicking OK and running the workspace. 

image1.png

Now if you preview the Output Table, the two overlapping features have been aggregated into one, and both ADTOne and ADTTwo are populated with ADT data for opposite directions. As a reminder from the previous tutorial, ADTOne indicates the traffic count for the Northbound (or Eastbound) travel direction, while ADTTwo represents the Southbound (or Westbound) travel direction. 

image12.png
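The grouping described in this step can be sketched in plain Python. This is only an illustration of the idea, not how the Aggregator is implemented: the function name, the IntersectionID value, and the traffic counts below are made up, and the choice to keep the first non-empty value for attributes that are not summed is an assumption of the sketch.

```python
from collections import defaultdict


def aggregate_points(features,
                     group_by=("CollectionPoint", "IntersectionID"),
                     sum_attrs=("ADTOne", "ADTTwo")):
    """Merge features that share the same group key, summing the listed attributes."""
    groups = defaultdict(list)
    for feature in features:
        key = tuple(feature[name] for name in group_by)
        groups[key].append(feature)

    merged = []
    for members in groups.values():
        out = {}
        for feature in members:
            for name, value in feature.items():
                if name in sum_attrs:
                    continue
                # Assumption of this sketch: keep the first non-empty value seen.
                if out.get(name) is None:
                    out[name] = value
        for name in sum_attrs:
            # Missing or empty counts are treated as 0 before summing.
            out[name] = sum(f.get(name) or 0 for f in members)
        merged.append(out)
    return merged


# Made-up values for two reports at the same collection point.
features = [
    {"CollectionPoint": "1ST ST N OF BASSETT ST", "IntersectionID": 101,
     "TravelDirection": "Northbound", "ADTOne": 4200, "ADTTwo": None},
    {"CollectionPoint": "1ST ST N OF BASSETT ST", "IntersectionID": 101,
     "TravelDirection": "Southbound", "ADTOne": None, "ADTTwo": 3900},
]

result = aggregate_points(features)
print(len(result), result[0]["ADTOne"], result[0]["ADTTwo"])
```

Because each direction fills only one of the two count fields, summing within a group leaves one feature per collection point with both fields populated.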

Next, we will use an AttributeManager to add a new field called "ADT" to represent the total traffic counts at each location. Additionally, we will reorganize the attribute order one last time before publishing.

4. Add an AttributeManager for Final Data Clean Up
Add an AttributeManager to the canvas and connect it with the Aggregator. Click the “+” button on the lower left corner to add a new attribute, and type in ADT in the Output Attribute box. For its Value, click the drop-down arrow and open the Arithmetic Editor then select ADTOne and ADTTwo from the FME Feature Attributes drop down. Lastly, add a plus character “+” in between the two values. Use the following screenshot to double-check your statement. 

image19.png

Click OK to close the Arithmetic Editor, then use the up and down arrows on the lower left corner to reorganize the attribute order. They should be as follows:

  • Date
  • ADTOne
  • ADTTwo
  • ADT
  • TravelDirection
  • CollectionPoint
  • NearestIntersection
  • IntersectionID
  • StreetOne
  • StreetTwo
  • OBJECTID

Use the following screenshot to double-check your parameters before clicking OK and running the workspace. 

image23.png
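As a rough equivalent of this step outside FME, the sketch below adds the ADT total and then emits the attributes in the order listed above. The function name and the sample values are hypothetical, and treating a missing count as 0 is an assumption of the sketch rather than documented Arithmetic Editor behavior.

```python
ATTRIBUTE_ORDER = [
    "Date", "ADTOne", "ADTTwo", "ADT", "TravelDirection", "CollectionPoint",
    "NearestIntersection", "IntersectionID", "StreetOne", "StreetTwo", "OBJECTID",
]


def add_total_and_reorder(feature):
    """Return a copy of the feature with ADT = ADTOne + ADTTwo, in the target order."""
    updated = dict(feature)
    # Assumption: a missing or empty count contributes 0 to the total.
    updated["ADT"] = (updated.get("ADTOne") or 0) + (updated.get("ADTTwo") or 0)
    return {name: updated.get(name) for name in ATTRIBUTE_ORDER}


# Made-up sample feature; attributes not supplied come back as None.
sample = {"OBJECTID": 1, "CollectionPoint": "1ST ST N OF BASSETT ST",
          "ADTOne": 4200, "ADTTwo": 3900}
cleaned = add_total_and_reorder(sample)
print(list(cleaned))
```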

It is a good time to save, so click Save and name the workspace ADT_updating.fmw.

Now we are ready to write out the final features to a new feature class inside the AverageDailyTraffic geodatabase and publish them to ArcGIS Online. To do so, we will need two separate Writers. First, let’s create a new feature class inside the AverageDailyTraffic.gdb we created in the previous tutorial. 

5. Add an Esri Geodatabase (File Geodb Open API) to Write Output to a New Feature Class
Click Add Writer, then choose Esri Geodatabase (File Geodb Open API) as the format. For the Dataset, click the ellipsis button, browse to the OutputGDB directory inside our training folder, and select AverageDailyTraffic.gdb:
C:\SJDOT_ADT_Mapping\OutputGDB\AverageDailyTraffic.gdb
Ensure that the Feature Class or Table Definition is set to Automatic. Use the following to fill in Feature Type parameters when prompted:

  • Feature Class or Table Name: ADT_updated
  • Geometry: geodb_multipoint (make sure it’s set to geodb_multipoint, not geodb_point) 
  • Feature Operation: Insert
  • Table Handling: Create if Needed
  • Match Columns: click on the ellipsis button and select CollectionPoint, IntersectionID
image2.png


Click OK then Run the workspace. If you open the AverageDailyTraffic.gdb using ArcGIS Pro, you should find a new feature class named ADT_updated that contains non-overlapping ADT points. Now let’s proceed to publish the same output to ArcGIS Online. 

6. Add an ArcGIS Online (AGOL) Feature Service Writer 
Select the canvas again and type AGOL then select the ArcGIS Online (AGOL) Feature Service writer.

Next, select Parameters on the left-hand side. First, select an ArcGIS Online connection. If you need to create a new ArcGIS Online connection, select Add Web Connection, enter a name for your connection, and select Authenticate. Enter your ArcGIS Online username and password and select Sign In. See Tutorial: Getting Started with ArcGIS Online and Portal to learn how to connect to ArcGIS Online. 

Then, for Feature Service, type SJDOT_ADT_Demo. Make sure the Feature Service Handling is set to Create If Needed before clicking OK to close the dialog.

image22.png

On the Add Writer dialog, make sure the Layer Definition is set to Automatic… 

image11.png

Click OK again to close the Writer. When prompted, fill in the Feature Type parameters as follows: 

  • Layer Name: SJDOT_ADT_Updated
  • Geometry: arcgisonline_multipoint
  • Feature Operation: Insert
  • Feature Type Handling: Use Existing
image9.png


Click OK then Run To This. Now, log into ArcGIS Online to confirm that the data has been written out. 

image29.png

As always, make sure your workspace is saved and properly documented with annotations and bookmarks like the screenshot below. 

image3.png
Since the beginning of this tutorial series, we have manually read 3 Traffic Speed Reports (2 samples for cardinal segments, and 1 sample for the non-cardinal segments) and written out 2 features. However, you might be wondering what would happen if we had dozens of Traffic Speed reports every week. How can we automate this PDF Extraction and ADT mapping process?

This is where FME Flow Automation becomes incredibly useful. In the next crucial part of this tutorial, we will publish all the workspaces created to FME Flow and explore ways to set up an Automation that will streamline the entire process.
 

Part 2: Create a Resources or Network Directory Automation

The concept behind this automation is as follows: we will set up a Directory Watch in FME Flow (formerly FME Server), which will monitor a specific directory. Whenever we upload new Traffic Speed Reports to that directory, the automation will be triggered. It will start by running the PDF Reader and ADT Mapping workspace, followed by the ADT updating workspace. Lastly, the automation will send a notification email for each report that has been processed.

1. Create Resource Folder 
Log in to the FME Flow (2022.0 or later) web interface.

The first step is to create a Resources folder to upload the data. In the FME Flow web interface, navigate to the Resources page. Browse to the Data folder and create a new folder called SJDOT_ADT. Inside this folder, create 3 new subfolders called Shapefiles, OutputGDB, and TrafficSpeedReports.

  • For the Shapefiles folder, click Upload, select Files then browse the training folder (C:\SJDOT_ADT_Mapping\InputShapefiles) and select the Streets.zip and Street_Intersections.zip files to upload these shapefiles. Make sure to select the zip file, not the .shp files.  
  • For the OutputGDB folder, click Upload, select Folder then browse the training folder C:\SJDOT_ADT_Mapping\OutputGDB and select the AverageDailyTraffic.gdb folder. 
  • For the TrafficSpeedReports folder, let’s leave it blank for now. We will upload the traffic reports once the automation is set up. 

image8.png

2. Publish All Workspaces and Custom Transformers to FME Flow
Before creating the automation, it is necessary to publish all the workspaces created previously to FME Flow, along with any Custom Transformers used in these workspaces. Let’s begin by publishing the first workspace, and its associated custom transformers, including the Grouper, HorizontalAngleCalculator and the FeatureCounter.

Open the PDFReader_ADTMapping workspace in FME Workbench and click the Publish button on the toolbar. In the Publish to FME dialog, select your FME Flow connection, then click Next. If you have not yet set up an FME Flow connection, click the drop-down arrow under FME Flow Connection and select Add Web Connection. See Tutorial: Getting Started with FME Flow to learn how to create an FME Flow Web Connection. 

image27.png

The connection information and credentials can be obtained from your FME Flow Administrator if you are unsure. 

  • Web Service: FME Flow 
  • Server URL:  http://<myservername>/fmeserver  
  • Connection Name: Training FME Flow
  • Authentication: Basic
  • Username: <your username>
  • Password: <your password>
image7.png


Once connected, create a new Repository named SJDOT_ADT, leave the workspace name as is then click Next. For the Register Services dialog, select Data Download and Job Submitter then click Publish. Check the Translation Log to ensure the workspace is successfully published. 

image15.png

Next, we will publish our custom transformers. To make sure we are using the latest versions, go to the FME Hub page at https://hub.safe.com/ and download all the custom transformers we used in this workspace. Once the page opens, type the custom transformer name into the search bar and click on the matching result. 

image6.png

On the transformer info page, click Download on the top right and double-check that the HorizontalAngleCalculator.fmx file is in the Downloads directory. Repeat this step for the other two custom transformers, the Grouper and FeatureCounter. 

image14.png


Next, return to FME Flow and navigate to the Files & Connections > Resources page. Select the Engine folder, then select the Transformers subfolder. This is where all the custom transformers will be uploaded to, so they can be used by other workspaces on FME Flow. Now, click Upload > Files and browse to the Downloads directory where the fmx files are saved. Select the Grouper.fmx, HorizontalAngleCalculator.fmx, and FeatureCounter.fmx files then click Open. 

*Tip: You can also drag and drop the files into the web interface for a quick add! 

image24.png

Make sure all of the custom transformers are successfully uploaded to FME Flow, inside the Resources>Engine>Transformers directory.  

Now, let’s publish the second workspace, ADT_Aggregating.fmw, to FME Flow. Publishing this workspace is a little faster as we don’t have any custom transformers. Back in FME Workbench, open ADT_Aggregating.fmw, then click Publish on the toolbar. Select the same FME Flow Connection you previously chose. Click Next and make sure the Repository Name is the same as above (SJDOT_ADT). Keep the default workspace name, then click Next. For the Upload Connections dialog, make sure you select the same ArcGIS Online connection you set up earlier for the AGOL Feature Service Writer, then click Next. Register the workspace with the Data Download and Job Submitter services, then click Publish. 

Reopen the FME Flow web interface and navigate to the Workspaces page. Click on the SJDOT_ADT folder and make sure the two workspaces are successfully uploaded. With all the resources being uploaded to FME Flow, we are now ready to create the automation. 

image13.png

3. Create Automation
Now we will create the automation that watches the TrafficSpeedReports directory for incoming files. Navigate to Automations > Create Automation on the side menu bar. In the Get Started dialog that appears when you go to the Automations page for the first time, click on the Create tab, and click Create New to start a new automation.

image33.png

By default, automations start in guided mode. This means that there is already a Trigger node on the canvas but it will still need to be configured.

Start by double-clicking the Trigger and a parameter box will appear on the right-hand side of the canvas. Select Resource or Network Directory (updated) from the drop-down list as the Trigger for this automation.

image16.png

4. Define Trigger Parameters
After selecting a Trigger type, a list of configurable parameters appears in the dialog. Click the ellipsis button to browse the FME Flow Resources and set the Directory to Watch parameter. Select the newly created TrafficSpeedReports folder under the Data subfolder:
image21.png

Leave the Watch Subdirectories and Watch Folders parameters set to No, since we are only interested in monitoring files directly inside the TrafficSpeedReports folder.

Then for the Events to Watch for parameter remove the MODIFY and DELETE actions. In this case, we are only interested in monitoring for new files arriving, not old ones being changed or removed.

image34.png

Lastly, change the Poll Interval to 10 Seconds, and then in the bottom left corner, click on the Validate button to ensure the trigger was set up correctly. Now click Apply to save these parameters. In the canvas, the Trigger node will update to show it is a Resource or Network Directory (updated) Trigger.

image31.png
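Conceptually, a directory watch with a poll interval behaves like the loop below. This is a simplified sketch, not FME Flow's implementation: the function names are hypothetical, only newly created files are reported (mirroring the removal of MODIFY and DELETE), and subfolders are ignored (mirroring Watch Subdirectories and Watch Folders set to No). Unlike a real trigger, the first poll in this sketch also reports files that were already present.

```python
import time
from pathlib import Path


def find_new_files(directory, seen):
    """Return files directly inside `directory` that were not seen before."""
    new_files = []
    for path in sorted(Path(directory).iterdir()):
        # Skip subfolders and anything already reported.
        if path.is_file() and path.name not in seen:
            seen.add(path.name)
            new_files.append(path)
    return new_files


def watch(directory, on_create, poll_interval=10, max_polls=None):
    """Poll `directory` every `poll_interval` seconds and call `on_create` per new file."""
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in find_new_files(directory, seen):
            on_create(path)
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(poll_interval)
```

Calling `watch("some/folder", print, poll_interval=10)` would print each new file path roughly every 10 seconds until interrupted; the folder name here is only a placeholder.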

Save the automation by selecting Menu > Save As and name the Automation “SJDOT_ADT”.

5. Add an Action and Configure it as a Run Workspace
Click the Next Action node connected to the Success port of the Resource or Network Directory Trigger. Click on the Action icon to open the Action Details window. Set the action details and workspace parameters as follows:

  • Action: Run a Workspace
  • Repository: SJDOT_ADT
  • Workspace: PDFReader_ADTmapping.fmw 
  • Source Adobe Geospatial PDF File: click the drop-down arrow, expand the Directory drop-down and select File Path. This configuration makes our automation more dynamic: each Traffic Speed Report uploaded to the watched directory is passed to this workspace as an individual input via the trigger’s File Path, and the workspace keeps running until there are no new PDF files left. 
image20.png
 
  • Source Esri Shapefile(s): click the ellipsis button, browse to the Shapefiles subfolder inside Resources (Resources>Data>SJDOT_ADT>Shapefiles), then select the Streets.zip file. 
  • Alternatively, you can copy and paste the following path into the Source Esri Shapefile input box: "$(FME_SHAREDRESOURCE_DATA)/SJDOT_ADT/Shapefiles/Streets.zip"
image35.png

 

  • Source Esri Shapefile(s) (3rd row): similarly, click the ellipsis button, then browse to the same Shapefiles directory and select the Street_Intersections.zip, or copy and paste the following directory: "$(FME_SHAREDRESOURCE_DATA)/SJDOT_ADT/Shapefiles/Street_Intersections.zip" 
  • File Geodatabase: click the ellipsis button and browse to the OutputGDB folder, then select the AverageDailyTraffic.gdb subfolder and click OK. 
  • Alternatively, you can copy and paste the following directory: "$(FME_SHAREDRESOURCE_DATA)/SJDOT_ADT/OutputGDB/AverageDailyTraffic.gdb"


Use the following screenshot to double-check all the parameters, then click Apply. 
image36.png
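The quoted paths in this step begin with the $(FME_SHAREDRESOURCE_DATA) macro, which stands in for the location of the Data resource folder. The sketch below shows the general idea of expanding such a macro into a full path. The function name is hypothetical and the directory assigned to the macro is a made-up placeholder; the real location depends on your FME Flow installation.

```python
import re


def expand_fme_macros(value, macros):
    """Replace every $(NAME) in `value` with macros[NAME]."""
    def replace(match):
        name = match.group(1)
        if name not in macros:
            raise KeyError(f"Unknown macro: {name}")
        return macros[name]

    return re.sub(r"\$\(([A-Za-z0-9_]+)\)", replace, value)


# Hypothetical location of the Data resource folder, for illustration only.
macros = {"FME_SHAREDRESOURCE_DATA": "/data/fmeflow/resources/data"}

expanded = expand_fme_macros(
    "$(FME_SHAREDRESOURCE_DATA)/SJDOT_ADT/Shapefiles/Streets.zip", macros
)
print(expanded)
```

Using the macro instead of a hard-coded path keeps the parameters valid if the resource folder is moved.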

6. Chain Another Run Workspace Action for the Second Workspace
Drag out a second Action node and configure it as a Run Workspace downstream of the success port of PDFReader_ADTMapping.fmw. We will run the ADT_Aggregating.fmw here. For the File Geodatabase Parameters, make sure you browse to the OutputGDB folder and select AverageDailyTraffic.gdb for both rows. Or simply copy and paste the following directory to both of the input boxes. 
"$(FME_SHAREDRESOURCE_DATA)/SJDOT_ADT/OutputGDB/AverageDailyTraffic.gdb"

Use the following screenshot to double-check all the parameters, then click Apply.  

image32.png

To explain, chaining the ADT_Aggregating workspace after PDFReader_ADTMapping ensures that every uploaded PDF is processed and its output is written to the feature class in the file geodatabase inside the Resources directory. The second workspace, ADT_Aggregating, then reads in the feature class output by the first one, aggregates overlapping ADT points, writes the new data to a separate feature class, and publishes the same output to ArcGIS Online as a Feature Service. 

After the two workspaces have been successfully run by the trigger and the new data has been written out to ArcGIS Online, we will configure a final action: sending a notification email to DOT staff. 
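The chain of actions described above follows a simple rule: each step runs only when the one before it succeeds. A minimal Python sketch of that control flow is shown here; the function and parameter names are hypothetical stand-ins for the two workspaces and the notification, not FME Flow API calls.

```python
def run_automation(pdf_path, run_mapping, run_aggregating, notify):
    """Run the chained actions in order, stopping at the first failure."""
    # First action: extract and map the ADT point from the uploaded PDF.
    if not run_mapping(pdf_path):
        return "mapping_failed"
    # Second action: aggregate overlapping points and publish the result.
    if not run_aggregating():
        return "aggregating_failed"
    # Final action: notify staff only after both workspaces succeeded.
    notify(pdf_path)
    return "completed"


# Example with stand-in functions that always succeed.
log = []
status = run_automation(
    "report.pdf",
    run_mapping=lambda path: log.append(("mapping", path)) or True,
    run_aggregating=lambda: log.append("aggregating") or True,
    notify=lambda path: log.append(("email", path)),
)
print(status, log)
```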

7. Add an External Action for Email Notification
Downstream of the success port of the ADT_Aggregating.fmw Action, configure a Send an Email External Action to notify the SJDOT staff when new data has been published. 

As in the article Run a Workspace in Response to Incoming Email, use Load Template or manually enter your email server information. If you are using an SMTP server that requires authentication (likely with popular email providers), you’ll need to enter values in the SMTP Account (optional) and Password (optional) fields. For Email To, enter an address whose inbox you can check, and for Email From, enter the same address as your account email.

For the Email Subject field, put in “New ADT file uploaded to the Watch Directory on FME Flow”

In the Email Body field, paste in the following message “The Average Daily Traffic AGOL feature service has been updated with new data.” 

Next, click Apply and make sure your automation looks like the screenshot below. 

image4.png
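For readers who think in code, the Email To, Email From, Email Subject, and Email Body fields correspond to the parts of a standard email message. The sketch below builds such a message with Python's standard library and does not send anything. The function name and the two example.com addresses are placeholders.

```python
from email.message import EmailMessage


def build_notification(sender, recipient):
    """Build (but do not send) the notification message configured in this step."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = "New ADT file uploaded to the Watch Directory on FME Flow"
    message.set_content(
        "The Average Daily Traffic AGOL feature service has been updated with new data."
    )
    return message


# Placeholder addresses for illustration only.
message = build_notification("fme@example.com", "dot.staff@example.com")
print(message["Subject"])
```

Actually sending the message would additionally require the SMTP server details and credentials entered in the action, which are specific to your email provider.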

8. Save and Start the Automation
To preserve your progress, click Menu above the Automations canvas, then Save As. Choose a name (e.g. SJDOT_ADT) and add some descriptive tags (optional), then click OK. Finally, click Start Automation in the upper right.
image30.png

9. Test the Directory Watch Trigger 
After starting the automation, navigate to the TrafficSpeedReports subfolder (Resources>Data>SJDOT_ADT>TrafficSpeedReports), click Upload > Files, then select all the sample Traffic Speed Report PDF files and click Open to upload. Make sure 37 PDF files are successfully uploaded to our Watch Directory.

image25.png

Each uploaded file will trigger the automation in turn. If you are familiar with coding, this is essentially a loop that uses each new report as input. Because we uploaded 37 files to this folder, the automation will run and write the output to ArcGIS Online 37 times. Newer versions of the output will replace older ones until the loop ends. As a result, a feature service containing 19 ADT points will be published to ArcGIS Online, with all the important attributes including ADT counts, collection dates, collection points, travel directions, and nearest intersections. 

Double-check the number of features, their locations, and all associated attributes. The result should look like the screenshot below. 

image26.png
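The loop described above, and the reason the number of published points is smaller than the number of uploaded reports, can be sketched as follows. This is a toy model under stated assumptions: the function name, the `field` and `count` keys, and the three sample reports are invented for illustration and are not the tutorial's data.

```python
def process_reports(reports):
    """Simulate one automation run per report; each run replaces the published output."""
    stored = []      # stands in for the mapped points in the file geodatabase
    published = []   # stands in for the layer on ArcGIS Online
    for report in reports:
        stored.append(report)
        points = {}
        for item in stored:
            key = item["CollectionPoint"]
            point = points.setdefault(
                key, {"CollectionPoint": key, "ADTOne": 0, "ADTTwo": 0}
            )
            point[item["field"]] += item["count"]
        # The newest version replaces the previous one.
        published = list(points.values())
    return published


# Invented sample: two directions at point A, one direction at point B.
reports = [
    {"CollectionPoint": "A", "field": "ADTOne", "count": 100},
    {"CollectionPoint": "A", "field": "ADTTwo", "count": 120},
    {"CollectionPoint": "B", "field": "ADTOne", "count": 80},
]
published = process_reports(reports)
print(len(reports), "reports ->", len(published), "points")
```

In the same way, reports for opposite directions at one collection point collapse into a single feature, so the final point count is lower than the number of PDFs uploaded.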

Congratulations! You have successfully created an FME Flow Automation that is triggered every time a new Traffic Speed Report is uploaded to the Watch Directory. Average Daily Traffic counts will be automatically extracted from the PDFs and mapped to their real-world locations. This automation is expected to save hours of manual work and reduce errors from manual data entry. Hopefully, this series of articles helps streamline the data collection process at your DOT and serves as inspiration for other projects. 


Data Attribution

The data used is made available by the City of San Jose’s Department of Transportation.
 
