Introduction to Scheduling and Change Detection Workflows in FME Server

Liz Sanderson

FME Version

  • FME 2022.0

Introduction

FME Server Automations let you create triggers: a way to receive content from clients over a supported protocol. Examples of triggers include incoming email, directory watching, webhooks, and Amazon SQS messages. Each trigger contains the logic that starts a new event.
If you have a client or dataset that isn't supported by a trigger, you can create a workspace that checks it and schedule that workspace to run on FME Server at the intervals you need. The workspace can then trigger an action, such as running another workspace to do the processing.

Workspace Requirements:

To build a workspace that checks the state of a system or dataset, first confirm that FME Workbench supports its data format or communication method.
If you are checking a dataset for changes or new features, does the dataset have any fields that record when the data was changed, such as a timestamp?

If the dataset has no way of identifying change, do you want to keep a copy of the data for comparison (caching) or just process the whole dataset?

Caching data:

With data caching, data size and storage may be a consideration. If you have a large dataset, reading two sets of data into a workspace to do change detection may increase workspace runtime and memory use. You will also need sufficient space to store a copy of the data.
Some workflows need an up-to-date copy of the data anyway (e.g. a database table of current values); in this case you already have a dataset to compare against and won't need a duplicate for change detection. If your workflow reads a dataset and sends a notification (for example, an email or SMS) based on new or changed data, but does not store the data, you may need to cache it.

The ChangeDetector transformer can be used to compare two sets of data.

If the data is cached or stored in a format that supports SQL queries, you can save time by querying only the features whose values differ, rather than having FME read in the whole dataset.
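To illustrate the idea outside FME, the sketch below loads a cached snapshot and an incoming snapshot into an in-memory SQLite database and lets the database return only the rows whose values differ. SQLite stands in for whatever SQL-capable store holds your cache; the table names `cache` and `incoming`, the `uuid`/`aqi` columns, and the function name are hypothetical.

```python
import sqlite3


def changed_rows(cached, incoming):
    """Return (uuid, cached_aqi, incoming_aqi) for every uuid present in
    both snapshots whose aqi value differs.

    cached, incoming: lists of (uuid, aqi) tuples (hypothetical schema).
    """
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE cache (uuid TEXT PRIMARY KEY, aqi INTEGER)")
        con.execute("CREATE TABLE incoming (uuid TEXT PRIMARY KEY, aqi INTEGER)")
        con.executemany("INSERT INTO cache VALUES (?, ?)", cached)
        con.executemany("INSERT INTO incoming VALUES (?, ?)", incoming)
        # The database does the comparison, so only differing rows come back.
        return con.execute(
            """
            SELECT i.uuid, c.aqi, i.aqi
            FROM incoming AS i
            JOIN cache AS c ON c.uuid = i.uuid
            WHERE c.aqi <> i.aqi
            ORDER BY i.uuid
            """
        ).fetchall()
    finally:
        con.close()


print(changed_rows([("s1", 40), ("s2", 55)], [("s1", 40), ("s2", 61)]))
# [('s2', 55, 61)]
```

The inner JOIN only reports features that exist in both snapshots; a LEFT JOIN from `incoming` to `cache` would also surface brand-new UUIDs.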

 

Scenario

In this scenario, a CARTO dataset of air quality needs to be kept up to date. The current air quality values are retrieved from a JSON data feed and written to CARTO for visualization on a map.

There are several methods of doing this:

 

ChangeDetector

aqi-changedetector.png

In this workspace, both the existing CARTO data and the JSON air quality data feed are read in.
Both datasets are passed through the ChangeDetector, which matches features on their UUID. Any features that have been added are sent to the CARTO writer to update the dataset.
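The kind of matching the ChangeDetector performs can be pictured with the plain-Python sketch below, which keys both datasets on `uuid` and sorts features into inserted, updated, deleted, and unchanged groups. It is an approximation of the concept, not the transformer's implementation; representing features as dictionaries and the `detect_changes` name are assumptions made for illustration.

```python
def detect_changes(original, revised, key="uuid"):
    """Sort features into inserted/updated/deleted/unchanged groups by
    matching on `key`. Features are plain dicts of attribute values."""
    old = {f[key]: f for f in original}
    new = {f[key]: f for f in revised}
    groups = {"inserted": [], "updated": [], "deleted": [], "unchanged": []}
    for k, feature in new.items():
        if k not in old:
            groups["inserted"].append(feature)
        elif feature != old[k]:
            groups["updated"].append(feature)  # same key, different values
        else:
            groups["unchanged"].append(feature)
    # Keys that only exist in the original dataset have disappeared.
    groups["deleted"] = [f for k, f in old.items() if k not in new]
    return groups


carto = [{"uuid": "s1", "aqi": 40}, {"uuid": "s2", "aqi": 55}]
feed = [{"uuid": "s1", "aqi": 40}, {"uuid": "s2", "aqi": 61}, {"uuid": "s3", "aqi": 12}]
changes = detect_changes(carto, feed)
print(changes["updated"])   # [{'uuid': 's2', 'aqi': 61}]
print(changes["inserted"])  # [{'uuid': 's3', 'aqi': 12}]
```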

 

FeatureReader

aqi-featurereader.png

This workspace produces the same output as the ChangeDetector workspace, but uses a FeatureReader instead. The FeatureReader is connected to the CARTO dataset but only reads in the requested features, which in this case are specified by a WHERE clause:

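"uuid" = '@Value(uuid)' AND "aqi" != '@Value(aqi)'

The single quotes around each @Value() reference make the substituted attribute values string literals, while the double-quoted names refer to the CARTO table's columns.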


The FeatureReader looks for features where the UUID matches but the air quality index differs. When these features are read in, both the feature from the JSON feed (the initiator) and the feature read from CARTO carry an air quality index value, so the values conflict. The FeatureReader's conflict resolution is therefore set to ‘Use Initiator’, which keeps the current air quality index value from the JSON feed and sends it to the CARTO writer to update the dataset.
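The two steps the FeatureReader performs here, a per-feature WHERE query followed by ‘Use Initiator’ conflict resolution, can be approximated in Python. In this sketch an in-memory SQLite table stands in for the CARTO dataset, and the `aqdata` table, its `name` column, the station values, and both function names are hypothetical; it illustrates the merge rule, not FME's internals.

```python
import sqlite3


def make_demo_db():
    """Build a tiny stand-in for the CARTO table (hypothetical data)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE aqdata (uuid TEXT PRIMARY KEY, aqi INTEGER, name TEXT)")
    con.executemany(
        "INSERT INTO aqdata VALUES (?, ?, ?)",
        [("s1", 40, "Station A"), ("s2", 55, "Station B")],
    )
    return con


def read_with_initiator(con, initiator):
    """Read rows matching the initiator's uuid but with a different aqi,
    then merge them so the initiator's values win any conflict."""
    cur = con.execute(
        "SELECT uuid, aqi, name FROM aqdata WHERE uuid = ? AND aqi <> ?",
        (initiator["uuid"], initiator["aqi"]),
    )
    columns = [d[0] for d in cur.description]
    merged = []
    for row in cur.fetchall():
        result = dict(zip(columns, row))
        # 'Use Initiator': later keys overwrite earlier ones, so the
        # initiator's aqi replaces the stored one; name is kept from the table.
        merged.append({**result, **initiator})
    return merged


con = make_demo_db()
print(read_with_initiator(con, {"uuid": "s2", "aqi": 61}))
# [{'uuid': 's2', 'aqi': 61, 'name': 'Station B'}]
print(read_with_initiator(con, {"uuid": "s1", "aqi": 40}))  # []
```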

 

SQLExecutor

aqi-sqlexecutor.png

Going one step further, the SQLExecutor can identify features in the CARTO dataset whose air quality index value differs from the JSON data feed and perform the update directly in SQL:

 

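UPDATE aqdata SET aqi = '@Value(aqi)' WHERE uuid = '@Value(uuid)' AND aqi != '@Value(aqi)'

The @Value() references are wrapped in single quotes so that the substituted UUID and air quality index are passed to the database as literal values.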
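The same conditional UPDATE can be sketched in Python, run once per incoming feature, with SQLite standing in for CARTO. Here the values are passed as bound `?` parameters rather than being pasted into the statement text; the table layout and function name are hypothetical. Because the WHERE clause also requires the stored value to differ, the returned row count shows whether anything actually changed.

```python
import sqlite3


def update_if_changed(con, feature):
    """Apply one incoming feature; return how many rows were updated."""
    cur = con.execute(
        # Bound parameters let the driver handle quoting, rather than
        # pasting attribute values into the SQL text.
        "UPDATE aqdata SET aqi = ? WHERE uuid = ? AND aqi <> ?",
        (feature["aqi"], feature["uuid"], feature["aqi"]),
    )
    con.commit()
    return cur.rowcount


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE aqdata (uuid TEXT PRIMARY KEY, aqi INTEGER)")
con.executemany("INSERT INTO aqdata VALUES (?, ?)", [("s1", 40), ("s2", 55)])
print(update_if_changed(con, {"uuid": "s2", "aqi": 61}))  # 1
print(update_if_changed(con, {"uuid": "s2", "aqi": 61}))  # 0, already current
```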

 

Comparing timestamps

If your data has a timestamp, you can use it to check whether the data is new or has been updated since the workspace last ran.
If the workspace runs on a regular schedule, you can use the DateTimeCalculator to determine whether a timestamp is newer than the current run time minus the schedule interval.
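The interval test can be expressed in a few lines of Python: subtract the schedule interval from the run time and treat anything after that point as new. The function name and the choice of a half-open window are assumptions made for this sketch; when runs are exactly one interval apart, the half-open window means a record stamped exactly on a boundary is picked up by one run only.

```python
from datetime import datetime, timedelta, timezone


def is_new(timestamp, run_time, interval):
    """True if timestamp falls in the window (run_time - interval, run_time]."""
    return run_time - interval < timestamp <= run_time


run_time = datetime(2022, 5, 1, 12, 0, tzinfo=timezone.utc)
hourly = timedelta(hours=1)
print(is_new(datetime(2022, 5, 1, 11, 30, tzinfo=timezone.utc), run_time, hourly))  # True
print(is_new(datetime(2022, 5, 1, 10, 59, tzinfo=timezone.utc), run_time, hourly))  # False
```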

image.png

If there is a chance that your job may be delayed, resulting in inconsistent intervals between runs, you can record the time the workspace last ran instead. This value can be stored on FME Server, then read in and updated every time the workspace runs.
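The read-then-update cycle can be sketched in Python with the value kept in a small JSON file. The file layout, the `last_run` key, and the `swap_last_run` name are illustrative assumptions, not anything FME Server prescribes.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def swap_last_run(state_file, now=None):
    """Return the run time stored by the previous run (None on the first
    run), then record `now` as the new last run time."""
    path = Path(state_file)
    previous = None
    if path.exists():
        previous = datetime.fromisoformat(json.loads(path.read_text())["last_run"])
    now = now or datetime.now(timezone.utc)
    path.write_text(json.dumps({"last_run": now.isoformat()}))
    return previous
```

This sketch records the new time as soon as it is called. In a real workflow you may prefer to write it only after the run succeeds, so that a failed job does not cause changes to be skipped on the next run.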

workspacelastrun.png

In this example, the workspace is split into two sections. The first section reads in the workspace's last run time and passes that timestamp to the second section using a VariableSetter and a VariableRetriever.

The input data is an Atom feed of roadworks in Vancouver, and each entry contains an updated timestamp. The workspace calculates the interval between its last run time and each entry's updated timestamp. Any interval greater than 0 means the entry is a newly scheduled roadwork or an update to an existing one, and these entries can be filtered for processing or notification.
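The filtering step can be illustrated in Python by parsing an Atom feed and keeping entries whose updated timestamp is later than the last run time. The sample feed below is made up, and the function names are hypothetical; the Atom namespace URI and the `<entry>`, `<title>`, and `<updated>` elements come from the Atom format itself.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

ATOM = "{http://www.w3.org/2005/Atom}"


def parse_atom_time(text):
    """Parse an Atom (RFC 3339) timestamp; older Pythons reject a trailing 'Z'."""
    return datetime.fromisoformat(text.strip().replace("Z", "+00:00"))


def entries_updated_since(atom_xml, last_run):
    """Return (title, updated) for entries whose <updated> is after last_run.

    last_run must be timezone-aware so it can be compared with feed times.
    """
    root = ET.fromstring(atom_xml)
    newer = []
    for entry in root.iter(ATOM + "entry"):
        updated = parse_atom_time(entry.findtext(ATOM + "updated"))
        # Positive interval: the entry was created or changed after the last run.
        if (updated - last_run).total_seconds() > 0:
            newer.append((entry.findtext(ATOM + "title"), updated))
    return newer


sample = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>Closure A</title><updated>2022-05-01T12:30:00Z</updated></entry>
  <entry><title>Closure B</title><updated>2022-04-30T08:00:00Z</updated></entry>
</feed>"""
last_run = datetime.fromisoformat("2022-05-01T12:00:00+00:00")
print([title for title, _ in entries_updated_since(sample, last_run)])  # ['Closure A']
```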

 

FME Server Schedules

Once a workspace has been published to FME Server, it can be run on a schedule, either as a standalone schedule or as an automation that uses a schedule trigger.

The Schedules tab will bring up a list of existing schedules, showing information about the schedule and whether it is enabled.

image.png


Creating a new schedule lets you enter a name, category, and description for the schedule. The recurrence can be set to run at an interval, once, or according to a CRON expression for more complex scheduling.
 

7-1.jpg

 

After the correct workspace has been selected, more settings can be applied. To notify a topic on job success or failure, expand the Notifications view.
 

7-2.jpg

 

Some additional properties can be set on schedules, such as ‘Run Until Canceled’, ‘Queued Job Expiry Time’ and ‘Running Job Expiry Time’.
 

7-3.jpg
