Joining to external datasets when running in stream mode

Introduction

In FME Form, a workspace run in stream mode runs indefinitely until you stop the translation. If you need to join the incoming stream messages to an external data source (e.g. a database), you need to decide how, and how often, that data source is read.

Reading Static Data

If you wish to join the message stream to a dataset that does not update (e.g. state boundaries), you can just use a Reader. The Reader reads the data once, at the start of the translation, and the data is not refreshed until the workspace is restarted. If the workspace is running on FME Flow as a Stream, that restart could be months away.

Reading Changing Datasets

If you wish to join the message stream to a dataset that updates periodically, you will need to use the FeatureReader transformer instead of a Reader. With the FeatureReader, each time a feature enters the initiator port, the reader is recreated and the data is re-read. A typical scenario is to use the WindowChanged feature of the TimeWindower transformer to trigger a re-read of the data whenever the window changes.
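The trigger pattern described above can be pictured outside FME with a minimal Python sketch. This is not FME code or the FME API: `window_id`, `join_with_reread`, and `load_lookup` are hypothetical names, and the fixed-width windows only approximate what the TimeWindower does. It simply shows the lookup data being reloaded once per window change rather than once per message.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple


def window_id(timestamp: float, window_seconds: float) -> int:
    """Return the index of the fixed-width time window a timestamp falls in."""
    return int(timestamp // window_seconds)


def join_with_reread(
    messages: Iterable[Tuple[float, str]],
    load_lookup: Callable[[], Dict[str, str]],
    window_seconds: float,
) -> Iterator[Tuple[str, Optional[str]]]:
    """Join each (timestamp, key) message to lookup data.

    The lookup is re-read whenever the window changes, which stands in
    for a WindowChanged feature triggering the FeatureReader.
    """
    current_window: Optional[int] = None
    lookup: Dict[str, str] = {}
    for timestamp, key in messages:
        window = window_id(timestamp, window_seconds)
        if window != current_window:
            # Window changed (or first message): re-read the whole dataset.
            lookup = load_lookup()
            current_window = window
        # Unmatched keys yield None instead of raising.
        yield key, lookup.get(key)
```

With 60-second windows, messages at 0, 10, 61, 65 and 130 seconds fall into three windows, so `load_lookup` is called three times for five messages.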

By implementing this, you will always ensure that the datasets you join to the stream (or use for analysis) are up to date. The downside is that the entire dataset is re-read every time an initiator feature enters the transformer. This might be what you want, or it might be wasteful, reading data over the network again even though nothing has changed. Fortunately, the FeatureReader supports caching.

Intelligently Reading Changing Data

By enabling the cache on the FeatureReader transformer, you control how often your external data is refreshed. For example, suppose the cache on a PostgreSQL reader is set to expire after 0.1 of an hour (6 minutes). In a streaming workflow, the workspace then caches the data for six minutes. Once those six minutes have passed, the first feature to hit the initiator port triggers a full refresh of the data from the database.
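The expiry behaviour described above can be illustrated with a small Python sketch. It is only a model of the idea, not how the FeatureReader is implemented: the `TimedCache` class, its `loader` callback, and the injectable `clock` are all assumptions made for illustration, and only time-based expiry is modelled.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class TimedCache(Generic[T]):
    """Hold loaded data and reload it on the first request after expiry."""

    def __init__(
        self,
        loader: Callable[[], T],
        timeout_hours: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        # The timeout is given in hours, e.g. 0.1 hours = 360 seconds.
        self._timeout_seconds = timeout_hours * 3600.0
        self._clock = clock
        self._data: Optional[T] = None
        self._loaded_at: Optional[float] = None
        self.reload_count = 0

    def get(self) -> T:
        """Return cached data, doing a full reload if the cache has expired."""
        now = self._clock()
        expired = (
            self._loaded_at is None
            or now - self._loaded_at >= self._timeout_seconds
        )
        if expired:
            # Nothing happens at the moment of expiry itself; the reload is
            # triggered by the first request that arrives afterwards.
            self._data = self._loader()
            self._loaded_at = now
            self.reload_count += 1
        return self._data  # type: ignore[return-value]
```

Note that the refresh is lazy: if no feature arrives for an hour, nothing is re-read during that hour, and the next feature pays the cost of the reload.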

Some considerations when using the cache in a streaming workflow:

If the dataset is file-based (e.g. ESRI File Geodatabase), the cache will expire when the table is updated or when the Cache Timeout value is reached, whichever happens first. If the dataset is a relational database or an API, the cache will expire only when the Cache Timeout value is reached.
Cache Timeout is expressed in hours and supports a float value, so you can set the cache to expire after part of an hour, e.g., 0.16666 equates to approximately 10 minutes.
If there are two FeatureReaders in the workspace, they work independently: you can set a different Cache Timeout on each, and they don’t interfere with each other. This is powerful, as it means you can join multiple datasets in one stream workflow and still control the update frequency of each.
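Because Cache Timeout is given in hours, it can help to work out the fractional values from minutes. The snippet below is a plain arithmetic helper, not part of FME; `minutes_to_timeout_hours` and the dataset names are hypothetical, and rounding to five decimal places is an arbitrary choice.

```python
def minutes_to_timeout_hours(minutes: float, digits: int = 5) -> float:
    """Convert a refresh interval in minutes to a fractional-hour value."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return round(minutes / 60.0, digits)


# Hypothetical refresh intervals, one per FeatureReader, in minutes.
refresh_minutes = {"sensors": 6, "assets": 10, "boundaries": 90}

# Each dataset gets its own independent timeout value.
timeouts = {
    name: minutes_to_timeout_hours(minutes)
    for name, minutes in refresh_minutes.items()
}

for name, hours in timeouts.items():
    print(f"{name}: {hours} hours")
```

Six minutes comes out as 0.1 and ten minutes as 0.16667, which matches the 0.16666 quoted above to within rounding.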
