Joining to external datasets when running in stream mode

Liz Sanderson

FME Version

  • FME 2021.2


In FME Form (formerly FME Desktop), a workspace run in stream mode runs indefinitely until you stop the translation. If you need to join the incoming stream messages to an external data source (e.g., a database), you need to consider how that join will be kept up to date.


Reading Static Data

If you wish to join the message stream to a dataset that does not update (e.g., state boundaries), you can simply use a Reader. The Reader reads the data once at the start of the translation and does not update it until the workspace is restarted. If the workspace is running on FME Flow (formerly FME Server) as a stream, that could be months.


Reading Changing Datasets

If you wish to join the message stream to a dataset that updates periodically, you will need to use the FeatureReader transformer instead of a Reader. With the FeatureReader, each time an initiator feature arrives, the reader is recreated and the data is re-read. A typical pattern is to use the WindowChanged feature from the TimeWindower transformer to trigger a re-read of the data every time the window changes.

With this pattern, the datasets you are joining to the stream (or using for analysis) are always up to date. The downside is that the entire dataset is re-read every time an initiator feature enters the transformer. That might be exactly what you want, or it might be wasteful, re-reading data over the network even though nothing has changed. Fortunately, the FeatureReader supports caching.
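The trigger-driven behaviour described above can be sketched in a few lines of Python. This is purely illustrative (the function and variable names are mine, not FME's API): every initiator event causes a full re-read of the external dataset, whether or not anything has changed.

```python
def make_triggered_reader(read_fn):
    """Illustrative sketch, not FME's API: each initiator event
    triggers a full re-read of the external dataset."""
    state = {"reads": 0, "data": None}

    def on_initiator():
        # The entire dataset is fetched again on every trigger,
        # even if the source has not changed since the last read.
        state["data"] = read_fn()
        state["reads"] += 1
        return state["data"]

    return on_initiator, state
```

Three window changes mean three full reads of the source, which is the cost the caching option below is designed to avoid.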


Intelligently Reading Changing Data

By enabling the cache on the FeatureReader transformer, you control how often your external data is refreshed. Below, the cache on a PostgreSQL reader is set to expire after 0.1 hours (6 minutes). In a streaming workflow, the workspace will therefore cache the data for six minutes; after that, the first feature to arrive at the initiator port triggers a full refresh of the data from the database.
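Conceptually, the cache works like a simple time-based memo: the dataset is re-read only when the cached copy is older than the timeout. The following Python sketch illustrates the idea under my own assumed names; it is not the FME FeatureReader implementation.

```python
import time

class CachedReader:
    """Illustrative time-based cache (not the FME FeatureReader API):
    data is re-read only when the cached copy is older than the timeout."""

    def __init__(self, read_fn, cache_timeout_hours):
        self.read_fn = read_fn                        # fetches the dataset
        self.timeout_s = cache_timeout_hours * 3600.0 # hours -> seconds
        self.cache = None
        self.cached_at = None

    def get(self):
        now = time.monotonic()
        if self.cache is None or now - self.cached_at > self.timeout_s:
            self.cache = self.read_fn()   # full refresh from the source
            self.cached_at = now
        return self.cache
```

With `CachedReader(fetch_from_postgres, 0.1)`, initiator features arriving within six minutes of the last read are served from the cache; the first one after that window triggers a fresh read.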


Some considerations when using the cache in a streaming workflow:
  • If the dataset is file-based (e.g. Esri File Geodatabase), the cache expires when the table is updated or when the Cache Timeout value is reached. If the dataset is a relational database or an API, the cache expires only when the Cache Timeout is reached.
  • Cache Timeout accepts a float value, so the cache can expire after a fraction of an hour, e.g. 0.16666 hours equates to roughly 10 minutes.
  • If there are two FeatureReaders in the workspace, they work independently: each can have its own cache timeout, and they do not interfere with each other. This is powerful because you can join multiple datasets in one stream workflow and still control each dataset's update frequency.
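Because Cache Timeout is expressed in hours, a small conversion helper (the function name is mine, for illustration only) avoids arithmetic slips when you think in minutes:

```python
def minutes_to_hours(minutes):
    """Convert a cache interval in minutes to the fractional-hours
    value expected by a Cache Timeout-style setting."""
    return minutes / 60.0

# 6 minutes  -> 0.1 hours
# 10 minutes -> ~0.16667 hours
```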
