Writing to Databases When Running in Stream Mode

Introduction

This article provides an overview of writing data when running in stream mode. It focuses on support for databases and data lakes.

Writers

Two parameters should be considered when writing to databases in stream mode: Bulk Insert and Features per Transaction.

Bulk Insert

Due to the high volumes often associated with streams, bulk insert (or equivalent) needs to be set to Yes. Setting bulk insert to yes means the SQL COPY (which is more performant) command is used instead of SQL INSERT.

Features per Transaction

Features per transaction are the number of features that FME places in each transaction before committing them to the database. This parameter is critical when writing to streams. If you make the number too high and the number of features written is low, the data won't get written. If the value is too low and the feature output is too high, the database will be overwhelmed with too many transactions.

For example, if you have a workflow with 20,000 messages per minute, if you set the features per transaction to be 1000, the data is committed roughly every three seconds, which is probably too often. If you set the features per transaction to be 500,000, the data is committed roughly every 25 minutes, which is likely not often enough.

	Bulk Insert	Features per Transaction	Other
Amazon Aurora	N/A	Set based on message rate
Amazon DynamoDB	N/A	Called Batch Size, it should be set. The maximum value is 25.
Amazon Redshift	N/A	Set based on message rate.
Azure Cosmos	Yes	N/A	Concurrent Requests need to be set based upon the number of features. More info here.
Azure SQL Database	Yes	Set based on message rate.
Databricks	N/A	Set based on message rate.
Google BigQuery	Yes	Called Table Splitting, it should be set based on the message rate.	The number of errors allowed should also be set if errors are expected.
MariaDB/SkySQL	Yes	Set based on message rate.
Oracle	Yes	Set based on message rate.
PostgreSQL	Yes	Set based on message rate.
Snowflake	N/A	Set based on message rate.
SQL Server	N/A	Set based on message rate.

FeatureWriter

Writing via the FeatureWriter in stream mode to databases is supported. Parameters and considerations are the same as defined in the table above.

The main reason to use a FeatureWriter is if you wish to leverage the Group By to write features based on when the group changes. For example, in combination with the TimeWindower transformer, you could write data out whenever the window ends. Group By is not supported by writers. Since the FeatureWriter is a feature-holding transformer, features will not be released through its output ports in stream mode unless Group By is enabled and set to "When Group Changes (Advanced)".

Other Transformers

There are several other transformers you can use to write data out when running in stream mode.

AmazonAthenaConnector
AzureQueueStorageConnector
SQLExecutor
GoogleBigQueryConnector

Additional Resources

Webinar: Powering Real-Time Decisions with Continuous Data Streams

Search