Writing to Databases When Running in Stream Mode

Liz Sanderson
Liz Sanderson
  • Updated

FME Version

  • FME 2021.0

Introduction

This article provides an overview of writing data when running in stream mode. It focuses on support for databases and data lakes.
 

Writers

Two parameters should be considered when writing to databases in stream mode: Bulk Insert and Features per Transaction.
 

Bulk Insert

Due to the high volumes often associated with streams, bulk insert (or equivalent) needs setting to Yes. Setting bulk insert to yes means the SQL COPY (which is more performant) command is used instead of SQL INSERT.
 

Features per Transaction

Features per transaction are the number of features that FME places in each transaction before committing them to the database. This parameter is critical when writing to streams. If you make the number too high and the number of features written is low, the data won't get written. If the value is too low and the feature output is too high, the database will be overwhelmed with too many transactions.

For example, if you have a workflow with 20,000 messages per minute, if you set the features per transaction to be 1000, the data is committed roughly every three seconds, which is probably too often. If you set the features per transaction to be 500,000, the data is committed roughly every 25 minutes, which is likely not often enough.
 Bulk InsertFeatures per TransactionOther
Amazon AuroraN/ASet based on message rate 
Amazon DynamoDBN/ACalled Batch Size, it should be set. The maximum value is 25. 
Amazon RedshiftN/ASet based on message rate. 
Azure CosmosYesN/AConcurrent Requests need to be set based upon the number of features. More info here
Azure SQL DatabaseYesSet based on message rate. 
Google BigQueryYesCalled Table Splitting, it should be set based on the message rate.The number of errors allowed should also be set if errors are expected.
MariaDBYesSet based on message rate. 
OracleYesSet based on message rate.  
PostgreSQLYesSet based on message rate. 
SnowflakeN/ASet based on message rate. 
SQL ServerN/ASet based on message rate.  

 

FeatureWriter

Writing via the FeatureWriter in stream mode to databases is supported. Parameters and considerations are the same as defined in the table above.

The main reason to use a FeatureWriter is if you wish to leverage the Group By to write features based upon when the group changes. For example, in combination with the TimeWindower transformer, you could write data out whenever the window ends. Group By is not supported by writers.
 

Other Transformers

There are several other transformers you can use to write data out when running in stream mode.
  • AmazonAthenaConnector
  • AzureQueueStorageConnector
  • SQLExecutor
  • GoogleBigQueryConnector
 

Limitations

You also cannot use readers when running in stream mode. If you wish to bring data into a workflow alongside the stream, you can use the FeatureReader or DatabaseJoiner.

 

Additional Resources

Webinar: Introduction to Data Stream Processing

Webinar: Empowering Real-Time Decision Making with Data Streaming

Was this article helpful?

Comments

0 comments

Please sign in to leave a comment.