Writing to Databases When Running in Stream Mode

Liz Sanderson

FME Version

  • FME 2021.0


This article provides an overview of writing data when running in stream mode. It focuses on support for databases and data lakes.


Two parameters should be considered when writing to databases in stream mode: Bulk Insert and Features per Transaction.

Bulk Insert

Due to the high volumes often associated with streams, Bulk Insert (or its equivalent) should be set to Yes. When Bulk Insert is enabled, FME uses the SQL COPY command, which is more performant than issuing individual SQL INSERT statements.

Features per Transaction

Features per Transaction is the number of features FME places in each transaction before committing it to the database. This parameter is critical when writing streams. If the value is too high and the rate of incoming features is low, data sits uncommitted and won't be written for long periods. If the value is too low and the feature rate is high, the database will be overwhelmed with too many transactions.

For example, in a workflow receiving 20,000 messages per minute, setting Features per Transaction to 1,000 means the data is committed roughly every three seconds, which is probably too often. Setting it to 500,000 means the data is committed roughly every 25 minutes, which is likely not often enough.
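The arithmetic above can be sketched as follows. This is only a back-of-the-envelope illustration; the function name is ours, not part of FME, and it assumes one feature per message.

```python
# Approximate time between commits for a given message rate and
# Features per Transaction value. Illustrative only; not an FME API.

def commit_interval_seconds(messages_per_minute: float,
                            features_per_transaction: float) -> float:
    """Seconds between commits, assuming one feature per message."""
    messages_per_second = messages_per_minute / 60.0
    return features_per_transaction / messages_per_second

# 20,000 messages/minute, 1,000 features per transaction:
print(round(commit_interval_seconds(20_000, 1_000)))         # -> 3 (seconds)

# 20,000 messages/minute, 500,000 features per transaction:
print(round(commit_interval_seconds(20_000, 500_000) / 60))  # -> 25 (minutes)
```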
| Format | Bulk Insert | Features per Transaction | Other |
| --- | --- | --- | --- |
| Amazon Aurora | N/A | Set based on message rate. | |
| Amazon DynamoDB | N/A | Called Batch Size; it should be set. The maximum value is 25. | |
| Amazon Redshift | N/A | Set based on message rate. | |
| Azure Cosmos | Yes | N/A | Concurrent Requests needs to be set based on the number of features. |
| Azure SQL Database | Yes | Set based on message rate. | |
| Google BigQuery | Yes | Called Table Splitting; set based on message rate. | The number of errors allowed should also be set if errors are expected. |
| MariaDB | Yes | Set based on message rate. | |
| Oracle | Yes | Set based on message rate. | |
| PostgreSQL | Yes | Set based on message rate. | |
| Snowflake | N/A | Set based on message rate. | |
| SQL Server | N/A | Set based on message rate. | |
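For the formats above that say "set based on message rate," the working value can be derived by inverting the commit-interval arithmetic: pick a target commit interval and multiply it by the expected feature rate. A minimal sketch, assuming one feature per message (the helper name is ours, not an FME parameter):

```python
# Hypothetical helper for choosing a Features per Transaction value from
# the expected message rate and a desired commit interval. Not an FME API.

def features_per_transaction(messages_per_minute: float,
                             target_commit_seconds: float) -> int:
    """Features per Transaction yielding roughly one commit per interval."""
    return round(messages_per_minute / 60.0 * target_commit_seconds)

# Aim for a commit roughly every 60 seconds at 20,000 messages/minute:
print(features_per_transaction(20_000, 60))  # -> 20000
```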



Writing to databases via the FeatureWriter in stream mode is supported. The parameters and considerations are the same as defined in the table above.

The main reason to use a FeatureWriter is to leverage its Group By parameter to write features whenever the group changes. For example, in combination with the TimeWindower transformer, you could write data out whenever a window ends. Writers do not support Group By.

Other Transformers

There are several other transformers you can use to write data out when running in stream mode.
  • AmazonAthenaConnector
  • AzureQueueStorageConnector
  • SQLExecutor
  • GoogleBigQueryConnector


Readers cannot be used when running in stream mode. If you wish to bring data into a workflow alongside the stream, you can use the FeatureReader or DatabaseJoiner.


Additional Resources

Webinar: Introduction to Data Stream Processing

Webinar: Empowering Real-Time Decision Making with Data Streaming
