FME and Stream Processing

Liz Sanderson

FME Version

  • FME 2021.0

Introduction

All data can be categorized as either bounded or unbounded.

Bounded data is finite and has a discrete beginning and end. It is associated with batch processing.

Unbounded data, also referred to as a data stream, is infinite: it has no discrete beginning or end, and it is associated with stream processing. As well as being continuous, unbounded data typically has the following attributes:

  • Data records are small in size.
  • Data volumes can be extremely high.
  • Data distribution can be inconsistent with quiet and busy periods.
  • Data can arrive out of sequence compared to when the event happened.

 

What is Stream Processing?

Stream processing is a term that groups together the collection, integration, and analysis of unbounded data. It allows organizations to deliver insights across massive datasets on a continuous basis. It is typically discussed in the context of big data, where low latency and massive throughput are key requirements for any solution. For more information on stream processing, see this blog post and watch our introduction to streaming webinar.

There are three main ways organizations can work with unbounded data in FME: batch, real-time, and stream processing.

 

Batch Processing of Stored Event Data

In this approach, unbounded data is stored and processed at a specified interval. Depending on the specified interval and the data velocity, batch processing may not be able to meet the real-time requirements of many systems. At best, this approach can be considered “near real-time”. It is best suited for low-volume events.

 

Real-Time Event Processing

Each event in the unbounded stream is handled separately, and connections between events are kept in persistent storage. This is often referred to as complex event processing and is best suited for low-volume streams with infrequent events.

With the FME Platform, this can be done with Automations in FME Flow, where an incoming event is used as both the input data and the trigger to deploy a workflow.

 

Data Stream Processing

Data stream processing is ideal for high-velocity unbounded data streams. It is a method that allows organizations to quickly deliver insights across massive datasets on a continuous basis. With stream processing, real-time data can be processed in milliseconds. Because data streams are continuous, stream processing is an ongoing task, whereas real-time event processing is performed only at the time an event occurs.

Stream processing can also be done with the FME Platform using Streams in FME Flow (formerly FME Server). This article links to stream processing examples and tutorials that show you how to leverage the FME platform to build and deploy stream processing workflows.
 

Articles

Authoring Stream Workflows

Introduction to Stream Processing in FME

A tutorial on working with streams in FME Form (formerly FME Desktop) and Flow while covering a few different scenarios in our Demos.

FME Form Tips for Working with Continuous Data Streams

Working with high-volume data streams in FME requires a different approach compared to normal batch workflows that you are likely familiar with.

Writing to Databases When Running in Stream Mode

An overview of writing data when running in stream mode. It focuses on support for databases and data lakes.

Windowing Data Streams in FME

Learn how to break an unbounded data stream into finite, time-based chunks for processing.
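To make the idea of time-based windowing concrete, here is a minimal sketch in plain Python. It does not use any FME API: the function names `window_start` and `assign_windows`, the 60-second window size, and the sample events are all assumptions made for illustration. It shows tumbling (fixed, non-overlapping) windows, which is only one of several possible windowing strategies.

```python
from collections import defaultdict
from datetime import datetime, timezone


def window_start(event_time: datetime, size_seconds: int) -> datetime:
    """Return the start of the tumbling window that contains event_time."""
    epoch = int(event_time.timestamp())
    return datetime.fromtimestamp(epoch - epoch % size_seconds, tz=timezone.utc)


def assign_windows(events, size_seconds=60):
    """Group (event_time, payload) pairs by the window they fall into."""
    windows = defaultdict(list)
    for event_time, payload in events:
        windows[window_start(event_time, size_seconds)].append(payload)
    return dict(windows)


# Hypothetical sample events; a real stream would never end.
events = [
    (datetime(2021, 1, 1, 12, 0, 5, tzinfo=timezone.utc), "a"),
    (datetime(2021, 1, 1, 12, 0, 59, tzinfo=timezone.utc), "b"),
    (datetime(2021, 1, 1, 12, 1, 0, tzinfo=timezone.utc), "c"),
]
windows = assign_windows(events)
for start, payloads in sorted(windows.items()):
    print(start.isoformat(), payloads)
```

The sketch works on a finite list for readability. On a genuinely unbounded stream, each window would be released for processing once its time range has passed rather than after all input has been read.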

Joining a Stream to an External Dataset

An overview of reading data into a streaming workflow via a Reader and FeatureReader.
 

FME Flow Streams

How to use the FME Flow Streams Interface

The steps required to publish, create, and manage streaming workspaces in FME Flow.

FME Flow Licensing Considerations

Should you license stream workflows using Standard or CPU-Usage (formerly Dynamic) Engines licences?

 

Demos

Because unbounded data is infinite and stream processing systems are built to handle large volumes of data, some compromises must be made in the complexity of processing that is supported. Without these compromises, stream processing tools could not sustain the high data volumes that set them apart from batch processing.

The following demos provide scenarios and workspaces for the most common stream processing workflows supported on the FME Platform.
 

Filtering Unbounded Data Streams

Reduce data volumes in memory by filtering an unbounded stream on either attribute values or location, before committing data to disk.
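As a rough illustration of filtering on attribute values and location, the following plain-Python sketch drops events lazily as they arrive. It is not FME code: the event fields (`id`, `x`, `y`, `speed`), the bounding box, and the speed threshold are hypothetical and chosen only for the example.

```python
def in_bbox(x, y, bbox):
    """Return True if the point (x, y) lies inside bbox = (min_x, min_y, max_x, max_y)."""
    min_x, min_y, max_x, max_y = bbox
    return min_x <= x <= max_x and min_y <= y <= max_y


def filter_stream(events, bbox, min_speed):
    """Lazily yield only events inside bbox whose speed is at least min_speed."""
    for event in events:
        if event["speed"] >= min_speed and in_bbox(event["x"], event["y"], bbox):
            yield event


# Hypothetical vehicle positions standing in for an unbounded stream.
stream = iter([
    {"id": 1, "x": -123.1, "y": 49.2, "speed": 12},   # inside the box, too slow
    {"id": 2, "x": -123.0, "y": 49.3, "speed": 64},   # inside the box, fast enough
    {"id": 3, "x": -110.0, "y": 40.0, "speed": 80},   # fast enough, outside the box
])
kept = list(filter_stream(stream, bbox=(-124.0, 49.0, -122.0, 50.0), min_speed=30))
print([event["id"] for event in kept])
```

Because `filter_stream` is a generator, rejected events are discarded immediately and never accumulate in memory, which is the point of filtering before data is committed to disk.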

Enriching Unbounded Data Streams

Join the unbounded data to other datasets (databases, APIs) before committing data to disk.
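One common way to enrich a stream is to load a small reference dataset once and look up each incoming event against it. The sketch below shows that pattern in plain Python under stated assumptions: the `sensor_id` key, the `lookup` table contents, and the `enrich` function are hypothetical, and a real workflow might query a database or API instead of an in-memory dictionary.

```python
def enrich(events, lookup, key="sensor_id"):
    """Yield each event merged with its matching reference record, if any."""
    for event in events:
        extra = lookup.get(event[key], {})  # unmatched events pass through unchanged
        yield {**event, **extra}


# Hypothetical reference data, loaded once before the stream starts.
lookup = {
    "s1": {"site": "Harbour", "owner": "Ports"},
    "s2": {"site": "Airport", "owner": "Transit"},
}

stream = iter([
    {"sensor_id": "s1", "value": 3.2},
    {"sensor_id": "s9", "value": 7.5},  # no reference record for this sensor
])
enriched = list(enrich(stream, lookup))
print(enriched)
```

Keeping the reference data in memory avoids a round trip per event, at the cost of the lookup going stale if the external dataset changes while the stream is running.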

Summarize Unbounded Data Streams

Summarize the unbounded data by calculating time-windowed aggregations before committing data to disk.
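A time-windowed aggregation can be computed incrementally, holding only running totals rather than every event. The following plain-Python sketch assumes events arrive in timestamp order as `(epoch_seconds, value)` pairs. The `summarize` function and the 60-second window are illustrative assumptions, not part of FME.

```python
def summarize(events, size_seconds=60):
    """Yield (window_start, count, mean) for each tumbling window.

    Assumes events arrive in order of their integer epoch timestamps.
    """
    current = None
    count = 0
    total = 0.0
    for ts, value in events:
        start = ts - ts % size_seconds
        if current is not None and start != current:
            # The previous window has closed, so emit its summary.
            yield current, count, total / count
            count, total = 0, 0.0
        current = start
        count += 1
        total += value
    if current is not None:
        # Flush the last window; only reached because the sample input is finite.
        yield current, count, total / count


# Hypothetical readings standing in for an unbounded stream.
readings = [(0, 10.0), (30, 20.0), (65, 40.0), (125, 5.0)]
summaries = list(summarize(readings))
print(summaries)
```

Out-of-sequence events, which the introduction notes are typical of unbounded data, would need extra handling (for example, a grace period before a window is closed) that this sketch leaves out.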

Spatial Analysis on Unbounded Data Streams

When working with location-enabled streams, understand the relationship between points in the incoming stream and other features.
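A typical relationship test is whether an incoming point falls inside a known area. As a hedged illustration, the sketch below uses the standard ray-casting (even-odd) test in plain Python. The `point_in_polygon` function and the square zone are assumptions for the example, and points exactly on a boundary are not treated specially.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) is inside the polygon.

    polygon is a list of (x, y) vertices; the closing edge back to the
    first vertex is implied.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only consider edges that straddle the horizontal line through y.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical zone and stream of points.
zone = [(0, 0), (4, 0), (4, 4), (0, 4)]
points = [(2, 2), (5, 2), (1, 3)]
flags = [point_in_polygon(px, py, zone) for px, py in points]
print(flags)
```

For large numbers of zones, a spatial index would normally be used to narrow down candidate polygons before running a test like this on each point.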

Detecting Incidents in Unbounded Data Streams

Detect patterns in memory and then trigger an event when certain criteria are met.
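As one simple example of an in-memory pattern, the sketch below raises an incident when several consecutive readings exceed a threshold. It is plain Python rather than FME code, and the `detect_incidents` function, the threshold of 90, and the run length of 3 are hypothetical choices for illustration.

```python
from collections import deque


def detect_incidents(readings, threshold, run_length=3):
    """Yield the timestamp at which run_length consecutive readings exceed threshold."""
    recent = deque(maxlen=run_length)  # remembers only the last few comparisons
    for ts, value in readings:
        recent.append(value > threshold)
        if len(recent) == run_length and all(recent):
            yield ts
            recent.clear()  # start a fresh run so one incident is reported once


# Hypothetical (timestamp, value) readings standing in for an unbounded stream.
readings = [(1, 10), (2, 95), (3, 97), (4, 99), (5, 50), (6, 96)]
incidents = list(detect_incidents(readings, threshold=90))
print(incidents)
```

Only the last `run_length` comparisons are kept in memory, so the state stays constant in size no matter how long the stream runs.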
 

Webinars

Introduction to Data Stream Processing

A high-level overview of data streaming, stream vs batch processing, market trends, benefits for organizations, and spatial data streams.

Empowering Real-Time Decision Making with Data Streaming

A technical walkthrough of FME stream processing capabilities, authoring tips, streaming scenarios, and deploying streams on FME Flow.

Power Up Your BI with Geospatial Data

Real-time visualization of stream processing results in a business intelligence dashboard. (Timestamp: 36:27)

 

Note: These webinars were recorded using FME Server (now FME Flow), so what is shown may differ from the version you are using.
 

Additional Resources

FME Flow Troubleshooting: Streams

Blog: Capture Data Insights with Stream Processing 
