FME and Stream Processing

Introduction

All data can be categorized as either bounded or unbounded.

Bounded data is finite and has a discrete beginning and end. It is associated with batch processing.

Unbounded data, also referred to as a data stream, is infinite, having no discrete beginning or end, and is associated with stream processing. As well as being continuous, unbounded data typically has the following attributes:

Data records are small in size.
Data volumes can be extremely high.
Data distribution can be inconsistent, with quiet and busy periods.
Data can arrive out of sequence compared to when the event happened.
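
To illustrate the last attribute, here is a minimal Python sketch of one common way to cope with out-of-sequence arrival: hold events in a small buffer and release them in event-time order once they are older than a tolerated delay. This is plain Python, not an FME API. The `reorder` function, the `(event_time, payload)` record shape, and the `max_delay` parameter are assumptions made for the example.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record shape: (event_time_seconds, payload)
Event = Tuple[float, str]


def reorder(events: Iterable[Event], max_delay: float) -> Iterator[Event]:
    """Yield events in event-time order, tolerating lateness up to max_delay seconds."""
    buffer: List[Event] = []
    latest_seen = float("-inf")
    for event in events:
        heapq.heappush(buffer, event)
        latest_seen = max(latest_seen, event[0])
        # Release every buffered event that is at least max_delay older
        # than the newest event time seen so far.
        while buffer and buffer[0][0] <= latest_seen - max_delay:
            yield heapq.heappop(buffer)
    # The sample below is finite, so flush whatever is still buffered.
    while buffer:
        yield heapq.heappop(buffer)


# Events listed in arrival order; "b" and "d" arrive late.
arrivals = [(1.0, "a"), (3.0, "c"), (2.0, "b"), (6.0, "e"), (5.0, "d")]
ordered = list(reorder(arrivals, max_delay=2.0))
```

A larger `max_delay` tolerates later arrivals at the cost of added latency and memory. Events that arrive later than the tolerated delay would still be emitted out of order by this sketch.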

What is Stream Processing?

Stream processing groups together the collection, integration, and analysis of unbounded data. It allows organizations to deliver insights across massive datasets on a continuous basis. It is typically discussed in the context of big data, where low latency and massive throughput are key requirements for any solution. For more information on stream processing, see this blog post, watch our introduction to streaming webinar, and check out this video.

There are three main ways organizations can work with unbounded data in FME: batch, real-time, and stream processing.

Batch Processing of Stored Event Data

In this approach, unbounded data is stored and processed at a specified interval. Depending on the specified interval and the data velocity, batch processing may not be able to meet the real-time requirements of many systems. At best, this approach can be considered “near real-time”. It is best suited for low-volume events.

Real-Time Event Processing

Each event in the unbounded stream is handled separately, and any connections between events are kept in persistent storage. This is often referred to as complex event processing and is best suited to low-volume streams with infrequent events.

With the FME Platform, this can be done with Automations in FME Flow, where an incoming event serves as both the input data and the trigger that runs a workflow.

Data Stream Processing

Data stream processing is ideal for high-velocity unbounded data streams. It allows organizations to quickly deliver insights across massive datasets on a continuous basis, and real-time data can be processed in milliseconds. Because data streams are continuous, stream processing is an ongoing task, whereas real-time event processing is performed only at the time that an event occurs.

Stream processing can also be done with the FME Platform using Streams in FME Flow. This article links to stream processing examples and tutorials that show you how to use the FME Platform to build and deploy stream processing workflows.

Check out this video that compares batch, event, and data stream processing.

Articles

Authoring Stream Workflows

Introduction to Stream Processing in FME

A tutorial on working with streams in FME Form and FME Flow that covers a few of the scenarios from our Demos.

FME Form Tips for Working with Continuous Data Streams

Working with high-volume data streams in FME requires a different approach than the batch workflows you are likely familiar with.

Writing to Databases When Running in Stream Mode

An overview of writing data when running in stream mode. It focuses on database and data lake support.

Windowing Data Streams in FME

Learn how to break an unbounded data stream into finite, time-based chunks for processing.
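
As a rough illustration of the idea, the Python sketch below groups events into fixed-size, non-overlapping (tumbling) time windows and emits each window as soon as an event for a later window arrives. It is a generic sketch and not how FME implements windowing. The `tumbling_windows` function, the `(event_time, value)` record shape, and the assumption that events arrive in event-time order are all made up for the example.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical record shape: (event_time_seconds, value)
Event = Tuple[float, float]


def tumbling_windows(
    events: Iterable[Event], size: float
) -> Iterator[Tuple[float, List[float]]]:
    """Yield (window_start, values) for consecutive windows of `size` seconds.

    Assumes events arrive in event-time order.
    """
    current_start: Optional[float] = None
    current: List[float] = []
    for event_time, value in events:
        start = (event_time // size) * size
        if current_start is not None and start != current_start:
            # An event for a later window arrived, so the open window is complete.
            yield current_start, current
            current = []
        current_start = start
        current.append(value)
    # The sample below is finite, so emit the last open window.
    if current_start is not None:
        yield current_start, current


sample = [(0.5, 10.0), (4.9, 12.0), (5.0, 7.0), (12.3, 9.0)]
windows = list(tumbling_windows(sample, size=5.0))
```

Because each window is emitted and then discarded, memory use depends on the window size rather than on the length of the stream.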

Joining a Stream to an External Dataset

An overview of reading data into a streaming workflow via a Reader and FeatureReader.

Streaming IoT Data from a REST API in FME with the EndlessLooper

A tutorial on creating a workflow that continuously polls an API endpoint using the EndlessLooper custom transformer.

FME Flow Streams

How to use the FME Flow Streams Interface

The steps required to publish, create, and manage streaming workspaces in FME Flow.

Demos

Because unbounded data is infinite and stream processing systems are designed to handle large volumes of data, certain compromises must be made in the complexity of processing that is supported. Without these compromises, stream processing tools could not sustain the large data volumes they handle compared to batch processing.

The following demos provide scenarios and workspaces for the most common stream processing workflows supported on the FME Platform.

Filtering Unbounded Data Streams

Reduce data volumes in memory by filtering an unbounded stream on either attribute values or location, before committing data to disk.
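
A minimal Python sketch of this kind of filter is shown here. It is generic and does not use any FME API. The record keys (`lon`, `lat`, `speed`), the bounding box, and the speed threshold are hypothetical, and the sketch combines an attribute test and a location test, although either can be used on its own.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical record with "lon", "lat" and "speed" keys
Record = Dict[str, float]
# (min_lon, min_lat, max_lon, max_lat)
BBox = Tuple[float, float, float, float]


def in_bbox(record: Record, bbox: BBox) -> bool:
    """Location test: is the record's point inside the bounding box?"""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= record["lon"] <= max_lon and min_lat <= record["lat"] <= max_lat


def filter_stream(
    records: Iterable[Record], bbox: BBox, min_speed: float
) -> Iterator[Record]:
    """Pass through only records that satisfy both the attribute and location tests."""
    for record in records:
        if record["speed"] >= min_speed and in_bbox(record, bbox):
            yield record


records = [
    {"lon": -123.1, "lat": 49.2, "speed": 62.0},  # inside the box and fast enough
    {"lon": -123.1, "lat": 49.2, "speed": 20.0},  # inside the box but too slow
    {"lon": -79.4, "lat": 43.7, "speed": 80.0},   # fast enough but outside the box
]
bbox = (-124.0, 49.0, -122.0, 50.0)
kept = list(filter_stream(records, bbox, min_speed=50.0))
```

Because `filter_stream` is a generator, rejected records are dropped one at a time and never accumulate in memory.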

Enriching Unbounded Data Streams

Join the unbounded data to other datasets (databases, APIs) before committing data to disk.
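
The join can be pictured as a keyed lookup applied to each incoming record. In the Python sketch below, an in-memory dictionary stands in for the external dataset. In practice that reference data would come from a database or API. The `enrich` function, the `sensor_id` key, and the `matched` flag are assumptions for the example, not part of FME.

```python
from typing import Any, Dict, Iterable, Iterator

Record = Dict[str, Any]


def enrich(
    records: Iterable[Record], lookup: Dict[str, Record], key: str = "sensor_id"
) -> Iterator[Record]:
    """Merge reference attributes into each record and flag whether a match was found."""
    for record in records:
        extra = lookup.get(record[key])
        if extra is None:
            # Keep unmatched records, but mark them so they can be handled downstream.
            yield {**record, "matched": False}
        else:
            yield {**record, **extra, "matched": True}


# Stand-in for an external dataset, loaded once before the stream starts.
sensors = {"s1": {"site": "Bridge"}, "s2": {"site": "Tunnel"}}
readings = [{"sensor_id": "s1", "value": 3.2}, {"sensor_id": "s9", "value": 1.1}]
enriched = list(enrich(readings, sensors))
```

Loading the reference data once and looking it up in memory avoids a round trip to the external source for every record. The trade-off is that the lookup can go stale if the external dataset changes while the stream is running.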

Summarize Unbounded Data Streams

Summarize the unbounded data by calculating time-windowed aggregations before committing data to disk.

Spatial Analysis on Unbounded Data Streams

When working with location-enabled streams, understand the relationship between points in the incoming stream and other features.

Detecting Incidents in Unbounded Data Streams

Detect patterns in memory and then trigger an event when certain criteria are met.
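
One simple pattern of this kind is a value that stays above a threshold for several consecutive readings. The Python sketch below keeps only a running counter in memory and calls a callback when the criterion is met. The `detect_incidents` function, the threshold, and the callback are hypothetical. In an FME workflow the triggered action would be whatever the workspace is configured to do.

```python
from typing import Callable, Iterable, List


def detect_incidents(
    values: Iterable[float],
    threshold: float,
    consecutive: int,
    on_incident: Callable[[int], None],
) -> None:
    """Call on_incident(index) when `consecutive` readings in a row exceed threshold."""
    run = 0
    for index, value in enumerate(values):
        run = run + 1 if value > threshold else 0
        # Fire once per run, at the reading that completes the pattern.
        if run == consecutive:
            on_incident(index)


alerts: List[int] = []
readings = [70.0, 82.0, 85.0, 90.0, 60.0, 95.0, 96.0]
detect_incidents(readings, threshold=80.0, consecutive=3, on_incident=alerts.append)
```

Only the counter is held between readings, so memory use stays constant however long the stream runs.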

Additional Resources

FME Flow Troubleshooting: Streams

Blog: Capture Data Insights with Stream Processing

Webinars

From Event to Action: Accelerate Your Decision Making with Real-Time Automation

Powering Real-Time Decisions with Continuous Data Streams

Power Up Your BI with Geospatial Data
