Introduction
All data can be categorized as either bounded or unbounded.
Bounded data is finite and has a discrete beginning and end. It is associated with batch processing.
Unbounded data, also referred to as a data stream, is infinite, with no discrete beginning or end, and is associated with stream processing. In addition to being continuous, unbounded data typically has the following attributes:
- Data records are small in size.
- Data volumes can be extremely high.
- Data distribution can be uneven, with quiet and busy periods.
- Data can arrive out of sequence relative to when the events actually happened.
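The last attribute is easiest to see with a small example. The following is a minimal Python sketch, not FME functionality: the `Event` type, its field names, and the `max_delay` value are all assumptions made for illustration. It buffers events briefly and releases them in event-time order once they are older than the allowed delay.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    event_time: int                      # when the event happened (seconds)
    payload: str = field(compare=False)  # small record body, ignored when sorting


def reorder(events, max_delay):
    """Yield events in event-time order, tolerating arrivals up to max_delay late."""
    buffer = []          # min-heap keyed on event_time
    newest_seen = None   # highest event_time observed so far
    for event in events:                 # iteration order = arrival order
        heapq.heappush(buffer, event)
        if newest_seen is None or event.event_time > newest_seen:
            newest_seen = event.event_time
        # Release everything old enough that no earlier event should still arrive.
        while buffer and buffer[0].event_time <= newest_seen - max_delay:
            yield heapq.heappop(buffer)
    while buffer:                        # flush when a finite test input ends
        yield heapq.heappop(buffer)


# Events "b" and "c" arrive after "d" even though they happened earlier.
arrivals = [Event(1, "a"), Event(4, "d"), Event(2, "b"), Event(3, "c"), Event(6, "e")]
ordered = [e.payload for e in reorder(arrivals, max_delay=3)]
```

A larger `max_delay` tolerates later arrivals but holds more records in memory and delays output, which is the usual trade-off when handling out-of-sequence data.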
What is Stream Processing?
Stream processing is a term that groups together the collection, integration, and analysis of unbounded data. It allows organizations to deliver insights across massive datasets on a continuous basis. It is typically discussed in the context of big data, with low latency and massive throughput as key requirements for any solution. For more information on stream processing, see this blog post, watch our introduction to streaming webinar, and check out this video.
There are three main ways organizations can work with unbounded data in FME: batch, real-time, and stream processing.
Batch Processing of Stored Event Data
In this approach, unbounded data is stored and processed at a specified interval. Depending on the specified interval and the data velocity, batch processing may not be able to meet the real-time requirements of many systems. At best, this approach can be considered “near real-time”. It is best suited for low-volume events.
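To see why interval-based processing is at best near real-time, consider how long each stored event waits for the next batch run. The following is a plain Python sketch with made-up timestamps, not FME functionality; the `batch_latencies` function and the 60-second interval are hypothetical.

```python
import math


def batch_latencies(event_times, interval):
    """Return how long each event waits until the next scheduled batch run.

    Batch runs are assumed to fire at t = interval, 2 * interval, ...
    An event arriving exactly on a run boundary is picked up by that run.
    """
    latencies = []
    for t in event_times:
        next_run = math.ceil(t / interval) * interval
        latencies.append(next_run - t)
    return latencies


event_times = [5, 30, 59, 61, 120]      # seconds since start (illustrative)
waits = batch_latencies(event_times, interval=60)
worst_case = max(waits)
```

In this model the wait is always shorter than the interval, so shortening the interval lowers latency at the cost of more frequent runs.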
Real-Time Event Processing
Each event in the unbounded stream is handled separately, with the relationships between events kept in persistent storage. This is often referred to as complex event processing and is best suited for low-volume streams with infrequent events.
With the FME Platform, this can be done with Automations in FME Flow, where an incoming event serves as both the input data and the trigger that runs a workflow.
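The per-event pattern can be sketched in plain Python as follows. This is not how Automations are implemented; it only illustrates the idea of one invocation per event with shared state kept in storage. The `handle_event` function, the table layout, and the "disarmed then door_open" rule are hypothetical, and an in-memory SQLite database stands in for persistent storage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # stand-in for persistent storage
conn.execute("CREATE TABLE last_event (device TEXT PRIMARY KEY, kind TEXT)")


def handle_event(device, kind):
    """Process one event; return an action name if a pattern is completed."""
    row = conn.execute(
        "SELECT kind FROM last_event WHERE device = ?", (device,)
    ).fetchone()
    previous = row[0] if row else None
    # Remember this event so the next invocation can relate to it.
    conn.execute(
        "INSERT OR REPLACE INTO last_event (device, kind) VALUES (?, ?)",
        (device, kind),
    )
    conn.commit()
    # Hypothetical rule: a door opening right after a disarm is worth flagging.
    if previous == "disarmed" and kind == "door_open":
        return "notify"
    return None


actions = [
    handle_event("sensor-1", "disarmed"),
    handle_event("sensor-2", "door_open"),
    handle_event("sensor-1", "door_open"),
]
```

Each call pays for a storage round trip, which is one reason this approach suits infrequent events better than high-velocity streams.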
Data Stream Processing
Data stream processing is ideal for high-velocity unbounded data streams. It lets organizations deliver insights across massive datasets quickly and continuously, and real-time data can be processed in milliseconds. Because data streams are continuous, stream processing is an ongoing task, whereas real-time event processing runs only at the moment an event occurs.
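The "ongoing task" idea can be illustrated with a short Python generator that keeps its state in memory between records instead of starting fresh for each event. This is a generic sketch, not FME code; `running_mean` is a hypothetical name and the three input values stand in for a live source.

```python
def running_mean(values):
    """Yield the mean of all values seen so far, updated after each record."""
    total = 0.0
    count = 0
    for value in values:       # with a live source this loop would not end
        total += value
        count += 1
        yield total / count


means = list(running_mean([10.0, 20.0, 30.0]))
```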
Stream processing can also be done with the FME Platform using Streams in FME Flow. This article links to stream processing examples and tutorials that show you how to use the FME Platform to build and deploy stream processing workflows.
Check out this video that compares batch, event, and data stream processing.
Articles
Authoring Stream Workflows
Introduction to Stream Processing in FME
A tutorial on working with streams in FME Form and FME Flow that covers a few of the scenarios from the Demos section.
FME Form Tips for Working with Continuous Data Streams
Working with high-volume data streams in FME requires a different approach than the batch workflows you are likely familiar with.
Writing to Databases When Running in Stream Mode
An overview of writing data when running in stream mode. It focuses on database and data lake support.
Windowing Data Streams in FME
Learn how to break an unbounded data stream into finite, time-based chunks for processing.
Joining a Stream to an External Dataset
An overview of reading data into a streaming workflow via a Reader and FeatureReader.
Streaming IoT Data from a REST API in FME with the EndlessLooper
A tutorial on creating a workflow that continuously polls an API endpoint using the EndlessLooper custom transformer.
FME Flow Streams
How to use the FME Flow Streams Interface
The steps required to publish, create, and manage streaming workspaces in FME Flow.
Demos
Because unbounded data is infinite and stream processing systems are designed to handle large volumes of data, certain compromises must be made in the complexity of processing that is supported. Without these compromises, stream processing tools could not sustain the large data volumes they handle compared to batch processing.
The following demos provide scenarios and workspaces for the most common stream processing workflows supported on the FME Platform.
Filtering Unbounded Data Streams
Reduce data volumes in memory by filtering an unbounded stream on either attribute values or location, before committing data to disk.
Enriching Unbounded Data Streams
Join the unbounded data to other datasets (databases, APIs) before committing data to disk.
Summarize Unbounded Data Streams
Summarize the unbounded data by calculating time-windowed aggregations before committing data to disk.
Spatial Analysis on Unbounded Data Streams
When working with location-enabled streams, understand the relationship between points in the incoming stream and other features.
Detecting Incidents in Unbounded Data Streams
Detect patterns in memory and then trigger an event when certain criteria are met.
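As a rough illustration of how three of these scenarios (filtering, time-windowed summaries, and incident detection) chain together, here is a minimal Python sketch. It is not FME code and does not represent how the demo workspaces are built; the record fields (`ts`, `vehicle`, `speed`), function names, and thresholds are all assumptions. It also assumes records arrive in timestamp order.

```python
from collections import defaultdict


def filter_stream(events, min_speed):
    """Drop records early so later steps hold less data in memory."""
    for event in events:
        if event["speed"] >= min_speed:
            yield event


def tumbling_window_mean(events, window_seconds):
    """Yield (window_start, vehicle, mean speed) for each closed window.

    A window closes when the first event of a later window arrives.
    """
    current_start = None
    sums = defaultdict(lambda: [0.0, 0])   # vehicle -> [total, count]
    for event in events:
        start = event["ts"] - event["ts"] % window_seconds
        if current_start is not None and start != current_start:
            for vehicle, (total, count) in sorted(sums.items()):
                yield current_start, vehicle, total / count
            sums.clear()
        current_start = start
        sums[event["vehicle"]][0] += event["speed"]
        sums[event["vehicle"]][1] += 1
    # Flush the last window; a truly unbounded stream never reaches this point.
    for vehicle, (total, count) in sorted(sums.items()):
        yield current_start, vehicle, total / count


def detect_incidents(summaries, limit):
    """Yield an alert for each windowed mean above the limit."""
    for window_start, vehicle, mean_speed in summaries:
        if mean_speed > limit:
            yield {"window_start": window_start, "vehicle": vehicle, "mean": mean_speed}


readings = [
    {"ts": 0, "vehicle": "bus-1", "speed": 40.0},
    {"ts": 10, "vehicle": "bus-2", "speed": 2.0},    # removed by the filter
    {"ts": 20, "vehicle": "bus-1", "speed": 60.0},
    {"ts": 65, "vehicle": "bus-1", "speed": 90.0},
    {"ts": 70, "vehicle": "bus-1", "speed": 100.0},
]
moving = filter_stream(readings, min_speed=5.0)
summaries = tumbling_window_mean(moving, window_seconds=60)
alerts = list(detect_incidents(summaries, limit=80.0))
```

Because each stage is a generator, records flow through one at a time and only the current window's totals are held in memory, which mirrors the "process in memory before committing to disk" theme of the demos.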
Webinars
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
An overview of real-time capabilities and Automations vs. Streams in FME.
Powering Real-Time Decisions with Continuous Data Streams
An introduction to FME stream processing capabilities, stream vs. batch processing, streaming scenarios with spatial examples, and deploying streams on FME Flow.
Power Up Your BI with Geospatial Data
Real-time visualization of stream processing results in a business intelligence dashboard. (Timestamp: 36:27)