Introduction
All data can be categorized as either bounded or unbounded.
Bounded data is finite and has a discrete beginning and end. It is associated with batch processing.
Unbounded data, also referred to as a data stream, is infinite, with no discrete beginning or end, and is associated with stream processing. As well as being continuous, unbounded data typically has the following attributes:
- Data records are small in size.
- Data volumes can be extremely high.
- Data distribution can be inconsistent, with quiet and busy periods.
- Data can arrive out of sequence relative to when the events occurred.
What is Stream Processing?
Stream processing is a term that groups together the collection, integration, and analysis of unbounded data. It allows organizations to deliver insights across massive datasets on a continuous basis. It is typically discussed in the context of big data, where low latency and massive throughput are key requirements for any solution. For more information on stream processing, see this blog post, watch our introduction to streaming webinar, and check out this video.
There are three main ways organizations can work with unbounded data in FME: batch, real-time, and stream processing.
Batch Processing of Stored Event Data
In this approach, unbounded data is stored and processed at a specified interval. Depending on the specified interval and the data velocity, batch processing may not be able to meet the real-time requirements of many systems. At best, this approach can be considered “near real-time”. It is best suited for low-volume events.
Real-Time Event Processing
Each event in the unbounded stream is handled separately, and connections between events are kept in persistent storage. This is often referred to as complex event processing and is best suited for low-volume streams with infrequent events.
With the FME Platform, this can be done with Automations in FME Flow, where an incoming event serves as both the input data and the trigger to deploy a workflow.
Data Stream Processing
Data stream processing is ideal for high-velocity unbounded data streams. It is a method that allows organizations to quickly deliver insights across massive datasets on a continuous basis. With stream processing, real-time data can be processed in milliseconds. Because data streams are continuous, stream processing is an ongoing task, whereas real-time event processing is performed at the time an event occurs.
Stream processing can also be done with the FME Platform using Streams in FME Flow (formerly FME Server). This article links to stream processing examples and tutorials that show you how to leverage the FME Platform to build and deploy stream processing workflows.
Check out this video that compares batch, event, and data stream processing.
Articles
Authoring Stream Workflows
Introduction to Stream Processing in FME
A tutorial on working with streams in FME Form (formerly FME Desktop) and Flow while covering a few different scenarios in our Demos.
FME Form Tips for Working with Continuous Data Streams
Working with high-volume data streams in FME requires a different approach from the batch workflows you are likely familiar with.
Writing to Databases When Running in Stream Mode
An overview of writing data when running in stream mode. It focuses on support for databases and data lakes.
Windowing Data Streams in FME
Learn how to break the unbounded data stream into finite, time-based chunks for processing.
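As a rough illustration of the idea (plain Python, not an FME transformer or API), the sketch below assigns timestamped events to fixed-size "tumbling" windows. The function names, the 60-second window size, and the sample events are all assumptions made for the example.

```python
from collections import defaultdict


def window_start(timestamp: float, size_seconds: int) -> int:
    """Return the start time of the tumbling window containing timestamp."""
    return int(timestamp // size_seconds) * size_seconds


def group_into_windows(events, size_seconds):
    """Group (timestamp, value) pairs by the window they fall into."""
    windows = defaultdict(list)
    for timestamp, value in events:
        windows[window_start(timestamp, size_seconds)].append(value)
    return dict(windows)


# Hypothetical sample: (seconds since start, payload)
events = [(0.5, "a"), (59.9, "b"), (60.0, "c"), (125.0, "d")]
windows = group_into_windows(events, 60)
```

A finite list stands in for the stream here. On a truly unbounded stream, each window would be emitted and discarded once its time range has passed rather than held in memory.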
Joining a Stream to an External Dataset
An overview of reading data into a streaming workflow via a Reader and FeatureReader.
FME Flow Streams
How to use the FME Flow Streams Interface
The steps required to publish, create, and manage streaming workspaces in FME Flow.
Demos
Because unbounded data is infinite and stream processing systems are built to handle large volumes of data, some compromises must be made in the complexity of processing that is supported. Without these compromises, stream processing tools could not sustain the large data volumes they handle compared to batch processing.
The following demos provide scenarios and workspaces for the most common stream processing workflows supported on the FME Platform.
Filtering Unbounded Data Streams
Reduce data volumes in memory by filtering an unbounded stream on either attribute values or location, before committing data to disk.
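The filtering idea can be sketched in a few lines of plain Python. This is only an illustration, not how FME implements it; the record fields (`x`, `y`, `speed`), the bounding box, and the speed threshold are hypothetical.

```python
def in_bbox(record, bbox):
    """True if the record's point lies inside (min_x, min_y, max_x, max_y)."""
    min_x, min_y, max_x, max_y = bbox
    return min_x <= record["x"] <= max_x and min_y <= record["y"] <= max_y


def filter_stream(records, predicate):
    """Lazily yield only the records that satisfy predicate."""
    for record in records:
        if predicate(record):
            yield record


# Hypothetical sample records standing in for an incoming stream
sample = [
    {"id": 1, "x": 5, "y": 5, "speed": 30},
    {"id": 2, "x": 50, "y": 5, "speed": 80},
    {"id": 3, "x": 2, "y": 9, "speed": 65},
]

# Filter on location
inside = [r["id"] for r in filter_stream(sample, lambda r: in_bbox(r, (0, 0, 10, 10)))]
# Filter on an attribute value
fast = [r["id"] for r in filter_stream(sample, lambda r: r["speed"] > 60)]
```

Because `filter_stream` is a generator, rejected records are dropped as they arrive and never accumulate in memory.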
Enriching Unbounded Data Streams
Join the unbounded data to other datasets (databases, APIs) before committing data to disk.
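A minimal sketch of enrichment, assuming the external dataset is small enough to hold in memory as a lookup table. The key name `vehicle_id` and the lookup contents are hypothetical, and a real workflow might query a database or API instead.

```python
def enrich_stream(records, lookup, key):
    """Yield each record merged with matching attributes from lookup.

    Records with no match pass through unchanged.
    """
    for record in records:
        extra = lookup.get(record[key], {})
        yield {**record, **extra}


# Hypothetical reference data keyed by vehicle ID
routes = {"bus-7": {"route": "Downtown"}}

# Hypothetical incoming records
incoming = [
    {"vehicle_id": "bus-7", "speed": 30},
    {"vehicle_id": "bus-9", "speed": 42},
]

enriched = list(enrich_stream(incoming, routes, "vehicle_id"))
```

Loading the reference data once and joining per record keeps the per-event cost low, at the price of the lookup going stale if the external dataset changes.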
Summarize Unbounded Data Streams
Summarize the unbounded data by calculating time-windowed aggregations before committing data to disk.
Spatial Analysis on Unbounded Data Streams
When working with location-enabled streams, understand the relationship between points in the incoming stream and other features.
Detecting Incidents in Unbounded Data Streams
Detect patterns in memory and then trigger an event when certain criteria are met.
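One simple pattern of this kind is "too many events in a short period". The sketch below is plain Python rather than FME functionality. It keeps only recent timestamps in memory and signals when a threshold is reached. The threshold, the window length, and the assumption that timestamps arrive in order are all illustrative choices.

```python
from collections import deque


def detect_incidents(timestamps, threshold, window_seconds):
    """Yield a timestamp whenever at least threshold events
    have occurred within the preceding window_seconds.

    Assumes timestamps arrive in non-decreasing order.
    """
    recent = deque()
    for timestamp in timestamps:
        recent.append(timestamp)
        # Drop events that have aged out of the sliding window
        while timestamp - recent[0] > window_seconds:
            recent.popleft()
        if len(recent) >= threshold:
            yield timestamp


# Hypothetical event times in seconds
alerts = list(detect_incidents([0, 5, 8, 40, 41, 42, 43], threshold=3, window_seconds=10))
```

Only the events inside the current window are retained, so memory use stays bounded however long the stream runs.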
Webinars
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
An overview of real-time capabilities and Automations vs Streams in FME.
Powering Real-Time Decisions with Continuous Data Streams
An introduction to FME stream processing capabilities, stream vs. batch processing, streaming scenarios with spatial examples, and deploying streams on FME Flow.
Power Up Your BI with Geospatial Data
Real-time visualization of stream processing results in a business intelligence dashboard. (Timestamp: 36:27)
Please note that this webinar was recorded in FME Server (now FME Flow).