Summarize Unbounded Data Streams

Liz Sanderson

FME Version

  • FME 2021.0

Introduction

Unbounded streams can contain vast amounts of data, and if your end goal is only to summarize that data, storing it all before processing is costly. With FME streams, blocks of data defined by time windows and/or geographic boundaries can be aggregated and summarized directly from the stream before anything is committed to disk.

Streams Diagrams - Summarize Data - Aggregate Points

Examples

Aggregate Points

Scenario

The City of Vancouver has acquired GPS data for the movement of 50,000 vehicles during rush hour. They wish to determine the average speed of vehicles crossing a key bridge in each direction and to count how many vehicles cross the bridge.
 

Workflow

The workspace starts by reading a stream of data for GPS-tracked vehicles in the Vancouver Lower Mainland. The features are converted into points, and they are discarded if they don’t fall within the geofence surrounding the 2nd Narrows bridge. 
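Outside of FME, the geofence step can be sketched as a point-in-polygon test that discards any GPS fix falling outside the boundary. This is a minimal illustration only; the polygon coordinates below are hypothetical, and in the workspace this filtering is done by FME transformers rather than hand-written code.

```python
# Sketch of the geofence filter: keep only points inside a polygon,
# using a standard ray-casting point-in-polygon test.

def point_in_polygon(x, y, polygon):
    """Return True if (x, y) lies inside `polygon`, a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does the horizontal ray from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical geofence around a bridge (lon, lat pairs)
geofence = [(-123.03, 49.29), (-123.02, 49.29), (-123.02, 49.31), (-123.03, 49.31)]

points = [(-123.025, 49.30), (-123.10, 49.25)]  # one inside, one outside
kept = [p for p in points if point_in_polygon(p[0], p[1], geofence)]
```

Features that fail the test are simply dropped from the stream, so only traffic on the bridge reaches the downstream aggregation.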

Once filtered, the features are grouped by time using the TimeWindower, and the speed of each vehicle within the window is calculated. The data is then aggregated to produce the count of vehicles that crossed the bridge and their average speed in each 90-second window. The direction each vehicle is traveling (northbound or southbound) is also derived, so the average speed and count can be split by direction.

The stream data is aggregated in the StatisticsCalculator, which releases a statistical summary of the features for each time window; these summaries can then be passed to external applications for decision-makers.
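The windowed aggregation performed by the TimeWindower and StatisticsCalculator can be approximated in a few lines: bucket records into 90-second windows, split by travel direction, and emit a count and mean speed per group. The field names (`timestamp`, `direction`, `speed_kmh`) are illustrative, not FME attribute names.

```python
# Sketch of windowed aggregation: group records into 90-second
# windows keyed by direction, then summarize each group.

from collections import defaultdict

WINDOW_SECONDS = 90

def summarize(records):
    groups = defaultdict(list)
    for rec in records:
        window_id = int(rec["timestamp"] // WINDOW_SECONDS)
        groups[(window_id, rec["direction"])].append(rec["speed_kmh"])
    return {
        key: {"count": len(speeds), "avg_speed_kmh": sum(speeds) / len(speeds)}
        for key, speeds in groups.items()
    }

records = [
    {"timestamp": 10, "direction": "NB", "speed_kmh": 60.0},
    {"timestamp": 40, "direction": "NB", "speed_kmh": 50.0},
    {"timestamp": 70, "direction": "SB", "speed_kmh": 55.0},
    {"timestamp": 100, "direction": "NB", "speed_kmh": 65.0},  # falls in the next window
]
summary = summarize(records)
```

Because each group collapses to two numbers, the summary can be emitted as soon as its window closes, without retaining the raw points.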

 

Calculate Motion Statistics

Often, the speed of a vehicle is a key piece of data that people are interested in analyzing. Rather than storing all of the data and then calculating the speed, you can calculate the average speed of a vehicle over a period of time all in memory.
 

Scenario

An insurance company is trialing a new program where the driver pays for insurance based upon the distance they travel. They have 50,000 people in the trial and each participant has a sensor in their vehicle which tracks their location. The sensors report their location every five seconds.

The insurance company wants to capture only the average and maximum speed each vehicle traveled within a 1-minute window, rather than every data point. Since the sensors report twelve times per minute and each window is reduced to two values, this cuts their data volume by 6x.
 

Workflow

The workspace connects to a message broker; in this case, Kafka is used. The JSON-formatted payload, which represents the vehicle’s location, is then flattened into individual attributes.

A TimeWindower transformer then groups the data into 60-second windows. For each vehicle in a window, the distance between consecutive reported points is calculated; combined with the timestamps, this gives the speed the vehicle was traveling at each vertex. The per-vertex speeds are fed into a StatisticsCalculator which, grouping by window ID and vehicle ID, outputs the maximum and average speed each vehicle was traveling in that 60-second window. This result can then be committed to a database.
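A stand-alone sketch of this pipeline, under some assumptions: speed at each vertex is derived from consecutive GPS fixes (haversine distance divided by the time delta), then each (window, vehicle) group is reduced to its maximum and average speed. The record layout is hypothetical, and the Kafka ingestion and database commit steps are omitted.

```python
# Sketch: per-vertex speeds from consecutive fixes, then max/avg
# per (60-second window, vehicle) group.

import math
from collections import defaultdict

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    r = 6371000.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def window_stats(fixes, window_seconds=60):
    """fixes: list of (vehicle_id, timestamp_s, lat, lon), time-ordered per vehicle."""
    speeds = defaultdict(list)   # (window_id, vehicle_id) -> [speed m/s, ...]
    last = {}                    # vehicle_id -> previous fix
    for vid, t, lat, lon in fixes:
        if vid in last:
            t0, lat0, lon0 = last[vid]
            dt = t - t0
            if dt > 0:
                v = haversine_m(lat0, lon0, lat, lon) / dt
                speeds[(int(t // window_seconds), vid)].append(v)
        last[vid] = (t, lat, lon)
    return {
        key: {"max_mps": max(vs), "avg_mps": sum(vs) / len(vs)}
        for key, vs in speeds.items()
    }

# One vehicle reporting every 5 seconds, moving steadily north
fixes = [("v1", 0, 49.0, -123.0), ("v1", 5, 49.001, -123.0), ("v1", 10, 49.002, -123.0)]
stats = window_stats(fixes)
```

Only the two summary values per vehicle per window survive, which is the source of the data-volume reduction described above.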
