Summarize Unbounded Data Streams

Liz Sanderson

FME Version

  • FME 2021.0

Introduction

Unbounded streams can contain vast amounts of data, and if your end goal is only to summarize that data, storing it all before processing is costly. With FME streams, blocks of data defined by time windows and/or geographic boundaries can be aggregated and summarized directly from the stream before anything is committed to disk.

Streams Diagrams - Summarize Data - Aggregate Points

Examples

Aggregate Points

Scenario

The City of Vancouver has acquired GPS data for the movement of 50,000 vehicles during rush hour. They wish to determine the average speed of vehicles crossing a key bridge in each direction and to count how many vehicles cross the bridge.
 

Workflow

The workspace starts by reading a stream of data for GPS-tracked vehicles in the Vancouver Lower Mainland. The features are converted into points, and they are discarded if they don’t fall within the geofence surrounding the 2nd Narrows bridge. 
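Outside of FME, the geofence step can be sketched as a point-in-polygon test that discards any GPS fix falling outside the boundary. This is a minimal illustration only; the polygon coordinates below are hypothetical, and in the workspace this filtering is done by FME transformers rather than hand-written code.

```python
# Sketch of the geofence filter: keep only points inside a polygon,
# using a standard ray-casting point-in-polygon test.

def point_in_polygon(x, y, polygon):
    """Return True if (x, y) lies inside `polygon`, a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does the horizontal ray from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical geofence around a bridge (lon, lat pairs)
geofence = [(-123.03, 49.29), (-123.02, 49.29), (-123.02, 49.31), (-123.03, 49.31)]

points = [(-123.025, 49.30), (-123.10, 49.25)]  # one inside, one outside
kept = [p for p in points if point_in_polygon(p[0], p[1], geofence)]
```

Features that fail the test are simply dropped from the stream, so only traffic on the bridge reaches the downstream aggregation.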

Once filtered, the features are grouped by time using the TimeWindower, and the speed of each vehicle within the window is calculated. The data is then aggregated to produce the count of vehicles that crossed the bridge and their average speed in each 90-second window. The direction each vehicle is traveling (northbound or southbound) is also derived, so the average speed and count can be split by direction.

The stream data is aggregated in the StatisticsCalculator, which releases a statistical summary of the features for each time window; these summaries can then be passed to external applications for decision-makers.
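The windowed aggregation performed by the TimeWindower and StatisticsCalculator can be approximated in a few lines: bucket records into 90-second windows, split by travel direction, and emit a count and mean speed per group. The field names (`timestamp`, `direction`, `speed_kmh`) are illustrative, not FME attribute names.

```python
# Sketch of windowed aggregation: group records into 90-second
# windows keyed by direction, then summarize each group.

from collections import defaultdict

WINDOW_SECONDS = 90

def summarize(records):
    groups = defaultdict(list)
    for rec in records:
        window_id = int(rec["timestamp"] // WINDOW_SECONDS)
        groups[(window_id, rec["direction"])].append(rec["speed_kmh"])
    return {
        key: {"count": len(speeds), "avg_speed_kmh": sum(speeds) / len(speeds)}
        for key, speeds in groups.items()
    }

records = [
    {"timestamp": 10, "direction": "NB", "speed_kmh": 60.0},
    {"timestamp": 40, "direction": "NB", "speed_kmh": 50.0},
    {"timestamp": 70, "direction": "SB", "speed_kmh": 55.0},
    {"timestamp": 100, "direction": "NB", "speed_kmh": 65.0},  # falls in the next window
]
summary = summarize(records)
```

Because each group collapses to two numbers, the summary can be emitted as soon as its window closes, without retaining the raw points.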

 

Calculate Motion Statistics

Often, the speed of a vehicle is a key piece of data that people are interested in analyzing. Rather than storing all of the data and then calculating the speed, you can calculate the average speed of a vehicle over a period of time all in memory.
 

Scenario

An insurance company is trialing a new program where the driver pays for insurance based upon the distance they travel. They have 50,000 people in the trial and each participant has a sensor in their vehicle which tracks their location. The sensors report their location every five seconds.

The insurance company wants to capture only the average and maximum speed each vehicle traveled within a 1-minute window, rather than every data point. Since the sensors report twelve times per minute and each window is reduced to two values, this cuts their data volume by 6x.
 

Workflow

The workspace connects to a message broker; in this case, Kafka is used. The JSON-formatted payload, which represents the vehicle’s location, is then flattened into individual attributes.

A TimeWindower transformer then groups the data into 60-second windows. For each vehicle in a window, the distance between consecutive reported points is calculated; combined with the timestamps, this gives the speed the vehicle was traveling at each vertex. The per-vertex speeds are fed into a StatisticsCalculator which, grouping by window ID and vehicle ID, outputs the maximum and average speed each vehicle was traveling in that 60-second window. This result can then be committed to a database.
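A stand-alone sketch of this pipeline, under some assumptions: speed at each vertex is derived from consecutive GPS fixes (haversine distance divided by the time delta), then each (window, vehicle) group is reduced to its maximum and average speed. The record layout is hypothetical, and the Kafka ingestion and database commit steps are omitted.

```python
# Sketch: per-vertex speeds from consecutive fixes, then max/avg
# per (60-second window, vehicle) group.

import math
from collections import defaultdict

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    r = 6371000.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def window_stats(fixes, window_seconds=60):
    """fixes: list of (vehicle_id, timestamp_s, lat, lon), time-ordered per vehicle."""
    speeds = defaultdict(list)   # (window_id, vehicle_id) -> [speed m/s, ...]
    last = {}                    # vehicle_id -> previous fix
    for vid, t, lat, lon in fixes:
        if vid in last:
            t0, lat0, lon0 = last[vid]
            dt = t - t0
            if dt > 0:
                v = haversine_m(lat0, lon0, lat, lon) / dt
                speeds[(int(t // window_seconds), vid)].append(v)
        last[vid] = (t, lat, lon)
    return {
        key: {"max_mps": max(vs), "avg_mps": sum(vs) / len(vs)}
        for key, vs in speeds.items()
    }

# One vehicle reporting every 5 seconds, moving steadily north
fixes = [("v1", 0, 49.0, -123.0), ("v1", 5, 49.001, -123.0), ("v1", 10, 49.002, -123.0)]
stats = window_stats(fixes)
```

Only the two summary values per vehicle per window survive, which is the source of the data-volume reduction described above.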
