Spatial Analysis on Unbounded Data Streams

Liz Sanderson
Introduction

When working with location-enabled streams, understanding the relationship between points in the incoming stream and other features is a key workflow. Proximity analysis can help efficiently filter features, clean data, and detect events in real-time.

(Diagram: proximity analysis with buffers on an unbounded data stream)

Examples

Create Buffers

Buffers enable areas to be created around either the incoming stream data or the feature you are comparing against. Once in place, buffers make it easy to determine if a feature is within a certain proximity of another feature.
 

Scenario

A city is monitoring the location of all of its operational vehicles (e.g., SUVs, refuse vehicles, maintenance vans). Two of these vehicles carry hazardous material and cannot go near schools. The sensors on the vehicles report their location every five seconds. Decision-makers in the city wish to receive alerts when a hazardous vehicle comes within 100 m, 200 m, or 300 m of a school.
 

Workflow

The workspace connects to the message broker; in this case, we are using Kafka. The JSON-formatted payload, which represents the location of the vehicle, is then flattened into individual attributes, and the attribute containing the WKT geometry is set as the geometry of the feature. Since we are only interested in hazardous vehicles, a Tester checks the vehicle IDs and lets only those vehicles through.
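The flatten-and-filter steps can be sketched in plain Python. This is a minimal illustration, not the FME transformers themselves; the message shape, attribute names, and vehicle IDs below are hypothetical:

```python
import json
import re

# Hypothetical IDs of the two hazardous vehicles; in the workspace,
# a Tester transformer performs this check.
HAZARDOUS_IDS = {"TRUCK-07", "TRUCK-12"}

def flatten(payload, prefix=""):
    """Flatten nested JSON into dot-separated attribute names."""
    flat = {}
    for key, value in payload.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat

def parse_wkt_point(wkt):
    """Extract (x, y) from a WKT 'POINT (x y)' string."""
    match = re.match(r"POINT\s*\(\s*([-\d.]+)\s+([-\d.]+)\s*\)", wkt)
    if not match:
        raise ValueError(f"Not a WKT point: {wkt}")
    return float(match.group(1)), float(match.group(2))

# One incoming Kafka message (hypothetical schema)
message = '{"vehicle": {"id": "TRUCK-07"}, "geometry": "POINT (491222.5 5459334.1)"}'
attrs = flatten(json.loads(message))
if attrs["vehicle.id"] in HAZARDOUS_IDS:
    attrs["_geometry"] = parse_wkt_point(attrs["geometry"])
```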

The schools for the entire city are also read, and buffers of 100 m, 200 m, and 300 m are created around them. The hazardous vehicles' locations are then compared against the buffers using the SpatialFilter. If a point falls within a buffer, the data is written to a database, where it can trigger downstream business processes.
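For circular buffers around point features, the SpatialFilter test reduces to a distance comparison. A minimal sketch, assuming projected coordinates in metres (the school name and coordinates are made up):

```python
import math

# Hypothetical school locations in a projected CRS (metres)
SCHOOLS = {"Maple Elementary": (0.0, 0.0)}
RINGS = (100.0, 200.0, 300.0)  # buffer distances from the scenario

def ring_hit(vehicle_xy, schools=SCHOOLS, rings=RINGS):
    """Return (school, ring) for the tightest buffer the vehicle falls
    inside, or None if it is clear of every buffer."""
    hits = []
    for name, (sx, sy) in schools.items():
        d = math.hypot(vehicle_xy[0] - sx, vehicle_xy[1] - sy)
        for r in rings:
            if d <= r:
                hits.append((name, r))
                break  # the smallest containing ring is enough
    return min(hits, key=lambda h: h[1]) if hits else None
```

A hit at the 100 m ring would warrant the most urgent alert; a None result means nothing is written to the database.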

 

Dynamic Geofences

The functionality of buffers can be extended by using dynamic buffers, or geofences, whose locations change over time. Proximity comparisons with incoming features are calculated based on the geofence's location at that moment.
 

Scenario

A ride-hailing company wishes to optimize its business by lowering wait times for potential riders who request a driver. The app frequently tracks the locations of both drivers and users. If a driver is not within 1 km of a user, drivers will be redistributed to be closer to app users.
 

Workflow

The workspace connects to the driver locations stream; in this case, our message broker is Kafka. The JSON-formatted payload, which represents the location of every vehicle, is then flattened into individual attributes, and the attribute containing the WKT geometry is set as the geometry of the feature.

The TimeWindower groups the drivers into 5-minute windows. Using the Sorter and Sampler, grouped by the window_id attribute, the most recent location of each driver in each window is kept. For each window, the Google BigQuery table that holds the current locations of all app users is queried and buffered. The SpatialFilter then checks whether any drivers fall outside the user buffers, and their vehicle IDs are passed into a separate driver redistribution workflow.
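The windowing and "latest fix per driver" logic can be sketched as follows. This is an illustrative stand-in for the TimeWindower, Sorter, and Sampler steps, assuming Unix timestamps and projected coordinates in metres; the driver IDs and locations are hypothetical:

```python
import math

WINDOW = 300  # 5-minute windows, in seconds

def latest_per_driver(events):
    """events: iterable of (timestamp, driver_id, x, y).
    Keep only the most recent fix per driver per time window."""
    latest = {}
    for ts, driver, x, y in events:
        key = (ts // WINDOW, driver)  # window_id + driver
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, x, y)
    return latest

def drivers_outside(latest, users, radius=1000.0):
    """IDs of drivers with no user within `radius` metres --
    the candidates for redistribution."""
    out = set()
    for (window_id, driver), (ts, x, y) in latest.items():
        if all(math.hypot(x - ux, y - uy) > radius for ux, uy in users):
            out.add(driver)
    return out

events = [(10, "d1", 0.0, 0.0), (20, "d1", 5000.0, 0.0), (15, "d2", 100.0, 0.0)]
users = [(0.0, 0.0)]
to_redistribute = drivers_outside(latest_per_driver(events), users)
```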

For more information, and to see this workflow in action, see our video on Stream Processing: Geofencing.

Snap Data to a Network

At best, GPS data has an accuracy of a couple of meters. To perform network analysis, points need to be snapped to the network. Doing this on an unbounded stream is beneficial because the data can be filtered and cleaned before it is committed to the data store.
 

Scenario

An insurance company is trialing a new program in which drivers pay for insurance based on the distance they travel. There are 50,000 participants in the trial, and each has a sensor in their vehicle that tracks its location. The sensors report their location every five seconds.

The insurance company only wants to analyze vehicle movement on the primary routes (highway and main arterial roads). Vehicles on minor routes need to be filtered out, and then vehicles on the major routes need to be snapped to the road network so network analysis can be carried out downstream.
 

Workflow

The workspace connects to the message broker; in this case, we are using Kafka. The JSON-formatted payload, which represents the location of the vehicle, is then flattened into individual attributes, and the attribute containing the WKT geometry is set as the geometry of the feature.

A TimeWindower transformer is then used to sample the data so the vehicle's location is reported at a 30-second interval instead of every 5 seconds. The road network is also read, and using the NeighborFinder, vehicles within 5 m of the main routes are snapped to the network. Attributes from the road (speed limit, road name) are transferred to the vehicle point.
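The snap itself is a point-to-segment projection with a distance tolerance. A minimal sketch of that geometry, assuming the road network is a list of straight segments in a projected CRS (metres); the tolerance matches the 5 m used above:

```python
import math

def snap_to_segment(p, a, b):
    """Project point p onto segment a-b; return (snapped_point, distance)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    # Clamp the projection parameter to [0, 1] so we stay on the segment
    t = 0.0 if seg_len2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    sx, sy = ax + t * dx, ay + t * dy
    return (sx, sy), math.hypot(px - sx, py - sy)

def snap_to_network(p, segments, tolerance=5.0):
    """NeighborFinder-style snap: move p onto the closest road segment
    within `tolerance` metres, or return None (vehicle filtered out)."""
    best = min((snap_to_segment(p, a, b) for a, b in segments), key=lambda s: s[1])
    return best[0] if best[1] <= tolerance else None

road = [((0.0, 0.0), (100.0, 0.0))]  # one hypothetical road segment
```

Points returned as None are vehicles on minor routes (or bad GPS fixes) and drop out of the stream before loading.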

For more information, and to see this workflow in action, see our video on Stream Processing: Snap Data to a Network.

Calculate Distance

Being able to calculate the distance between a target feature and another feature is an important workflow when working with unbounded streams. Rather than storing all of the data from a stream, you can simply assess whether the distance between two features meets a threshold and, if it does, either store the data or trigger an event.
 

Scenario

A utility company wants to build a workflow that enables decision-makers to see which vehicles are closest to a new power outage so they can dispatch a crew to fix it. When a power outage occurs, the closest two vehicles need to be identified and an email sent to a decision-maker with a list of these vehicles.
 

Workflow

The workspace connects to the message broker; in this case, we are using Kafka. The JSON-formatted payload, which represents the power outage event, is then flattened into individual attributes.

The current locations of all vehicles are retrieved from Google BigQuery, and the NeighborFinder is then used to identify the two vehicles closest to the power outage event. These vehicles are then passed downstream via the FMEFlowJobSubmitter (formerly FMEServerJobSubmitter) transformer to another FME Flow process, which sends the email to decision-makers. An FME Flow (formerly FME Server) Automations writer could also be used.
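Finding the two closest vehicles is a k-nearest-neighbors query. A minimal sketch, assuming Euclidean distance in a projected CRS; the vehicle IDs and coordinates are hypothetical:

```python
import math

def closest_vehicles(outage_xy, vehicles, n=2):
    """Return the IDs of the n vehicles nearest the outage,
    closest first -- the NeighborFinder step with two neighbors."""
    ranked = sorted(
        vehicles.items(),
        key=lambda item: math.hypot(item[1][0] - outage_xy[0], item[1][1] - outage_xy[1]),
    )
    return [vehicle_id for vehicle_id, _ in ranked[:n]]

# Hypothetical fleet snapshot (as would be retrieved from BigQuery)
fleet = {"van-1": (10.0, 0.0), "van-2": (1.0, 0.0), "van-3": (5.0, 0.0)}
dispatch_list = closest_vehicles((0.0, 0.0), fleet)
```

The resulting list is what would be handed to the downstream FME Flow process for the notification email.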
