Data QA: Identifying Spikes and Outliers with FME

Files

bathymetry.dgn (70 KB)
qaspikeremoval.fmwt (70 KB)
qaspikeremoval_2021.1.fmwt (60 KB)

Introduction

Spikes (or outliers) in spatial data occur when a vertex in a feature has an x, y, or z value so far from its correct position that the geometry takes on a spike-like appearance.

Spikes are a fairly specific type of defect, and there is a specific FME transformer designed to handle them: the SpikeRemover.

In this example, we will look at identifying and fixing spikes in a dataset containing contour lines using the SpikeRemover.

Source Data

The source data is a MicroStation DGN dataset containing bathymetric data for English Bay in the City of Vancouver:

Map tiles by Stamen Design, under CC-BY-3.0. Data by OpenStreetMap, under CC-BY-SA.

There are points that denote depth (which we can ignore) and contour lines. Both are 2.5D (i.e. each vertex has a Z value), with depths in fathoms (one fathom equals 6 feet). We'll check the contours for spikes, particularly in the Z values, which are harder to inspect visually.

Step-by-Step Instructions

Follow these steps to learn how to locate and fix spike features with a SpikeRemover transformer. Note that the SpikeRemover transformer gives no way to locate a spike without also fixing it, apart from a point feature that indicates the removed vertex.

1. Start FME Workbench
Start FME Workbench and create a new canvas. Add a new Reader to the canvas by selecting Readers > Add Reader from the menu bar. Set the data format to Bentley MicroStation Design (V8). Select the attached MicroStation dataset as the source, set the Coordinate System to UTM83-10, and click OK to add the reader to the workspace.

As of FME 2025.2, the Coordinate System parameter is now configured within the Parameters dialog of each reader/writer format. For more information, including details about the change and affected transformers, please see Coordinate System Parameter Location Change.

When prompted, select only the Contours feature type (level) to add to the workspace.

2. Add a SpikeRemover

Add a SpikeRemover transformer to the canvas and connect the DGN reader to it. Open the transformer parameters. There are parameters that let us control the maximum angle and the maximum length of a spike. A spike will be removed when the angle it creates is less than or equal to the maximum angle, and when the line segment is no longer than the maximum line length.

Set the angle parameter to 10 and the length parameter to 250; i.e. the maximum angle is 10 degrees and the maximum length is 250 meters (the coordinate system is UTM, so ground units are meters; angles are in degrees). Set Remove Spikes Iteratively to No.

3. Run the Workspace and Inspect the Output

Before running, make sure feature caching is enabled: click the drop-down arrow beside the green Run button on the menu bar and confirm that Enable Feature Caching is checked. Then run the workspace and inspect the output of the SpikeRemover transformer by clicking the green magnifying glass.

You can see that a single spike was detected and removed.

Notice that a point feature emerges from the Removed output port to denote where the spike vertex was removed. If the QA process is intended to identify spikes for fixing in a different application, then the Removed output port can be saved to act as a flag for where edits should take place.

If the requirement is for FME to fix the problems, then the Changed port outputs the line feature with the spike removed.
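
Steps 2 and 3 can be sketched outside FME in a few lines of Python. This is a minimal illustration of the angle-and-length test, not the SpikeRemover's actual implementation: the function names, the dictionary keys, the sample coordinates, and the choice to require both adjacent segments to fall within the maximum length are all assumptions made for the example. It makes a single pass (matching Remove Spikes Iteratively set to No) and returns both the cleaned line and the removed vertices, mirroring the roles of the Changed and Removed ports.

```python
import math


def vertex_angle_deg(prev_pt, vertex, next_pt):
    """Return the 2D angle at `vertex`, in degrees, between its two segments."""
    ax, ay = prev_pt[0] - vertex[0], prev_pt[1] - vertex[1]
    bx, by = next_pt[0] - vertex[0], next_pt[1] - vertex[1]
    len_a, len_b = math.hypot(ax, ay), math.hypot(bx, by)
    if len_a == 0 or len_b == 0:
        return 180.0  # duplicate vertex: treat as "not a spike" in this sketch
    cos_angle = (ax * bx + ay * by) / (len_a * len_b)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def split_spikes(line, max_angle_deg=10.0, max_length=250.0):
    """Single pass over `line`; returns (cleaned_line, removed_points)."""
    if len(line) < 3:
        return list(line), []
    cleaned, removed = [line[0]], []
    # Each vertex is judged against its original neighbors (non-iterative).
    for prev_pt, vertex, next_pt in zip(line, line[1:], line[2:]):
        angle = vertex_angle_deg(prev_pt, vertex, next_pt)
        # Assumption: both adjacent segments must be within the length limit.
        longest = max(
            math.hypot(vertex[0] - prev_pt[0], vertex[1] - prev_pt[1]),
            math.hypot(next_pt[0] - vertex[0], next_pt[1] - vertex[1]),
        )
        if angle <= max_angle_deg and longest <= max_length:
            removed.append({"point": vertex, "spike_angle": angle})
        else:
            cleaned.append(vertex)
    cleaned.append(line[-1])
    return cleaned, removed


# Made-up contour fragment (meters) with one misplaced vertex at (105, 200).
contour = [(0, 0), (100, 0), (105, 200), (110, 0), (200, 0)]
cleaned, removed = split_spikes(contour)
print(cleaned)
print([r["point"] for r in removed])
```

With the step 2 settings (10 degrees, 250 meters), only the vertex at (105, 200) is flagged; the two neighboring vertices form angles of roughly 90 degrees and are kept.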

4. Removing Less Extreme Spikes
Although the first set of parameters was successful in removing a spike, there may be additional, less extreme spikes that need to be removed. Let's experiment with the SpikeRemover parameters. Open the SpikeRemover, set the angle parameter to 45, and run the translation again. This time 7 line features have had 9 spikes removed.

Unfortunately, this has also removed some vertices that were not really spikes.

Therefore it's important to be able to experiment with the parameters to get the maximum correction of spikes, without removing valid vertices.
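
One way to approach this experimentation is to tabulate how many vertices each candidate setting would flag before committing to it. The sketch below assumes the angle and adjacent-segment length at each vertex have already been measured; the sample values are invented for illustration and are not taken from the bathymetry dataset, and `flagged_counts` is a hypothetical helper name.

```python
def flagged_counts(vertex_measures, angle_thresholds, max_length):
    """Map each candidate maximum angle to the number of vertices it would flag.

    vertex_measures: iterable of (angle_in_degrees, segment_length) pairs.
    """
    measures = list(vertex_measures)
    return {
        threshold: sum(
            1
            for angle, length in measures
            if angle <= threshold and length <= max_length
        )
        for threshold in angle_thresholds
    }


# Invented (angle, length) measurements for five vertices.
sample = [(3.0, 200.0), (30.0, 120.0), (42.0, 80.0), (44.0, 900.0), (120.0, 50.0)]
counts = flagged_counts(sample, angle_thresholds=[10, 45], max_length=250)
print(counts)
```

A jump in the count between two thresholds points to the vertices worth inspecting by eye before accepting the looser setting.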

5. Counting the Number of Vertices Removed
To count the number of vertices removed, add a StatisticsCalculator transformer to the workspace and connect the SpikeRemover:Removed port to it. In the StatisticsCalculator parameters, pick any attribute to analyze (_spike_angle is convenient) and enable the Total Count parameter.

The output of this transformer now contains an attribute holding the number of spikes fixed.
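
The same tally is easy to reproduce on exported Removed points. The sketch below assumes each point carries an identifier for the line it came from; the `line_id` key and the ID values are invented, chosen only so the totals line up with the 9 spikes across 7 lines reported in step 4.

```python
from collections import Counter

# Invented records standing in for features from the Removed port.
removed_points = [{"line_id": i} for i in (1, 1, 2, 3, 3, 4, 5, 6, 7)]

per_line = Counter(point["line_id"] for point in removed_points)
total_removed = sum(per_line.values())  # what a total count reports
lines_affected = len(per_line)          # distinct lines that lost a vertex

print(total_removed, lines_affected)
```

Keeping the per-line breakdown alongside the total makes it easier to spot a single line that lost several vertices, which may deserve a closer look.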

6. Inspecting for 3D Spikes and Outliers
Inspect the original data by clicking the green magnifying glass on the Contours feature type. In the Visual Preview window that opens, use the toolbar button to switch the view to 3D mode.

Pivot the display (use the "Orbit" tool on the toolbar) and you will see a spike in one Z value that was not apparent in the 2D view.

7. Removing Spikes and Outliers in the Z-Axis

Fortunately, the SpikeRemover also operates in three dimensions, allowing us to remove the spike in the Z-axis. Open the SpikeRemover parameters and change Dimension to 3D.

Run the workspace again and inspect the output in 3D. You will see the transformer has removed the spike in the Z-axis.
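
To see why a 3D test catches what a 2D test misses, the sketch below computes the angle at a vertex using either two or three coordinates. It is an illustration only, with a hypothetical helper `vertex_angle_deg` and made-up coordinates. It works on raw coordinate values, so X and Y in meters and Z in fathoms are mixed without conversion; how the SpikeRemover itself treats units is not covered here.

```python
import math


def vertex_angle_deg(prev_pt, vertex, next_pt, dims):
    """Angle at `vertex` in degrees, using the first `dims` coordinates."""
    a = [prev_pt[i] - vertex[i] for i in range(dims)]
    b = [next_pt[i] - vertex[i] for i in range(dims)]
    len_a = math.sqrt(sum(c * c for c in a))
    len_b = math.sqrt(sum(c * c for c in b))
    if len_a == 0 or len_b == 0:
        return 180.0  # degenerate case: treat as a straight line
    cos_angle = sum(x * y for x, y in zip(a, b)) / (len_a * len_b)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


# Made-up vertices: straight in plan view, but the middle Z is far off.
p, v, n = (0, 0, -10), (20, 0, -500), (40, 0, -10)
angle_2d = vertex_angle_deg(p, v, n, dims=2)
angle_3d = vertex_angle_deg(p, v, n, dims=3)
print(round(angle_2d, 1), round(angle_3d, 1))
```

In plan view the three vertices are collinear (180 degrees), so a 2D test with a 10 degree limit ignores them. Including Z, the angle drops to roughly 4.7 degrees, which the same limit would flag.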

We now have a dataset that has been cleaned of spikes. If automatic cleaning is not desirable then the spike removal points can be used to identify places to check where spikes might be manually resolved.

Data Attribution

The data used here originates from open data made available by the City of Vancouver, British Columbia. It contains information licensed under the Open Government License - Vancouver.
