Data QA: Identifying Consecutive Duplicate Vertices with FME

Liz Sanderson


Introduction

A duplicate vertex (duplicate point) occurs when a geometry has one or more vertices that occur multiple times within the feature. Duplicate vertices are those with identical X, Y, and Z coordinate values, to as many decimal places as exist in the data.
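As a minimal sketch of that definition (plain Python, not FME's own implementation), two vertices can be treated as duplicates only when every stored coordinate value matches exactly. The function name and the coordinate values are made up for illustration.

```python
def is_duplicate(a, b):
    """Exact comparison of two coordinate tuples (x, y, z).

    No tolerance is applied: the values must match to every
    decimal place stored in the data.
    """
    return tuple(a) == tuple(b)


# Hypothetical coordinates, for illustration only
p = (491234.125, 5458721.5, 12.0)
q = (491234.125, 5458721.5, 12.0)
r = (491234.125, 5458721.5, 12.000001)

print(is_duplicate(p, q))  # True: all three values are identical
print(is_duplicate(p, r))  # False: Z differs in the sixth decimal place
```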

Duplicate vertices are not only a sign of lower-quality data; they can also be a data format problem. Some formats permit duplicate vertices (for example, MicroStation DGN allows zero-length lines), while other formats prohibit them (for example, Oracle Spatial).

The duplicate vertex might occur sequentially in the geometry (for example, A,B,C,C,D,E) or it might occur out of sequence (A,B,C,D,C,E). It might just be duplicated once (A,B,C,C,D), or it might be duplicated multiple times (A,B,C,C,C,D,C,E,C).
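The distinction between sequential and out-of-sequence duplicates can be sketched in plain Python. This is an illustration of the idea only, not how any FME transformer works internally; the function name is hypothetical, and single letters stand in for coordinate tuples.

```python
def classify_duplicates(vertices):
    """Split repeated vertices into two lists of indices.

    consecutive:     vertex i equals the vertex immediately before it
    out_of_sequence: vertex i was seen earlier, but not at position i - 1
    """
    consecutive = []
    out_of_sequence = []
    seen = set()
    for i, v in enumerate(vertices):
        if i > 0 and v == vertices[i - 1]:
            consecutive.append(i)
        elif v in seen:
            out_of_sequence.append(i)
        seen.add(v)
    return consecutive, out_of_sequence


print(classify_duplicates(list("ABCCDE")))     # ([3], [])
print(classify_duplicates(list("ABCDCE")))     # ([], [4])
print(classify_duplicates(list("ABCCCDCEC")))  # ([3, 4], [6, 8])
```

With real data, each vertex would be an (x, y, z) tuple rather than a letter; tuples are hashable, so the same function applies unchanged.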

Of course, sometimes a duplicate vertex is valid. For example, a polygon's start and end points should be identical if it is to close properly (A,B,C,D,E,A), and sometimes a linear feature should loop around and rejoin itself partway along (A,B,C,D,E,C). It is therefore not always easy to identify invalid features on this basis alone.
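One way to account for the valid closing point is to set it aside before looking for repeats. The sketch below is plain Python with a hypothetical function name, and letters again stand in for coordinates. Anything it reports is only a candidate for review, because a line that rejoins itself may be perfectly valid.

```python
def suspicious_duplicates(vertices):
    """Return (closed, repeats).

    closed:  True when the first and last vertices match (a closed loop)
    repeats: indices of repeated vertices, ignoring the closing vertex
    """
    closed = len(vertices) > 2 and vertices[0] == vertices[-1]
    body = vertices[:-1] if closed else vertices
    seen = set()
    repeats = []
    for i, v in enumerate(body):
        if v in seen:
            repeats.append(i)
        seen.add(v)
    return closed, repeats


print(suspicious_duplicates(list("ABCDEA")))  # (True, []): closing point only
print(suspicious_duplicates(list("ABCDEC")))  # (False, [5]): needs review
print(suspicious_duplicates(list("ABCCDA")))  # (True, [3]): closed, one repeat
```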

There are various FME transformers that can be used to identify duplicate vertices, but some transformers - or combinations of transformers - will be much more efficient than others.

  • GeometryValidator: This transformer identifies and fixes duplicate vertices that occur consecutively within a single geometry.
  • ClosedCurveFilter: This transformer identifies features that form a closed loop, and can, therefore, be used to detect (or eliminate from suspicion) features with duplicate end points.
  • CoordinateExtractor: This transformer extracts a list of coordinates from a feature, which can then be analyzed to look for duplicates.

In general, the GeometryValidator is used more often because consecutive duplicate vertices are a more obvious issue.

However, the CoordinateExtractor is better for detecting duplicate vertices that occur out of sequence, so that further investigation can take place.

This example uses the GeometryValidator transformer to identify sequential duplicate points. A second example uses a combination of ClosedCurveFilter and CoordinateExtractor to identify duplicate points that are unsequenced.

 

Source Data

The source data is a MicroStation Design file containing line features that represent building outlines:

duplicateverts1.png

The scenario is that we wish to validate and clean the data before it is put into production use.

 

Step-by-Step Instructions

Part 1: Locating Consecutive Duplicate Vertices

Follow these steps to learn how to locate consecutive duplicate vertices with a GeometryValidator transformer.
 

1. Add Source Data
Start FME Workbench and begin with an empty canvas. Select Reader > Add Reader from the menu bar. Set the data format to Bentley MicroStation Design (V8). Select the attached MicroStation dataset as the source. If you click the Parameters button, you'll find there is an advanced parameter to remove duplicate points:

duplicateverts2.png

Ensure this parameter is turned off, because we want to identify where the duplicate vertices are and how many there are. Click OK to add the reader. If/when prompted, select the BuildingFootprints level as the data to be read.

 

2. Inspect the Data
Click the reader feature type on the canvas. On the menu that pops up, select the View Source Data option to view the data in the Visual Preview window. Examine the data. The data looks correct at a glance, and it is difficult to identify where there might be duplicate vertices.

 

3. Add a GeometryValidator Transformer
Add a GeometryValidator to the canvas and connect the MicroStation V8 reader feature type to it. Open the transformer parameters. Click the empty box under Issues to Detect to bring up the options, and select Duplicate Consecutive Points from the drop-down list. Leave the other parameters as they are, and change Attempt Repair to No.

2021-12-01_11-33-43.png

Although duplicate points are fixable, for now, we'll just check where they are.

 

4. Inspecting the GeometryValidator Output
Click on the GeometryValidator to bring up the pop-up menu, and click Run To This. Once the workspace has finished running, click on the magnifying glass next to each output port to inspect the output. Notice that two features exit through the InvalidParts port. These are features that have been flagged as invalid. Upon inspecting the output, we can see the duplicate points are highlighted through the IssueLocations port.
 

duplicateverts5.png

 

 

Part 2: Counting Consecutive Duplicate Vertices

Counting the number of invalid features and bad points is quite easy because we have already filtered them out. For example, even the Workbench feature counts show us the numbers involved:

duplicateverts6.png

Storing the count in an attribute is simple with the StatisticsCalculator transformer.

Follow these steps to learn how to count duplicate vertex features.

 

5. Add a StatisticsCalculator
Add a StatisticsCalculator to the workspace and connect it to the GeometryValidator Failed port. Add a second StatisticsCalculator and connect it to the GeometryValidator IssueLocations port. In the parameters of both StatisticsCalculators, select any attribute as the Attribute. Check the box for Total Count as the statistic to calculate and press OK to accept these new parameters.
2021-12-01_12-00-36.png

Re-run the workspace. This time the output should include an attribute that denotes how many bad features there are of each type. If you want to change the name of the output attribute from the StatisticsCalculator, you can use the AttributeRenamer transformer.
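For comparison, the same two totals can be sketched outside FME in plain Python. This is an assumption-laden illustration, not the StatisticsCalculator's logic: the function name is hypothetical, each feature is represented as a simple list of vertices, and every repeated vertex is counted as one bad point, which may not match how FME counts issue locations.

```python
def count_issues(features):
    """Return (bad_features, bad_points) for a list of vertex lists.

    A bad point is a vertex equal to the one immediately before it.
    A bad feature is any feature containing at least one bad point.
    """
    bad_features = 0
    bad_points = 0
    for vertices in features:
        # Compare each vertex with its predecessor
        n = sum(1 for prev, cur in zip(vertices, vertices[1:]) if prev == cur)
        if n:
            bad_features += 1
            bad_points += n
    return bad_features, bad_points


# Letters stand in for coordinate tuples
features = [list("ABCD"), list("ABBC"), list("ABCCCD")]
print(count_issues(features))  # (2, 3)
```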

 

Part 3: Fixing Consecutive Duplicate Vertices

Fixing invalid duplicate vertices is easily achieved by deleting one of the duplicates. The GeometryValidator has an option to do just that.
 

6. Repairing the Duplicate Vertices
Open the GeometryValidator parameters dialog and change the Attempt Repair parameter to Yes. Re-run the workspace. This time there will be no failed features, but the IssueLocations port will still output features showing where the duplicate points existed.
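The idea behind the repair, deleting all but one vertex from each run of consecutive duplicates, can be sketched in plain Python. This only illustrates the principle under simple assumptions and is not the GeometryValidator's actual repair logic; the function name is hypothetical.

```python
def remove_consecutive_duplicates(vertices):
    """Keep the first vertex of each run of identical consecutive vertices."""
    cleaned = []
    for v in vertices:
        if not cleaned or v != cleaned[-1]:
            cleaned.append(v)
    return cleaned


# Letters stand in for coordinate tuples
print("".join(remove_consecutive_duplicates(list("ABCCCDCEC"))))  # ABCDCEC
print("".join(remove_consecutive_duplicates(list("ABCDEA"))))     # ABCDEA
```

Out-of-sequence repeats and the closing point of a loop are left untouched, since only neighbouring vertices are compared. Note that a zero-length line (A,A) would collapse to a single vertex under this approach, so such features need separate handling.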
 

Data Attribution

The data used here originates from open data made available by the City of Vancouver, British Columbia. It contains information licensed under the Open Government License - Vancouver.
