Data QA: Identifying Bad Topology in Linear Networks

Files

- lineartopology1.fmwt (2 MB)
- lineartopology2.fmwt (2 MB)
- lineartopology3.fmwt (2 MB)
- 2dparcels.dgn (2 MB)
- badtopology_2021.2.fmw (90 KB)

Introduction

A linear network should consist of lines that meet at common points rather than crossing one another. In practice, this is not always the case, and various problems can occur.

A misaligned point occurs where two lines are meant to meet at a known point, but one line's end point is slightly offset from it, so the lines do not properly connect:

An overshoot occurs where two lines are meant to meet at a known point, but one line extends beyond that intersection:

An undershoot occurs where two lines are meant to meet at a known point, but one line fails to reach that intersection:

A missing node occurs where two lines are meant to meet at a known point, but only one of those lines contains the intersection node:

In this case, the red (horizontal) line has an end node, but the green (vertical) line has no node or vertex at the same location.
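The missing-node case is the least obvious of the four, so here is a minimal Python sketch (not how FME works internally) that flags it: an end point of one line that lies on the interior of another line's segment, where that other line has no vertex. Lines are assumed to be lists of (x, y) tuples, and the function names and default tolerance are hypothetical.

```python
import math

Point = tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.dist(p, a)  # degenerate segment
    # Project p onto the segment and clamp the projection to its ends.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def find_missing_nodes(lines: list[list[Point]], tol: float = 1e-6):
    """Return (line_index, other_index, end_point) for every end point that
    touches another line without a matching vertex on that line."""
    hits = []
    for i, line in enumerate(lines):
        for end in (line[0], line[-1]):
            for j, other in enumerate(lines):
                if i == j:
                    continue
                if any(math.dist(end, v) <= tol for v in other):
                    continue  # the other line has a vertex here: properly noded
                if any(point_segment_distance(end, a, b) <= tol
                       for a, b in zip(other, other[1:])):
                    hits.append((i, j, end))
    return hits
```

For a horizontal line ending at (5, 0) and a vertical line from (5, -5) to (5, 5), `find_missing_nodes([[(0, 0), (5, 0)], [(5, -5), (5, 5)]])` reports the horizontal line's end point; adding a (5, 0) vertex to the vertical line clears the report.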

In general, these issues are very small and usually invisible to the eye; otherwise they would be easy to detect without special data validation techniques. There is no specific transformer for these scenarios, but a combination of general geometry-handling transformers can do the job.

Although we use the term "network", these issues also apply to any linear features that are meant to form closed structures, for example, the parcel boundary lines used in the following example.

Source Data

The dataset for this example is a set of line features (in a MicroStation DGN dataset) representing property parcel boundaries in the city of Vancouver.

The dataset looks like this in the FME Data Inspector:

The scenario here is to clean up the line features, ensuring all property parcels close correctly. We'll prove this by turning them into polygon features with the AreaBuilder transformer.

Step-by-Step Instructions

Part 1: Locating Bad Linear Geometry

To assess the state of linear geometry, the simplest method is to use an AreaBuilder transformer. If all of the geometry can be turned into polygon features, then it connects correctly.

A more complicated method uses the TopologyBuilder transformer. This turns the network into a series of nodes and edges. If we can locate nodes that are only used by a single edge, then this indicates an unconnected line. If the line itself is very short (or the gap between a node and a neighboring line is short) then this indicates an overshoot, undershoot, or misaligned point.
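The node-and-edge idea can be sketched in a few lines of Python. This is a simplification, not the TopologyBuilder itself: only line end points become nodes, coordinates are rounded to a hypothetical `precision` so that coincident end points merge, and any node used by exactly one edge is reported as a dangle. Because interior crossings are not noded here, an end point resting on the middle of another line (a missing node) is reported as well.

```python
from collections import defaultdict


def node_key(pt, precision=6):
    """Round coordinates so end points that coincide map to the same node."""
    return (round(pt[0], precision), round(pt[1], precision))


def dangling_nodes(lines, precision=6):
    """Return the sorted list of nodes that are used by exactly one edge."""
    degree = defaultdict(int)
    for line in lines:
        degree[node_key(line[0], precision)] += 1
        degree[node_key(line[-1], precision)] += 1
    return sorted(node for node, count in degree.items() if count == 1)


# A square parcel whose last edge stops 0.05 units short of the first corner.
square_with_gap = [
    [(0, 0), (10, 0)],
    [(10, 0), (10, 10)],
    [(10, 10), (0, 10)],
    [(0, 10), (0, 0.05)],
]
print(dangling_nodes(square_with_gap))  # [(0, 0), (0, 0.05)]
```

A short gap between two dangles, as here, suggests an undershoot or misaligned point; a short edge that ends in a dangle suggests an overshoot.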

For this example we'll stick with the easier AreaBuilder method.

Follow these steps to learn how to identify overshoots, undershoots, misaligned points, and missing nodes.

1. Start FME Workbench and begin with an empty canvas
Select Readers > Add Reader from the menu bar.

Set the data format to Bentley MicroStation Design (V8). Select the attached 2dparcels.dgn file as the source dataset and click OK to add the reader.

2. Add an AreaBuilder transformer
Connect it to the parcel lines data.

3. Add a FeatureColorSetter transformer
For the sake of clarity, add a FeatureColorSetter transformer to the AreaBuilder:Area output port:

Open the parameters and set Color Scheme to Fixed. Next, set Fill Color to any color you like. This will set the area color of the polygon. Run the workspace and inspect the output features. Incomplete features (those that are invalid in some way) will appear in a separate layer and color:

Inspect the features by zooming in close to their end points. You will be able to see whether each one is an overshoot, undershoot, misaligned point, or missing node.

In a road network, some features would be highlighted without being incorrect, for example, a cul-de-sac or dead-end street. So either some manual assessment will be necessary, or another FME transformer could be used; for example, a LineOnAreaOverlayer or SpatialFilter will show whether the lines overlap a polygon. If they do, they are more likely to be incorrect.
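As a rough illustration of that overlap test, the sketch below (plain Python, not the LineOnAreaOverlayer or SpatialFilter) uses the standard ray-casting point-in-polygon test and treats a line as overlapping an area if any of its vertices lies inside the polygon ring. That vertex-only rule is a simplifying assumption: a line that crosses an area without having a vertex inside it would be missed. The function names are hypothetical.

```python
def point_in_polygon(pt, ring):
    """Ray casting: count how many ring edges a horizontal ray from pt crosses."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def line_overlaps_area(line, ring):
    """Simplified overlap check: true if any vertex of the line is inside the ring."""
    return any(point_in_polygon(p, ring) for p in line)


parcel = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(line_overlaps_area([(5, 5), (5, 12)], parcel))   # True
print(line_overlaps_area([(12, 0), (15, 0)], parcel))  # False
```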

Part 2: Counting Bad Linear Geometry

Assuming the features we've isolated are actually incorrect, counting them is easy with the StatisticsCalculator transformer.

Follow these steps to learn how to count bad linear geometry.

4. Add a StatisticsCalculator
Connect it to the AreaBuilder:Incomplete output port, then open the StatisticsCalculator transformer's parameters.

Select any attribute for the Attribute to Analyze. If there are no attributes available then expose one on the source feature type, or use an AttributeManager to create one. The values aren't important to us.

Click Total Count to select it as the statistic to calculate.

5. Re-run the workspace
The output will now have a count of incorrect features, although it will be a total count, not one per type of error.

Part 3: Fixing Bad Linear Geometry

Fixing bad linear geometry can be a matter of trial and error. It helps to have a specific tolerance value in mind, and to use the transformers in the order given. Used in a different order, they can produce quite different results, possibly introducing small pieces of unwanted linework.

So, follow these steps to learn how to fix bad linear geometry.

6. Add a Snapper transformer to the translation, between the source data and the AreaBuilder
Connect both Snapped and Untouched outputs from the Snapper to the AreaBuilder:

Examine the Snapper parameters. The default values will be fine for most of them, but you will have to set a Snapping Distance. In this case, set the value to 0.2.

Re-run the workspace and inspect the output of the Snapper's Snapped port by clicking the green magnifying glass. The result is that there are two fewer invalid features. Both of these were misaligned points that are now snapped into position.

There are a couple of interesting points. Firstly, undershoots and overshoots could also have been fixed by snapping, but there were evidently none within the specified tolerance. The Snapper can handle those scenarios, but its primary use is to handle misaligned points.

Secondly, in one case other features were snapped to the misaligned feature, rather than the misaligned feature being snapped to them. That's why three features were snapped when only two needed fixing. It's not ideal, but the tolerance (Snapping Distance) ensures such snapping won't be too extreme.
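To show what a snapping tolerance does, here is a greedy end-point snapping sketch in Python. It is a simplification under stated assumptions, not the Snapper's actual algorithm: each end point moves to the first previously seen end point within `distance`; otherwise it becomes a new anchor that later end points can snap to. The function name is hypothetical.

```python
import math


def snap_endpoints(lines, distance):
    """Greedy end-point snapping: each end point moves to the first earlier
    anchor within `distance`, otherwise it becomes a new anchor itself."""
    anchors = []

    def snap(pt):
        for anchor in anchors:
            if math.dist(pt, anchor) <= distance:
                return anchor
        anchors.append(pt)
        return pt

    snapped = []
    for line in lines:
        new_line = list(line)  # interior vertices are left untouched
        new_line[0] = snap(new_line[0])
        new_line[-1] = snap(new_line[-1])
        snapped.append(new_line)
    return snapped


# The second line starts about 0.11 units away from the first line's end.
lines = [[(0, 0), (10, 0)], [(10.1, 0.05), (10, 10)]]
print(snap_endpoints(lines, 0.2))  # [[(0, 0), (10, 0)], [(10, 0), (10, 10)]]
```

Because anchors are taken on a first-come basis, the result depends on feature order, which is one way a misaligned feature can end up with other features snapped to it rather than the reverse.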

7. Add a LineExtender transformer between the Snapper and AreaBuilder
Ensure the LineExtender:Stretched output port is the one connected:

Check the parameters and set the Extension Length to 0.5. This will cause all lines to be extended by 0.5m. It should deal with any undershoots. Of course, it will turn them (and all other features) into overshoots, but we can deal with that shortly.
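Geometrically, the extension pushes an end point outward along the direction of its end segment. The Python sketch below assumes both ends are extended by the same length; check the LineExtender's own parameters for how it treats each end. The function name is hypothetical, and lines are assumed to have at least two vertices.

```python
import math


def extend_line(line, length):
    """Return a copy of `line` with both ends pushed outward by `length`
    along the direction of the first and last segments."""

    def push(end, neighbour):
        # Unit direction from the neighbouring vertex towards the end point.
        dx, dy = end[0] - neighbour[0], end[1] - neighbour[1]
        d = math.hypot(dx, dy)
        if d == 0:
            return end  # zero-length end segment: no direction to extend in
        return (end[0] + dx / d * length, end[1] + dy / d * length)

    extended = list(line)
    extended[0] = push(line[0], line[1])
    extended[-1] = push(line[-1], line[-2])
    return extended


print(extend_line([(0, 0), (10, 0)], 0.5))  # [(-0.5, 0.0), (10.5, 0.0)]
```

Any undershoot shorter than the extension length now crosses the line it was meant to meet, which is why the next step has to trim the excess.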

8. Now add an Intersector transformer between the LineExtender and AreaBuilder
The Intersector:Intersected output port is the data we want to keep:

This transformer will cut off overshoots at their intersection points, and create missing nodes.
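The cutting step can be sketched for straight two-point segments: find each pair's crossing point with the usual parametric segment-intersection formula, then split both segments there, which also creates the shared node. This is a minimal Python illustration, not the Intersector's implementation; collinear overlaps are ignored, and the function names are hypothetical.

```python
def segment_intersection(a, b, c, d, eps=1e-12):
    """Intersection point of segments a-b and c-d, or None if they do not
    cross (parallel and collinear segments are treated as non-crossing)."""
    r = (b[0] - a[0], b[1] - a[1])
    s = (d[0] - c[0], d[1] - c[1])
    denom = r[0] * s[1] - r[1] * s[0]  # 2D cross product r x s
    if abs(denom) < eps:
        return None
    qp = (c[0] - a[0], c[1] - a[1])
    t = (qp[0] * s[1] - qp[1] * s[0]) / denom  # position along a-b
    u = (qp[0] * r[1] - qp[1] * r[0]) / denom  # position along c-d
    if -eps <= t <= 1 + eps and -eps <= u <= 1 + eps:
        return (a[0] + t * r[0], a[1] + t * r[1])
    return None


def split_at_intersections(segments):
    """Split every two-point segment wherever it crosses another segment."""
    cuts = [[] for _ in segments]
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            p = segment_intersection(*segments[i], *segments[j])
            if p is not None:
                cuts[i].append(p)
                cuts[j].append(p)
    pieces = []
    for (a, b), pts in zip(segments, cuts):
        # Order cut points by distance from the segment's start.
        ordered = sorted(set(pts), key=lambda p: (p[0] - a[0]) ** 2 + (p[1] - a[1]) ** 2)
        chain = [a] + [p for p in ordered if p != a and p != b] + [b]
        pieces.extend(zip(chain, chain[1:]))
    return pieces


pieces = split_at_intersections([((0, 0), (4, 0)), ((2, -1), (2, 3))])
print(len(pieces))  # 4
```

After splitting, the short piece beyond each crossing is the cut-off overshoot, which is what turns up as an "incomplete" feature in the AreaBuilder.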

Now re-run the workspace. There will be a large number of "incomplete" features, but these are merely the overshoots that have been cut off. If you query all other features you'll notice that the polygons are correct and that there are now no overshoots, undershoots, misaligned points, or missing nodes.

9. Recreate linear features with an Intersector
If the data still needs to be used as a network of lines, the linear features can be recreated by using an Intersector to convert the data back to properly noded lines.

Where some of the incomplete lines are still required (for example, cul-de-sacs), a LengthCalculator/Tester combination can be used to filter out lines that are shorter than normally expected, that is, pieces that are either an overshoot or the applied extension rather than a real feature.
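The LengthCalculator/Tester combination amounts to measuring each line and keeping only those at or above a threshold. In the Python sketch below, the function names are hypothetical and `min_length` is an assumed value: it should sit above the 0.5 extension length but below the shortest genuine feature in your data.

```python
import math


def line_length(line):
    """Sum of the segment lengths along a polyline."""
    return sum(math.dist(p, q) for p, q in zip(line, line[1:]))


def drop_short_pieces(lines, min_length):
    """Keep only lines at least `min_length` long (like a Tester on a length attribute)."""
    return [line for line in lines if line_length(line) >= min_length]


# A real 2-unit edge and a 0.5-unit leftover extension.
print(drop_short_pieces([[(0, 0), (2, 0)], [(2, 0), (2.5, 0)]], 1.0))  # [[(0, 0), (2, 0)]]
```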

Data Attribution

The data used here originates from open data made available by the City of Vancouver, British Columbia. It contains information licensed under the Open Government License - Vancouver.
