Data QA: Identifying Small Polygon Features

Files

dataqa-smallareas-fix.fmwt (3 MB)
qa-smallpolys-data.zip (3 MB)
dataqa-smallareas-locate.fmwt (3 MB)
dataqa-smallareas-count.fmwt (3 MB)
Small_Polygon_Features.fmw (100 KB)

Introduction

Small polygons are those whose area is less than a specified tolerance. They can be found by measuring the area of each polygon and then applying a test condition with a filter transformer.
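Outside of FME, the same measure-then-test idea can be sketched in a few lines of Python. This is only an illustration, not what the transformers do internally: it assumes each polygon is a simple ring of (x, y) vertices in a projected coordinate system, and the function names are made up for this example.

```python
def polygon_area(ring):
    """Unsigned area of a simple polygon given as (x, y) vertices (shoelace formula)."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def find_small_polygons(polygons, tolerance):
    """Return the polygons whose area is below the tolerance."""
    return [p for p in polygons if polygon_area(p) < tolerance]


# Hypothetical sample data: one ordinary parcel and one sliver.
parcel = [(0, 0), (20, 0), (20, 30), (0, 30)]    # 600 square units
sliver = [(0, 0), (10, 0), (10, 0.5), (0, 0.5)]  # 5 square units

small = find_small_polygons([parcel, sliver], tolerance=30.0)
print(len(small))
```

Only the sliver falls below the tolerance here; the tolerance itself is a judgment call that depends on the dataset.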

Testing for small polygons is a good QA test because polygons below a certain size are usually indicative of problems such as overlaps, slivers, and misaligned linework.

It's also very simple to count how many of these bad features exist. However, as discussed below, fixing polygons like this automatically is more difficult.

Source Data

The first source dataset for this example is a set of lines (in an AutoCAD DWG dataset) representing the outlines of property boundaries in the City of Vancouver.

The second dataset (in Esri Geodatabase format) is a set of point features that represent addresses.

The datasets look like this in the FME Data Inspector:

The scenario here is that we wish to transform these lines into true polygon features. That is very simple in FME, but we will add some QA checks to ensure that no undersized polygons are being created. In particular, an undersized polygon that contains an address point is very bad news.

Step-by-Step Instructions

Part 1: Locating Small Polygons

Follow these steps to learn how to identify small polygon features.

1. Start FME Workbench and add parcel data

Select Readers > Add Reader from the menubar.

Set the data format to Autodesk AutoCAD DWG/DXF. Select the attached dwg file as the source dataset. Click the Parameters button and set Group Entities By to "Attribute Schema".

Click OK and OK again to add the reader.

2. Create polygons from line features
The source dataset is made up of line features. Creating polygons from them requires a single FME transformer: the AreaBuilder.

So, add an AreaBuilder transformer. Connect it to the ParcelLines feature type.

3. Measure the area of the polygons
To test for small polygons we first need to measure the area of each of them. This is done with the AreaCalculator transformer.

So, add an AreaCalculator transformer and connect it to the AreaBuilder:Area output port.

4. Add address point data
In most cases we would now add a filter transformer to test the area. But in this example we're going to throw in an extra level of detail: address points.

Again, select Readers > Add Reader from the menubar.

This time set the data format to Esri Geodatabase (File Geodb Open API). Select the attached Geodatabase as the source. When prompted, select only the PostalAddress table to add to the workspace. The PostcodeBoundaries table is not required.

5. Overlay address point attributes onto polygons
To transfer address point attributes onto the polygons we'll use a PointOnAreaOverlayer transformer.

So place a PointOnAreaOverlayer transformer.

Connect the PostalAddress feature type to the PointOnAreaOverlayer:Point input port and the AreaCalculator:Output port to the PointOnAreaOverlayer:Area input port.

Test run the workspace and inspect the PointOnAreaOverlayer:Area output port by clicking the green magnifying glass. The results should be a set of polygon features with an attribute for the polygon area (_area) and - if it overlaps with an address point - attributes defining the address.
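To make the overlay result concrete, here is a rough Python sketch of what the output features carry. It assumes a simple ray-casting point-in-polygon test, which is a stand-in and not necessarily how the PointOnAreaOverlayer works. The "ring", "xy", and "address" keys are hypothetical names for this example, and points lying exactly on a boundary are not handled specially.

```python
def point_in_polygon(point, ring):
    """Ray-casting test: True if the point lies inside the ring of (x, y) vertices."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can cross the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def overlay_points(polygon, address_points):
    """Copy the polygon and add an overlap count plus the first matching address."""
    hits = [a for a in address_points if point_in_polygon(a["xy"], polygon["ring"])]
    result = dict(polygon)
    result["_overlaps"] = len(hits)
    if hits:
        result["address"] = hits[0]["address"]
    return result


# Hypothetical sample data.
square = {"ring": [(0, 0), (10, 0), (10, 10), (0, 10)], "_area": 100.0}
addresses = [
    {"xy": (5, 5), "address": "1 Example St"},
    {"xy": (15, 5), "address": "2 Example St"},
]
overlaid = overlay_points(square, addresses)
print(overlaid["_overlaps"], overlaid.get("address"))
```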

6. Filter out bad features
Now let's filter the bad features. The building regulations for Vancouver state:

The floor area of a micro dwelling must be at least 29.7 m²

We're looking at land parcel boundaries, not building footprints, but 30 m² still seems a good cutoff point below which a polygon is likely to be erroneous.

We want to test for three scenarios:

Polygons of the correct size
Polygons of an incorrect size
Polygons of an incorrect size containing an address point

The best way to test multiple conditions like this is with a TestFilter transformer.

So, add a TestFilter transformer connected to the PointOnAreaOverlayer:Area output port.

7. Set the TestFilter parameters
Open the TestFilter parameters dialog. Double-click in the first row under the Test Condition column. This will open a condition editor.

In this editor, set a test clause for _area is less than (or equal to) 30. Set up a second test clause for _overlaps is greater than (or equal to) 1:

If both clauses are true, the polygon is both too small and contains an address point. To require both clauses, not just one or the other, set the Logic to AND. Finally, at the bottom of the dialog, set the Output Port name to "Small with Address Point" and click OK to return to the previous dialog.

Now double-click on the test condition for the Else If row and set it to _area is less than or equal to 30.

Set the Output Port name to "Small without Address Point" and click OK. Set the Output Port for the bottom row to Correct Size. The definitions should now look like this:

If so, click OK to return to the main canvas.
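The three rows set up above behave like an if / else-if / else chain: a feature leaves through the first port whose condition it passes. A minimal Python sketch of the same logic follows, assuming features are dictionaries carrying the _area and _overlaps attributes; the function and constant names are illustrative only.

```python
AREA_LIMIT = 30.0  # cutoff in square metres, as chosen in step 6


def classify(feature):
    """Return the name of the output port a feature would be routed to."""
    area = feature["_area"]
    overlaps = feature.get("_overlaps", 0)  # treat a missing count as zero
    if area <= AREA_LIMIT and overlaps >= 1:
        return "Small with Address Point"
    elif area <= AREA_LIMIT:
        return "Small without Address Point"
    return "Correct Size"


# Hypothetical sample features.
print(classify({"_area": 12.0, "_overlaps": 1}))
print(classify({"_area": 12.0, "_overlaps": 0}))
print(classify({"_area": 450.0, "_overlaps": 2}))
```

The order of the tests matters: if the plain area test came first, no feature would ever reach the "Small with Address Point" branch.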

8. Run the workspace and inspect the output

Run the workspace. The result will be three outputs that represent the three different states. Inspect the features that are flagged as small to ensure that they are incorrect in some way. Most are polygons where the lines of two boundaries have crossed, creating a small sliver polygon.

Part 2: Counting Small Polygons

Counting the number of bad features is quite easy because we have already filtered them out. For example, even the Workbench feature counts show us the numbers involved.

Storing the count in an attribute is simple with the StatisticsCalculator transformer.

Follow these steps to learn how to count small polygon features.

9. Add a StatisticsCalculator
Add a StatisticsCalculator and connect it to the first TestFilter output port. Open the parameters dialog.

First select _area as the Attribute. In truth it doesn't really matter which attribute we select, since we only want a count of features.

Select Total Count as the attribute to calculate. That will provide a count of the bad features. Click OK to close the dialog.

10. Duplicate the existing StatisticsCalculator
To do this, select it and press Ctrl+D. Connect it to the second TestFilter output port.

Because this is a duplicate, there's no need to open its parameters dialog and make any changes.

11. Re-run the workspace
This time the output should include an attribute that denotes how many bad features there are of each type.
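As a rough equivalent outside FME, counting per category and writing the count back onto each feature might look like the sketch below. The "port" and "total_count" keys are hypothetical names chosen for this example; they are not the attribute names the StatisticsCalculator produces.

```python
from collections import Counter


def count_by_port(features):
    """Count how many features were routed to each output port."""
    return Counter(f["port"] for f in features)


def attach_counts(features):
    """Return copies of the features with the count for their own port added."""
    counts = count_by_port(features)
    return [{**f, "total_count": counts[f["port"]]} for f in features]


# Hypothetical sample data: the port each feature left the filter through.
flagged = [
    {"id": 1, "port": "Small with Address Point"},
    {"id": 2, "port": "Small without Address Point"},
    {"id": 3, "port": "Small without Address Point"},
]
for feature in attach_counts(flagged):
    print(feature["id"], feature["total_count"])
```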

Part 3: Fixing Small Polygons

Fixing small polygons is not a simple task. Simply erasing such features could lead to gaps in a topological coverage, and not all of them will have a common ID number by which to merge (dissolve) them together.

However, there are some solutions that could be applied:

a) Delete the small polygons then apply the SliverRemover transformer to the remaining data to fill in any gaps.

b) Choose one of the features neighboring a small polygon and dissolve the two together.

Follow these steps to learn how to fix small polygon features, using the NeighborFinder/Dissolver.

12. Delete the two StatisticsCalculators
If you did not add any StatisticsCalculators you may ignore this step.

13. Add a Counter transformer
Connect it to the TestFilter:Correct Size port. This creates a unique ID for each correctly sized polygon.

14. Add a NeighborFinder transformer
Connect the small polygon features (the other two output ports of the TestFilter) to the NeighborFinder:Base port, and connect the Counter:Output port to the NeighborFinder:Candidate port. Open the NeighborFinder parameters and click on the Attribute Accumulation drop-down menu. Click on Merge Attributes so that the attributes of the nearest candidate are merged onto each small polygon in the output.

15. Add a Dissolver transformer
Connect the NeighborFinder:MatchedBase port to it, and also create a second connection from the Counter:Output port to the Dissolver. Your workspace will now look like this:

16. Open the Dissolver parameters dialog
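Click the Group Processing box and set the Group By parameter to "_count". Because each small polygon has taken on the _count value of its nearest neighbor through the attribute merge, this causes small polygons to be dissolved into that neighbor.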

17. Run the workspace and inspect the Dissolver output
Most of the small polygons are fixed by being dissolved into their neighbor:

You can add a final AreaCalculator transformer to help prove that this is the case, if you don't want to inspect the data visually.
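The logic of steps 13 to 16 can be summarized in a simplified Python sketch. It makes several assumptions that differ from the real transformers: nearness is measured between centroids, not between geometries, and the dissolve is represented only by summing areas per group, which holds only when the merged polygons do not overlap. The "centroid" key and the function names are hypothetical.

```python
from collections import defaultdict
from math import hypot


def assign_to_nearest(small, candidates):
    """Give each small polygon the _count ID of its nearest correctly sized polygon."""
    def distance(a, b):
        (ax, ay), (bx, by) = a["centroid"], b["centroid"]
        return hypot(ax - bx, ay - by)

    matched = []
    for s in small:
        nearest = min(candidates, key=lambda c: distance(s, c))
        matched.append({**s, "_count": nearest["_count"]})
    return matched


def dissolve_by_count(features):
    """Group features by _count and total their areas (stand-in for a geometric dissolve)."""
    groups = defaultdict(float)
    for f in features:
        groups[f["_count"]] += f["_area"]
    return dict(groups)


# Hypothetical sample data: two good parcels with IDs, one unnumbered sliver.
good = [
    {"_count": 0, "centroid": (0, 0), "_area": 500.0},
    {"_count": 1, "centroid": (100, 0), "_area": 600.0},
]
slivers = [{"centroid": (90, 1), "_area": 5.0}]

dissolved = dissolve_by_count(good + assign_to_nearest(slivers, good))
print(dissolved)
```

After the merge, no group should have an area below the cutoff, which is the same check the final AreaCalculator provides.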

Data Attribution

The data used here originates from open data made available by the City of Vancouver, British Columbia. It contains information licensed under the Open Government License - Vancouver.
