Validate your Data's Attributes with the AttributeValidator Transformer

Files

attributevalidatorexample2018-pivot.fmwt (30 KB)
attributevalidatorexample2018-begin.fmwt (10 KB)
attributevalidatorexample2018.fmwt (20 KB)

Introduction

Attribute validation is the cornerstone of high-quality data. Any software can act as an amplifier. In the case of FME, if you distribute poor-quality data to a wide range of users or formats, then you’ve amplified the poor data quality issue. If you validate your data before loading it into your data repository, you amplify the benefits of a single source of truth.

FME has always been able to validate your data attributes using transformers such as the Tester, AttributeCreator (with conditional values), Joiner (to use validation lookup tables), etc. However, this is somewhat tedious and ad hoc. The AttributeValidator, which consolidates many attribute validation tasks under one umbrella, is analogous to the AttributeManager, which consolidates many attribute processing tasks.

AttributeValidator can be used in conjunction with the GeometryValidator to ensure that all your data conforms to your target data model before loading, thereby reducing the number of features that may be rejected due to data quality issues.

The role of the AttributeValidator is to ensure that your attribute data loads into the target format data model.

Validation rules handled by AttributeValidator include:

Attribute type (integer, float, char, xml, json, etc.)
In – either a list or range – good for domain validation
Regular Expressions
Unique
Not Null
…and more…

Check the AttributeValidator user documentation for a full list of the validation operators.

Video

This video was taken in an older version of FME. The interface may look a bit different but the steps are still correct.

Source Data

We’ll work with cell phone signal data and validate it for loading into a simple database, generating a validation report as a CSV file.

Here’s the source data:

With the AttributeValidator, we’ll validate the following attributes:

CodePrefix: ensure values match a domain list
StationID: ensure StationID is unique and an integer
Quality: ensure the values are in a specified range
Power: ensure the values are in a specified range
Num_measures: ensure the values are integers within a specified range
CodeValue: ensure the values are integers
JSON: check that the attribute contains a valid JSON string

Step-by-step Instructions

Example 1

Open the attached FME Workspace Template. It reads from a CSV file and writes out two validation reports. The workspace is complete and annotated.

The following describes some of the key aspects of the workspace:

AttributeValidator: This is the key part of the workspace. The validation tests are configured as described in the AttributeValidator annotation and shown below:

Most of the tests are self-explanatory.

CodePrefix: We’ve created two tests on the same attribute, primarily for illustrative purposes. The first uses the In operator (ensure that CodePrefix is in the list ABC, ABD, TXU, TXV). The second uses a regular expression (ensure that CodePrefix has only three characters, each of which is A, B, C, or D).
Quality, Power: range test. A range assignment dialog helps set up the ranges:

In the range syntax, ‘[’ means inclusive (greater than or equal to) and ‘(’ means exclusive (greater than); the closing ‘]’ and ‘)’ work the same way for the upper bound. So [0,10) means “greater than or equal to 0 and less than 10”. The same syntax can be used with the In operator to set a range.

StationID: check that the value is unique.

The AttributeValidator validates against all tests; therefore, in this example, num_measures must validate as both an integer and within the range [0, 10].
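The range syntax and the "all tests must pass" behaviour can be sketched outside FME in a few lines of Python. This is only an illustration of the logic, not how the transformer is implemented; the function names in_range, is_integer and num_measures_ok are hypothetical.

```python
import re

# Matches interval notation such as "[0,10)" or "(-5, 2.5]".
_INTERVAL = re.compile(
    r"^\s*([\[\(])\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*([\]\)])\s*$"
)

def in_range(value, spec):
    """Return True if value lies inside an interval such as '[0,10)'."""
    match = _INTERVAL.match(spec)
    if match is None:
        raise ValueError(f"not a valid range: {spec!r}")
    left, low, high, right = match.groups()
    low, high = float(low), float(high)
    # Square brackets include the bound, round brackets exclude it.
    above = value >= low if left == "[" else value > low
    below = value <= high if right == "]" else value < high
    return above and below

def is_integer(text):
    """Return True if the text parses as a whole number."""
    try:
        int(text)
    except (TypeError, ValueError):
        return False
    return True

def num_measures_ok(text):
    """Both tests must pass: the type test and the range test."""
    return is_integer(text) and in_range(int(text), "[0,10]")
```

With these definitions, num_measures_ok("7") passes, while "12" fails the range test and "7.5" fails the integer test.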

AttributeValidator Output

If all attributes on the feature pass their validation tests, then the feature is output via the Passed port. If any test fails, then the feature is output via the Failed port.

AttributeValidator adds two attributes to the feature if a test fails:

_fme_validation_message - the first failed test message for the attribute being tested
_fme_validation_message_list{} – a list attribute with all the failed test messages.

In this example, one of the features fails three tests, so the failed messages that are added to the feature are:

Error Attribute	Error Message
_fme_validation_message	Attribute 'CodePrefix' with value 'ABE' fails check for Matches Regular Expression '[ABCD]{3}'
_fme_validation_message_list{0}	Attribute 'CodePrefix' with value 'ABE' fails check for Matches Regular Expression '[ABCD]{3}'
_fme_validation_message_list{1}	Attribute 'num_measures' with value '12' fails check for in Range '[0,10]'
_fme_validation_message_list{2}	Attribute 'CodePrefix' with value 'ABE' fails check for in 'ABC,ABD,TXU,TXV'
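To make the relationship between the single message and the message list concrete, here is a rough Python sketch that runs the three tests against one feature and collects the failures. It is not FME code: the validate function and its test table are hypothetical, the message wording simply copies the table above, and the regular expression is assumed to match the whole value.

```python
import re

def validate(feature):
    """Return (first_message, all_messages) for one feature dict."""
    # Each entry: attribute name, label used in the message, pass/fail check.
    tests = [
        ("CodePrefix", "Matches Regular Expression '[ABCD]{3}'",
         lambda v: re.fullmatch(r"[ABCD]{3}", v) is not None),
        ("num_measures", "in Range '[0,10]'",
         lambda v: v.isdigit() and 0 <= int(v) <= 10),
        ("CodePrefix", "in 'ABC,ABD,TXU,TXV'",
         lambda v: v in {"ABC", "ABD", "TXU", "TXV"}),
    ]
    messages = [
        f"Attribute '{name}' with value '{feature[name]}' fails check for {label}"
        for name, label, passes in tests
        if not passes(feature[name])
    ]
    # The single message is the first entry of the list, if there is one.
    first = messages[0] if messages else None
    return first, messages

first, messages = validate({"CodePrefix": "ABE", "num_measures": "12"})
```

Here messages holds three entries and first equals messages[0], which mirrors how the single message attribute repeats the first item of the list.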

Once you have configured the AttributeValidator, you can configure the workspace to generate a validation report or statistics. Referring to the bookmarks in the workspace:

Data Validation Report: The transformers in this bookmark create a list of all the error messages and write to a CSV file. If there are multiple error messages, as above, then ListExploder will split the feature into one feature for each message (creating three records for the error illustrated above).
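The explode-then-write step can be pictured with a small Python sketch. It only illustrates the idea of turning one feature with a message list into one record per message; it is not the ListExploder itself, and the names explode_messages, write_report and the "messages" key are assumptions made for this example.

```python
import csv
import io

def explode_messages(feature, list_key="messages"):
    """Yield one flat record per message, copying the other attributes."""
    base = {k: v for k, v in feature.items() if k != list_key}
    for message in feature.get(list_key, []):
        yield {**base, "message": message}

def write_report(features, id_key="StationID"):
    """Return CSV text with one row per (feature, message) pair."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[id_key, "message"],
        extrasaction="ignore",   # drop attributes that are not report columns
        lineterminator="\n",
    )
    writer.writeheader()
    for feature in features:
        writer.writerows(explode_messages(feature))
    return buffer.getvalue()

report = write_report([
    {"StationID": "17", "messages": ["first error", "second error", "third error"]},
])
```

A feature that failed three tests contributes three rows to the report, one per message, each carrying the same StationID.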

Validation Statistics: This bookmark summarizes the errors using the StatisticsCalculator. ISO 19000 standards discuss data quality for a dataset. This bookmark illustrates how you might start to configure FME to conform to ISO 19114, which discusses how data quality tests can be used to pass or fail a dataset based on different data quality measures, such as:

Boolean: any error causes a failure of the dataset
Number of commissions: a specific number of test failures causes the dataset to fail
% of commissions: percentage of test failures causes the dataset to fail

In this example, we’re calculating the percentage failure.
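The three measures can be expressed as a simple pass/fail decision. The Python sketch below is an illustration only: the dataset_passes function, the measure names, and the rule that a dataset fails when the failures exceed the threshold are all assumptions, not definitions taken from the standard or from FME.

```python
def dataset_passes(total, failed, measure, threshold=0):
    """Decide whether a dataset passes under one data quality measure.

    total     -- number of features tested
    failed    -- number of features that failed validation
    measure   -- "boolean", "count" or "percent"
    threshold -- largest tolerated count or percentage (assumed semantics)
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if measure == "boolean":
        # Any error fails the dataset.
        return failed == 0
    if measure == "count":
        return failed <= threshold
    if measure == "percent":
        return 100.0 * failed / total <= threshold
    raise ValueError(f"unknown measure: {measure!r}")
```

For example, 10 failures out of 200 features is a 5% failure rate, so dataset_passes(200, 10, "percent", 5) passes while the same data fails under the "boolean" measure.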

AttributeValidator can be paired with the GeometryValidator to give complete validation of your spatial and non-spatial data. You can also use the AttributeValidator in conjunction with the FeatureWriter: if the dataset fails to meet the set pass criteria, you have the opportunity to roll back the data load. Typically, you’d have one AttributeValidator per feature type in your workspace.

Results

Run the workspace and inspect the results. The output from the Failed port of the AttributeValidator is shown above. The output from the workspace is two CSV files:

Detailed failure messages (some of the columns have been hidden)

Summary Statistics

Using the FME Excel writer would be a good option for creating a more comprehensive data quality report.

AttributeValidator Limitations

Like the GeometryValidator, the AttributeValidator validates one feature at a time and validates the attributes on each feature. AttributeValidator does not check for relationships between features.

Currently, AttributeValidator does not validate date fields. You can validate dates using the DateTimeConverter transformer.

Example 2

The second example workspace, AttributeValidatorExample_pivot.fmw, introduces an alternative report format. This workspace pivots the error report so that each feature has a summary of all the attribute errors (some columns have been hidden for clarity):
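As a rough picture of what pivoting an error report means, the Python sketch below groups per-message rows into one record per feature, with one column per failing attribute. The exact layout produced by the example workspace may differ; the pivot_errors function and the (feature_id, attribute, message) row shape are assumptions made for illustration.

```python
from collections import defaultdict

def pivot_errors(rows):
    """Group (feature_id, attribute, message) rows into one dict per feature."""
    pivot = defaultdict(dict)
    for feature_id, attribute, message in rows:
        previous = pivot[feature_id].get(attribute)
        # Several failures on the same attribute are joined with "; ".
        pivot[feature_id][attribute] = (
            message if previous is None else f"{previous}; {message}"
        )
    return dict(pivot)

summary = pivot_errors([
    ("17", "CodePrefix", "fails regular expression"),
    ("17", "num_measures", "fails range"),
    ("17", "CodePrefix", "fails domain list"),
])
```

The three rows for feature 17 collapse into a single record with a CodePrefix column and a num_measures column.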
