Validate your Data's Attributes with the AttributeValidator Transformer

Liz Sanderson
Liz Sanderson
  • Updated

FME Version

Introduction

Attribute validation is the cornerstone of quality data . Any software can act as an amplifier. In the case of FME, if you distribute poor quality data to a wide range of users or formats then you’ve amplified the poor data quality issue. If you validate your data before you load it into your data repository, then you amplify the benefits of a single-source-of-truth.

FME has always been able to validate your data attributes using transformers such as the Tester, AttirbuteCreator (with conditional values), Joiner (to use validation lookup tables) etc. But this is a little tedious and ad hoc. The AttributeValidator, that brings under one umbrella many attribute validate tasks, just as the AttributeManager consolidates many of the attribute processing tasks.

AttributeValidator might be used in conjunction with the GeometryValidator to ensure all your data conforms to your target data model before loading – reducing the number of features that might get rejected because of data quality issues.

The role of the AttributeValidator is to ensure your attribute data will load into the target format data model.

Validation rules handled by AttributeValidator include:

  • Attribute type (integer, float, char, xml, json, etc.)
  • In – either a list or range – good for domain validation
  • Regular Expressions
  • Unique
  • Not Null
  • …and more…

Check the AttributeValidator user documentation for a full list of the validation operators

 

Video

 

Source Data

We’ll work with some cell phone signal data and validate it for loading into a simple database and generate a validation report as a CSV file.

Here’s the source data:

sourcedata.jpg

With the AttributeValidator we’ll validate the following attributes:

  • CodePrefix: ensure values match a domain list
  • StationID: ensure StationID unique and an integer
  • Quality: ensure the values are in a specified range
  • Power: ensure the values are in a specified range
  • Num_measures: ensure the values are in a specified range & is an integer
  • CodeValue: ensure the values are integers
  • JSON: check that the attribute contains a valid JSON string

 

Step-by-step Instructions

Example 1

Open the attached FME Workspace Template. It reads from a CSV file and writes out two validation reports. The workspace is complete and annotated.

attributevalidationworkspace.jpg

The following describes some of the key aspects of the workspace:

AttributeValidator: Obviously the key part of this workspace. The validation tests have been configured as shown in the AttributeValidator annotation and are shown below:

attributevalidatortransformer.jpg

Most of the tests are self-explanatory.

  • CodePrefix: We’ve created two tests that more or less do the same thing, just for illustrative purposes. The first using the IN operator (ensure the CodeName is in the list ABC, ABD, TXU,TXV) and the Regular Expressions (ensure that the CodePrefix only has the 3 characters which can be ABC or D)
  • Quality, Power: range test. A range assignment dialog helps set-up the ranges:

 

rangevalidation.jpg

The syntax is ‘[‘ means inclusive (greater than or equal to) , ‘(‘ means exclusive (greater than) . So [0,10) means “greater than or equal to 0 and less than 10”. The same sytax can be used with the IN operator to set a range.

  • StationID: check that the value is unique.

The AttributeValidator validates against all tests, so in this example num_measures must validate as both an integer and in the range [0,10].

AttributeValidator Output

If all attributes on the feature pass their validation tests then the feature is output via the Passed port. If any test fails, then the feature is output via the Failed port.

AttributeValidator adds two attributes to the feature if a test fails:

  • _fme_validation_message - the first failed test message for that the attribute being tested
  • _fme_validation_message_list{} – a list attribute with all the failed test message.

In this example one of the features fails three tests, so the failed messages that are added to the feature are:

Error Attribute Error Message

_fme_validation_message

Attribute 'CodePrefix' with value 'ABE' fails check for Matches Regular Expression '[ABCD]{3}'

_fme_validation_message_list{0}

Attribute 'CodePrefix' with value 'ABE' fails check for Matches Regular Expression '[ABCD]{3}'

_fme_validation_message_list{1}

Attribute 'num_measures' with value '12' fails check for in Range '[0,10]'

_fme_validation_message_list{1}

Attribute 'CodePrefix' with value 'ABE' fails check for in 'ABC,ABD,TXU,TXV'

Once you have configured the AttributeValidator you can configure the workspace to generate a validation report or statistics. Refering to the bookmarks in the workspace:

Data Validation Report: The transformers in this bookmark create a list all the error messages and writes to a CSV file. If there are multiple error messages, as above, then ListExploder will split the feature into one feature for each message (creating three records for the error illustrated above).

Validation Statistics: This bookmark summarizes the errors using the StatisticsCalculator. ISO19000 standards discuss data quality for a dataset. This bookmark illustrates how you might start to configure FME to conform to ISO19114 where the standard discusses how the data quality test might be used to pass or fail a dataset based on different data quality measures such as:

  • Boolean: any error causes a failure of the dataset
  • Number of commissions: specific number of test failures causes the dataset to fail
  • % of commissions : percentage of test failures causes the dataset to fail

In this example we’re calculating the percentage failure.

AttributeValidator can be paired with the GeometryValidator to give complete validation of your spatial and non-spatial data. You can use the AttirbuteValidator in conjunction with the FeatureWriter. If the data fails validation, you have the opportunity to roll back the data load if the dataset fails the pass criteria you have set. Typically you’d have one AttributeValidator per feature type in your workspace.

 

Results

Run the workspace and inspect the results. The output from the Failed port of the AttributeValidator is shown above. The output from the workspace is two CSV files:

  • Detailed failure messages (some of the columns have been hidden)

detailedreport.jpg

 

  • Summary Statistics

summaryreport.jpg

Using the FME Excel writer would be a good option for creating a more comprehensive data quality report.

 

AttributeValidator Limitations

Like the GeometryValidator, the AttributeValidator validates one feature at a time and validates the attributes on each feature. AttributeValidator does not check for relationships between features.

At the moment, AttributeValidator does not validate date fields. You can validate dates using the DataTimeConverter transformer.

 

Example 2

The second example workspace - AttributeValidatorExample_pivot.fmw - introduces an alternative report format. This workspace pivots the error report so that each feature has a summary of all the attribute errors (some columns have been hidden for clarity):

pivotreport.jpg

 

 

Was this article helpful?

Comments

0 comments

Please sign in to leave a comment.