Introduction
In a conventional workflow, FME reads the data and writes it using writer feature types. Any processing after all the data has been written is limited to a Python or Tcl shutdown script. The FeatureWriter expands the possibilities by writing data mid-workflow, so that further data translation and transformation can follow in the same workspace. This is useful whenever an operation must run on the data after it has been written.
Here are a few possibilities for the FeatureWriter, all accomplished within one workspace:
- Simple procedures previously accomplished with scripts or manually:
  - Copy or move data after writing
  - FTP upload after writing
  - Email after the job is complete
  - Load the file into S3 or other cloud storage after writing
- Complex tasks that previously required chained workspaces and FMEFlowJobSubmitters:
  - Quality Check > Quality Report with Notification > Database Insert
- Notifications to FME Flow:
  - After writing data with the FeatureWriter, it is easier to prepare notification messages, such as an email for FME Flow, right in the same workspace
- Integrations with third-party tools for data transformation in FME without waiting for a new reader:
  - Examples are LAStools, Orfeo ToolBox, and ImageMagick
  - The basic workflow is FeatureWriter - SystemCaller - FeatureReader - Cleanup
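The write, call, read, cleanup pattern from the last bullet can be sketched outside FME in plain Python. This is only an illustration of the ordering, not FME's implementation: the function name `write_process_read`, the CSV files, and the stand-in "tool" (a one-line script that upper-cases a file) are all assumptions made for the example.

```python
import csv
import os
import shutil
import subprocess
import sys
import tempfile


def write_process_read(rows):
    """Write rows to disk, run an external tool on the file, read the result, clean up."""
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "input.csv")
    dst = os.path.join(workdir, "output.csv")
    try:
        # 1. "FeatureWriter" step: the file must be completely written first.
        with open(src, "w", newline="") as f:
            csv.writer(f).writerows(rows)

        # 2. "SystemCaller" step: hypothetical external tool that upper-cases the file.
        tool = "import sys; open(sys.argv[2], 'w').write(open(sys.argv[1]).read().upper())"
        subprocess.run([sys.executable, "-c", tool, src, dst], check=True)

        # 3. "FeatureReader" step: read the tool's output back in.
        with open(dst, newline="") as f:
            return list(csv.reader(f))
    finally:
        # 4. Cleanup step: remove the temporary files whether or not the tool succeeded.
        shutil.rmtree(workdir, ignore_errors=True)


result = write_process_read([["ParkName", "Washrooms"], ["Stanley Park", "Y"]])
```

The point of the sketch is the dependency: the external tool can only run once the write has finished, which is what writing mid-workflow makes possible inside a single workspace.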
This basic FeatureWriter demo validates a dataset and emails a validation report to the data validation manager after all the features have been written.
Source Data
The Parks.tab dataset is read with the MapInfo TAB reader. Its attributes describe each park, such as the park name and whether the park has washrooms, a dog park, or other special features.
Step-by-step Instructions
1. Validate Park Attribute Values
Validate the Parks dataset against a number of tests with the AttributeValidator. A list of the tests that failed is added to the features routed through the Failed port. Later, this list allows us to write data on the parks that failed one or more tests to an Excel spreadsheet, enabling data validation and quality assurance.
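As a rough illustration of what this step does, the following Python sketch runs each feature through a set of tests and routes it to a passed or failed collection, attaching the list of failed tests. It is not the AttributeValidator itself: the rule set, the message wording, and the `messages` key are hypothetical stand-ins.

```python
def validate(feature, rules):
    """Return the messages for every rule the feature fails (empty list if it passes)."""
    failures = []
    for attribute, rule_name, check in rules:
        if not check(feature.get(attribute)):
            failures.append(f"{attribute} failed: {rule_name}")
    return failures


# Hypothetical rules, loosely modeled on the tests named in this tutorial.
RULES = [
    ("ParkName", "Maximum Length = 20", lambda v: isinstance(v, str) and len(v) <= 20),
    ("Washrooms", "In 'Y, N'", lambda v: v in ("Y", "N")),
]

parks = [
    {"ParkName": "Stanley Park", "Washrooms": "Y"},
    {"ParkName": "A park name well over twenty characters", "Washrooms": "Maybe"},
]

passed, failed = [], []
for park in parks:
    messages = validate(park, RULES)
    if messages:
        # Failed features carry the list of tests they failed, for later reporting.
        failed.append({**park, "messages": messages})
    else:
        passed.append(park)
```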
2. Extract Error Messages and Validation Tests of Failed Parks
The list from the AttributeValidator, _fme_validation_message_list{}, is exploded into individual list items, so that a Park feature is obtained for each test failed. For example, there is a park in the Mount Pleasant neighborhood where the ParkName, SpecialFeatures, and Washrooms attributes are missing values, so three features are output for this park, each one containing a different value for _fme_validation_message_list{}.
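Exploding a list can be pictured as follows. This is a minimal sketch in plain Python, assuming features are dictionaries; the key names `messages`, `message`, and `_element_index` and the message texts are illustrative choices, not the exact attributes FME produces.

```python
def explode_list(feature, list_key, element_key):
    """Yield one copy of the feature per list element, without the original list."""
    for index, element in enumerate(feature[list_key]):
        copy = {k: v for k, v in feature.items() if k != list_key}
        copy[element_key] = element
        copy["_element_index"] = index  # position of the element in the original list
        yield copy


# A park with three failed tests becomes three features, one per message.
park = {
    "NeighbourhoodName": "Mount Pleasant",
    "messages": [
        "ParkName is missing",
        "SpecialFeatures is missing",
        "Washrooms is missing",
    ],
}
exploded = list(explode_list(park, "messages", "message"))
```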
The StringSearcher creates an attribute, _first_match, which contains the rule that failed and its configuration. In this case, the possible values are "Type is 'STRING'", "in 'Y, N'", and "Maximum Length =20". The FeatureWriter later uses this attribute to fan out the features.
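The idea of pulling the rule out of a validation message with a regular expression can be sketched in Python. The pattern and the sample messages below are assumptions made for illustration; the actual expression used in the workspace and the exact message text produced by the AttributeValidator are not shown in this article.

```python
import re

# Hypothetical pattern covering the three kinds of rule named above.
RULE_PATTERN = re.compile(r"Type is '[^']*'|in '[^']*'|Maximum Length\s*=\s*\d+")


def first_match(message):
    """Return the first rule found in the message, or None if no rule is present."""
    match = RULE_PATTERN.search(message)
    return match.group(0) if match else None


# Illustrative messages only; real messages may be worded differently.
samples = [
    "ParkName failed rule Maximum Length =20",
    "Washrooms failed rule in 'Y, N'",
    "ParkName failed rule Type is 'STRING'",
]
rules = [first_match(m) for m in samples]
```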
3. Write Failed Parks into a Spreadsheet
The FeatureWriter writes the failed Park features into an Excel spreadsheet, FailedParks.xlsx, with a separate worksheet for each _first_match value. The Summary port outputs a list summarizing the features written, including the total number of features, the name and feature count of each feature type, and the output dataset path. Because the FeatureWriter writes data mid-transformation, the workspace can continue with additional processing and tasks afterward.
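The fanout and the summary can be thought of as a group-by followed by a count. The sketch below models this with plain dictionaries rather than an actual Excel file; the function names and the shape of the summary dictionary are assumptions, not the structure of the Summary port's output.

```python
from collections import defaultdict


def fan_out(features, key):
    """Group features by the value of `key`; each group stands for one worksheet."""
    sheets = defaultdict(list)
    for feature in features:
        sheets[feature[key]].append(feature)
    return dict(sheets)


def summarize(sheets, dataset):
    """Build a summary: dataset path, total feature count, and count per sheet."""
    return {
        "dataset": dataset,
        "total_features": sum(len(rows) for rows in sheets.values()),
        "feature_types": {name: len(rows) for name, rows in sheets.items()},
    }


failed_parks = [
    {"ParkName": None, "_first_match": "Type is 'STRING'"},
    {"ParkName": "Park A", "_first_match": "in 'Y, N'"},
    {"ParkName": "Park B", "_first_match": "in 'Y, N'"},
]
sheets = fan_out(failed_parks, "_first_match")
summary = summarize(sheets, "FailedParks.xlsx")
```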
4. Email the Failed Parks Report
Email the failed features report, FailedParks.xlsx, to the data validation manager using the Emailer transformer. Configure the Emailer with your own From and To email addresses, along with the following:
- Attachments: path to FailedParks.xlsx error report spreadsheet
- SMTP Connection section, if Gmail is used:
  - Sender Authentication: If your Gmail/Google account uses two-step verification, you will need to generate an app-specific password for the Emailer. If it does not use two-step verification, you may allow less secure apps to access your Gmail account
- Once the email is received, the data validation manager can perform data validation and quality assurance on the parks dataset
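For readers who want to see what the equivalent step looks like in a script, here is a minimal sketch using Python's standard `email` and `smtplib` modules. It is not how the Emailer transformer works internally. The function names, subject line, and body text are illustrative, and `send_via_gmail` assumes Gmail's SMTP endpoint (smtp.gmail.com on port 587 with STARTTLS) together with an app-specific password.

```python
import mimetypes
import smtplib
from email.message import EmailMessage
from pathlib import Path


def build_report_email(sender, recipient, attachment_path):
    """Create an email with the validation report attached."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Failed parks validation report"
    msg.set_content("The attached spreadsheet lists the parks that failed validation.")

    path = Path(attachment_path)
    # Fall back to a generic binary type if the extension is not recognized.
    ctype, _ = mimetypes.guess_type(path.name)
    maintype, subtype = (ctype or "application/octet-stream").split("/", 1)
    msg.add_attachment(
        path.read_bytes(), maintype=maintype, subtype=subtype, filename=path.name
    )
    return msg


def send_via_gmail(msg, username, app_password):
    """Send the message through Gmail's SMTP server (assumed settings)."""
    with smtplib.SMTP("smtp.gmail.com", 587) as smtp:
        smtp.starttls()
        smtp.login(username, app_password)
        smtp.send_message(msg)
```

Building the message separately from sending it makes it possible to inspect the attachment before any credentials are involved.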
Result
Data Attribution
The data used here originates from open data made available by the City of Vancouver, British Columbia. It contains information licensed under the Open Government Licence - Vancouver.