FME Version
Introduction
Sometimes in a dynamic workflow, the desired schema only exists within the workspace (e.g. after pivoting data). Building a schema feature works, but often the schema needs to be carefully constructed with attribute transformers, like in the last example. There must be a better way!
In this example, a custom transformer called the SchemaScanner will be used to build a schema feature, then the feature will be manipulated using other transformers. This is a common pattern and should be used whenever you need to build a schema and edit its contents.
As of FME 2022, SchemaSetter is replaced by SchemaScanner.
The Pattern
SchemaScanner > ListExploder > (Manipulate the Data) > ListBuilder
Use the SchemaScanner to create a list{} feature in FME using the current feature’s schema. Next, use the ListExploder to split the list into its individual features, where each feature contains an attribute name from the list. The third step is to manipulate the schema as needed (i.e. sort). Finally, bring everything back together into a new version of the schema feature using the ListBuilder.
Here are a few examples of why you would want to do this:
- Renaming Attribute Names
- Sorting (e.g. Dates)
- Removing or Adding New Attributes
The Workspace
In the attached workspace, we are creating a pivot table using a dataset of sales figures. The current data looks like:
Quarter | Sales | Region |
2016Q1 | 1936 | New England |
2016Q2 | 167 | New England |
2016Q1 | 1235 | Mid-Atlantic |
2016Q2 | 169 | Mid-Atlantic |
The output we want is the following:
Region | 2016Q1 | 2016Q2 |
New England | 1936 | 167 |
Mid-Atlantic | 1235 | 169 |
This is a simple pivot around the Region field and also note that we are not doing any grand totals of the data. How can we do this with FME?
Step-by-Step Instructions
Use an AttributeCreator to build attribute names out of the values of ‘Quarter’. The values of the new attribute will contain the Sales data.
Next, use the Aggregator to summarize each Region.
Settings:
Please set the Aggregator Group By to Region
After this step, you have pivoted your data and the actual data looks like:
2016Q1 | 2016Q2 | Region |
1936 | 167 | New England |
1235 | 169 | Mid-Atlantic |
You'll see something like this in the Feature Information window:
(Note: to see the data in Table View you'd have to add an AttributeExposer to add the new attributes to the workbench schema)
You should notice that this data is structured in a completely new way as it was created directly inside the workspace. This is important! Dynamic workspaces always need to be able to reference a schema definition in order to work and in this case we do not have one defined just yet (Step 3 below will address this).
As we only want to manipulate one schema feature (instead of doing this for every feature). We need to split the data up into two streams. The first stream should connect to a brand new FeatureMerger in the Requestor Port while the second stream should connect to a Sampler. Use the sampler to pass one single feature into the next step. See below:
At this stage, we need to build the Schema Feature and then manipulate it. We will start with the SchemaScanner transformer:
Note: Filter out any extra FME attributes using the Ignore Attributes Containing parameter
If you were to inspect after using the SchemaScanner, you would see the attribute list stored is now stored on the feature.
At this stage the Schema object has been created but isn’t organized in the sort order that you need. Let’s fix that by using a ListExploder:
The Schema has been divided into each individual attribute and stored inside an FME Feature. Time to manipulate each attribute. In the attached example we used a Sorter transformer but since we are talking about a pattern, this is where your particular needs are addressed. Modify the name, fme_data_type and list order as you see fit.
The final step is to bring the data back together into a list. Use the ListBuilder to accomplish this.
You now have a completed Schema Feature that has been designed and edited. Fantastic!
4. Merge the Data and Write Dynamically
Connect the output from step 3 above to the Supplier port on the FeatureMerger. Do a 1:1 match to ensure the Schema is present on each feature. This ensures that the writer will receive the schema on the first feature it encounters.
Finally, ensure the writer is setup in dynamic mode and the source is your schema feature.
When you run the workspace, you will get your desired output with your new schema defined.
Final Workspace:
Notes
Avoiding Dropped Features - FeatureReader:
If using the FeatureReader to create a schema feature, please note that the transformer adds a special attribute and value:
fme_schema_handling = ‘schema_only’
This attribute tells the writer to use that feature as a schema and then remove it from the output. This can be an issue if you do any merging of data and schema in the workspace as the data merged with this schema feature would be dropped when written. For example, if you merge a schema feature so that it gets added to all the features in the workspace.
To ensure this doesn’t occur, remove the fme_schema_handling attribute entirely so that the first merged data feature is not dropped by the writer.
Comments
0 comments
Please sign in to leave a comment.