Files
Introduction
In this tutorial, you’ll learn how to split an FME Flow (formerly FME Server) automation workflow into separate jobs to be processed by multiple engines in parallel. Running jobs in parallel speeds up large workloads whose parts do not need to run in series, such as updating several feature types in a database. It can also improve fault tolerance, because the remaining jobs continue even if one job fails.
FME automations feature Merge Actions and Split-Merge Blocks that wait for upstream parallel jobs to complete before proceeding. Provided more than one engine is available, processing multiple jobs at once completes the automation faster than processing all jobs in series on one engine. This tutorial will coordinate running five workspaces with different processing times in parallel to produce a (fictional) final data product. The sample workspaces use the Decelerator transformer to mimic how an automation handles different job timing in a production workflow.
Merge Action
Merge Actions make an automation wait for all upstream jobs triggered by the same event to complete before continuing. Unlike the split-merge block, which is limited to workspace actions, the Merge Action allows any type of FME Flow action to run before the workflow is merged.
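The waiting behavior described above can be pictured as a simple counter per triggering event. The following is a minimal conceptual sketch in Python, not FME Flow's implementation: the MergeBarrier class, the branch names, and the event IDs are all hypothetical, and only the wording of the printed message is borrowed from the automation log shown later in this tutorial.

```python
from collections import defaultdict


class MergeBarrier:
    """Toy model of a merge step: release an event once every upstream branch has reported."""

    def __init__(self, expected_branches):
        self.expected = set(expected_branches)
        self.seen = defaultdict(set)  # event id -> branches that have reported so far

    def notify(self, event_id, branch):
        """Record that `branch` finished for `event_id`; return True when the merge can proceed."""
        if branch not in self.expected:
            raise ValueError(f"unknown branch: {branch}")
        self.seen[event_id].add(branch)
        done = len(self.seen[event_id])
        print(f"Received {done} of {len(self.expected)} notifications for merge action.")
        if done == len(self.expected):
            del self.seen[event_id]  # reset so a later event starts from zero
            return True
        return False


barrier = MergeBarrier(["speedy", "longer"])   # hypothetical branch names
first = barrier.notify("event-1", "speedy")    # one branch done, keep waiting
other = barrier.notify("event-2", "speedy")    # a different event is tracked separately
second = barrier.notify("event-1", "longer")   # both branches done for event-1
```

In this sketch, notifications are counted per event, which mirrors the idea that only jobs triggered by the same event are merged together.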
Split-Merge Block
Split-Merge Blocks make the automation wait until all jobs within the block are complete before continuing with the rest of the automation. When complete, a single, unified message (per input) is returned from the split-merge block. The main advantages of the split-merge block are that it can merge Automation Writer outputs and wait for all triggered jobs within the block to finish before continuing downstream actions. In contrast, the Merge Action cannot be connected to multiple triggers or workflows that use Automation Writer messages.
For example, a split-merge block could be used to process statistics for each city block, in each county, of a state. You only want to produce one summary report per county, but there are 60 counties, each containing 15 blocks, and every block is a single feature. If a workspace runs for every feature, the county-processing workspace (ProcessCounty.fmw) will run 60 times and the block-processing workspace (ProcessBlock.fmw) will run 900 times!
With a split-merge block, all 900 block-level jobs will run before outputting a merged message for each county. This reduces the output messages to 60 (one for each county), and downstream workspaces can process the reports according to this message data. The SplitMergeBlock-Example.fsproject (built with FME Flow 2023.0) found in the Files section shows this scenario in action.
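The arithmetic in this example (60 counties × 15 blocks = 900 block-level jobs, merged back into 60 messages) can be checked with a short Python sketch. It only models the grouping idea; the record fields and the placeholder "stat" value are assumptions made for illustration and do not reflect the actual message structure FME Flow produces.

```python
from collections import defaultdict

COUNTIES = 60
BLOCKS_PER_COUNTY = 15

# One record per block-level job; the "stat" value is a made-up placeholder.
block_results = [
    {"county": c, "block": b, "stat": 1}
    for c in range(COUNTIES)
    for b in range(BLOCKS_PER_COUNTY)
]

# Merge the block-level records into one message per county.
merged = defaultdict(list)
for result in block_results:
    merged[result["county"]].append(result)

county_messages = [
    {"county": county, "block_count": len(rows), "total": sum(r["stat"] for r in rows)}
    for county, rows in merged.items()
]

print(len(block_results))    # number of block-level jobs
print(len(county_messages))  # number of merged messages, one per county
```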
This workflow is impossible with the Merge Action because it uses Automation Writers. The split-merge block opens new doors for handling Automation Writer outputs, allows control over the granularity of message merging for downstream processes, and can improve the efficiency of your workflows when combined with Queue Control. For an in-depth introduction to the split-merge block and its capabilities, please read our article on Getting Started with the Split-Merge Block.
Step-by-Step Instructions
1. Download and Import FME Flow Project
From the Files section above, download the linked FME Flow project AutomationsJobOrchestration-Begin.fsproject (built with FME Flow 2023.0) and import it to FME Flow by clicking Projects > Manage Projects, then Import.
2. Create a New Automation
From the FME Flow web interface, go to Automations > Create Automation to create a new Automation.
3. Configure the Trigger
On the Automations canvas, double-click the Trigger node. For this exercise, configure an FME Flow Schedule (initiated) Trigger on a daily interval. Once the automation is running, we’ll be able to test it manually whenever we want. Alternatively, a Manual Trigger can be used.
4. Add an Action
Configure the next Action as a Run Workspace. The actual function (or lack thereof!) of these workspaces isn’t important; imagine that this is the beginning of a nightly data update into a database. From the repository created when you imported the FME Flow project (Automations Exercises), add SpeedyDataUpdate.fmw. There are no published parameters to configure for this workspace.
5. Add a Parallel Action
Click on the plus sign (+) in the bottom left corner to open the menu. Click Action to select it. Now click anywhere on the automation canvas to place it. Click the output port of the Trigger and drag a connection to the input port of the new Action. Configure this new Action as a Run Workspace. This time, add LongerDataUpdate.fmw. There are no published parameters to configure for this workspace.
6. Chain a Third Action
Add a third Action component and configure it as a Run Workspace downstream of the success port of LongerDataUpdate.fmw. Run PostProcessing_LDU.fmw here.
7. Add a Merge Action
Downstream of all three Run Workspace Actions, place a Merge Action. Connect it to the success ports of both branches of your Automation.
Alternatively, the workflow can incorporate the split-merge block, which is designed for running collections of workspaces and waiting for all jobs to complete before carrying on with downstream actions.
8. Add Two Final Run Workspace Actions
Downstream of the Merge, configure a Run Workspace Action to run Validation.fmw. Now that all the planned updates to the database are prepared, it is time for a validation routine before the changes are reconciled and posted. Downstream of the success port of the Action that runs the data validation workspace, configure a Run Workspace to run MakeDataProduct.fmw. Imagine that you’re now set up to generate a nightly report that will be ready in the morning with the previous day’s work incorporated!
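To see why the parallel layout pays off, the five-workspace workflow built so far can be written down as a dependency map and timed on paper. The sketch below is a rough estimate in Python under stated assumptions: the durations are invented (the real values depend on the sample workspaces), an engine is assumed to be free whenever a job is ready, and queueing overhead is ignored.

```python
# Hypothetical run times in seconds; the real values depend on the sample workspaces.
durations = {
    "SpeedyDataUpdate.fmw": 5,
    "LongerDataUpdate.fmw": 20,
    "PostProcessing_LDU.fmw": 10,
    "Validation.fmw": 8,
    "MakeDataProduct.fmw": 12,
}

# Each workspace lists the workspaces that must succeed before it can start.
upstream = {
    "SpeedyDataUpdate.fmw": [],
    "LongerDataUpdate.fmw": [],
    "PostProcessing_LDU.fmw": ["LongerDataUpdate.fmw"],
    "Validation.fmw": ["SpeedyDataUpdate.fmw", "PostProcessing_LDU.fmw"],  # the merge point
    "MakeDataProduct.fmw": ["Validation.fmw"],
}


def finish_time(name, cache=None):
    """Earliest finish time of `name`, assuming an engine is always free."""
    if cache is None:
        cache = {}
    if name not in cache:
        start = max((finish_time(u, cache) for u in upstream[name]), default=0)
        cache[name] = start + durations[name]
    return cache[name]


serial_total = sum(durations.values())               # one engine, one job at a time
parallel_total = finish_time("MakeDataProduct.fmw")  # branches overlap before the merge
print(serial_total, parallel_total)
```

With these made-up numbers, the saving equals the run time of the shorter branch, because that branch runs entirely while the longer branch is still busy.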
9. Add an External Action
Downstream of the failure port of the Validation.fmw Action, configure a Send an Email External Action (or another external notification of your choice) to alert you if the night’s data upload has failed validation.
As in the article Run a Workspace in Response to Incoming Email, use Load Template or manually enter your email server information. If you are using an SMTP server that requires authentication (likely with popular email providers), you’ll need to enter values in the SMTP Account (optional) and Password (optional) fields. In Email To, enter an address whose inbox you can check, and in Email From, enter the same address as your account email.
In the Email Attachment field, use the down arrow to select Workspace > Job Log.
In the Email Body field, select General > Event > Event as JSON, or choose Text Editor to compose a friendly message that will automatically populate with details using JSON keys from the automation.
To get an email with the job log of any other failed jobs, connect the failure ports of those jobs to the Email Action as well (or add another Email Action). Since the workspaces included in this exercise will not fail jobs, test that emails will be sent by also connecting the Email Action to the success port of your validation job.
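The Text Editor option for the Email Body works by substituting values from the event into a message you write. The Python sketch below illustrates that general idea only. The JSON payload, its key names, and the `$name` placeholder syntax are assumptions for illustration; they are not FME Flow's actual automation keys or template syntax.

```python
import json
from string import Template

# Hypothetical event payload; the key names here are illustrative only.
event_json = '{"workspace": "Validation.fmw", "job_id": 1234, "status": "FAILED"}'

# A friendly message with placeholders that are filled in from the event.
body_template = Template(
    "Nightly validation alert\n"
    "Workspace: $workspace\n"
    "Job ID: $job_id\n"
    "Status: $status\n"
)

event = json.loads(event_json)
email_body = body_template.substitute(event)  # raises KeyError if a key is missing
print(email_body)
```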
10. Save and Start the Automation
Save the automation and then click Start Automation in the upper right corner.
If you configured the trigger as a schedule and left Start Immediately checked when configuring it, your automation should trigger right away. If not, double-click the schedule trigger action and then select Trigger in the FME Flow Schedule Details pane. If you used a Manual Trigger, select Trigger next to Start Automation in the upper-right corner.
11. View Logs and Verify Order of Events
Wait a minute for the jobs to finish, then check Menu > View Triggered Jobs. You should see the five workspaces from your automation listed. To see the job log from any of the workspace jobs submitted by the automation, click on the workspace name here.
The job logs will tell you what happened while a particular workspace was run. To see the log of the automation as a whole, return to the automation and go to Menu > View Log File.
You can also search or filter the automation log with the tools above the log entries. For now, click the clock button to show timestamps for each log entry. After the single-Action branch finishes its job, you should see the log entry “(Automations) Received 1 of 2 notifications for merge action.” The timestamps show that the second job in the chained branch (PostProcessing_LDU.fmw) is not sent to an engine until its upstream job is complete. Once that job completes, another entry appears, “(Automations) Received 2 of 2 notifications for merge action.”, after which the Actions downstream of the Merge Action are processed.
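If you copy the automation log out of FME Flow, the order of events can also be checked with a short script. This Python sketch is only an illustration: the two sample lines and their timestamp layout are assumed, and only the merge message text is taken from the log entries quoted above.

```python
import re
from datetime import datetime

# Hypothetical log excerpt; the timestamp layout is assumed, and only the
# merge message text comes from the tutorial.
log_lines = [
    "2024-01-01 02:00:06 (Automations) Received 1 of 2 notifications for merge action.",
    "2024-01-01 02:00:31 (Automations) Received 2 of 2 notifications for merge action.",
]

pattern = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) .*"
    r"Received (?P<got>\d+) of (?P<total>\d+) notifications for merge action"
)

merge_events = []
for line in log_lines:
    match = pattern.search(line)
    if match:
        stamp = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S")
        merge_events.append((stamp, int(match["got"]), int(match["total"])))

# The merge is released when the received count equals the expected count.
released_at = next(ts for ts, got, total in merge_events if got == total)
wait_seconds = (released_at - merge_events[0][0]).total_seconds()
print(released_at, wait_seconds)
```

Here wait_seconds is the time the faster branch spent waiting at the merge for the slower branch to finish.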
Additional Resources
Getting Started with the Split-Merge Block
Getting Started with Enterprise Integration Patterns
[Documentation] Combining Messages from Multiple Workspace Actions
[Webinar] Automation Keys: What They Are and Why You Should Use Them (@27:15 for a demonstration using the split-merge block)