Getting Started with the Split-Merge Block

Files

SplitMergeBlock-Examples.fsproject (700 KB)

Introduction

Automations on FME Flow (formerly FME Server) provide a wide range of options when building enterprise integrations to solve your data challenges. The capabilities of automations can be stretched far beyond basic trigger-action workflows. With the Automation Writer, you can create data-driven workflows that scale up jobs based on incoming features rather than events. We’ve expanded the power of automations by introducing the Split-Merge Block to further support parallel job processing and improve the functionality of data-driven workflows.

Prerequisites

Please note that this is an advanced Automation authoring article. It’s imperative to have a working knowledge of the Automation Writer and output keys. For a quick introduction, please read the Automation Writer documentation, Output Attributes documentation, and Building Integrations with the FME Flow Automation Writer.

What Are Data-Driven Automations?

Before we start exploring the split-merge block, we first need to understand data-driven automations. In a simple automation, a triggered event or an action that succeeds or fails runs the next action once. With the Automation Writer, you can output features from a workspace, run downstream actions on a per-feature basis, and dynamically fill in parameter values based on each feature’s attribute values. Routing N features to an action scales up the number of times the workspace or external action runs to N. However, in many scenarios, actions further downstream only need to run once, such as sending a single email notification when an automation is complete. How do you deal with this challenge?

Reg vs AW Automation.png
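The scaling behavior described above can be modeled with a short conceptual sketch. This is not FME Flow code: automations are built on a canvas, and the names `run_action` and `route_features`, as well as the feature dictionaries, are hypothetical. The sketch only illustrates that routing N features to an action runs it N times, and that any action further downstream inherits the same count.

```python
# Conceptual model only; not an FME Flow API.
# run_action and route_features are hypothetical names.

def run_action(name, params):
    """Stand-in for one job run of a workspace or external action."""
    return {"action": name, "params": params, "status": "success"}

def route_features(features, action_name):
    """Run the action once per feature, using its attributes as parameters."""
    return [run_action(action_name, dict(feature)) for feature in features]

# Three features written by a hypothetical upstream workspace.
features = [{"county": "A"}, {"county": "B"}, {"county": "C"}]

jobs = route_features(features, "Process County")
# Without a merge step, a downstream email action also runs once per job.
emails = route_features(jobs, "Send Email")

print(len(jobs), len(emails))
```

In this model the email action runs as many times as there are features, which is the behavior that the rest of this article sets out to avoid.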

Workflows that incorporate the Automation Writer need a way to downscale or merge the jobs so that downstream actions aren’t run excessively. The Merge Action is meant for this purpose, but it’s only compatible with event messages (Success/Failure) and not Automation Writer messages. This is where the split-merge block comes into play.

What Is the Split-Merge Block?

The split-merge block looks similar to a bookmark: it encircles a set of workspace actions and has one input port for incoming messages and two output ports, one for successes and one for failures. You can add one to the canvas by opening the component menu and selecting the Split-Merge Block icon.

To solve the challenge in the previous section, the block treats the workspaces it contains as a single event and outputs one message. This message only includes information on the last job run inside the block. The split-merge block is primarily used with workflows that incorporate the Automation Writer because a basic chain of workspaces won’t scale up to more than one job per workspace.

SMB Solution.png
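As a rough mental model, the block behaves like a fan-out followed by a join. The sketch below is a conceptual Python illustration, not FME Flow code. The names `process_block` and `split_merge` and the `max_engines` parameter are hypothetical, and the success/failure logic is deliberately simplified: the real port behavior depends on how actions are wired to the block’s ports.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Conceptual model only; all names here are hypothetical.

def process_block(block_id):
    """Stand-in for one job run inside the block."""
    return {"job": block_id, "status": "success"}

def split_merge(block_ids, max_engines=4):
    """Run every job in parallel, then emit a single message.

    The message carries only the last job to finish, mirroring the
    behavior described in the text above.
    """
    last = None
    failed = False
    with ThreadPoolExecutor(max_workers=max_engines) as pool:
        futures = [pool.submit(process_block, b) for b in block_ids]
        for future in as_completed(futures):
            last = future.result()
            failed = failed or last["status"] != "success"
    port = "failure" if failed else "success"  # simplified assumption
    return [{"port": port, "last_job": last}]

messages = split_merge(range(50))
print(len(messages), messages[0]["port"])
```

However many jobs run inside `split_merge`, the caller receives a list with exactly one message, so anything downstream runs once.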

It can be difficult to visualize the number of messages passing through an automation, so annotations have been added to screenshots in this article to illustrate the number of messages being passed between components. The color of the annotations indicates how different messages and components are controlling the number of times other actions are run.

Why Should You Use the Split-Merge Block?

The main advantages of the Split-Merge Block include:

Running a group of jobs in parallel across multiple FME Flow Engines.
Treating the success/failure of a group of jobs as a single event.
Downscaling event messages in data-driven automations to run downstream actions without excessive work.

Many users tend to create one large workspace to cover every step in the translation. This is advantageous because:

It’s easier to visualize the complete workflow.
Feature caching makes it easy to build.

However, there are major disadvantages:

It’s difficult to hand over the workspace to other users unless it is well documented.
One large workspace underutilizes hardware resources.

By breaking down workspaces into smaller, distributable, and manageable pieces with the Automation Writer and Split-Merge Block, you can address all these challenges and add downstream functionality in your workflows. Now let’s walk through some use cases with the Split-Merge Block.

Basic Use Cases

Run a Chain of Jobs in Parallel

A GIS analyst wants to quickly process large groups of spatially related data across multiple FME Flow engines and automatically receive a single email once all of the jobs are finished. In this example, a single state has 5 counties and 10 blocks per county, which scales Process Block up to 50 jobs. After these jobs run successfully, Downstream Work runs once and an email is sent.

Note: the split-merge block only releases a failure message if the job originates from a workspace action that’s connected to the block’s failure port. In this case, the split-merge block will only return a failure if at least one job fails for the Process Block workflow.

A different configuration sends an email notification as each county is processed. In this case, the automation runs five groups of jobs, one for each county output by Process State.

Run Parallel Workspaces

An IT department wants to automatically update internal databases and systems with invoices from their finance department. When everything is done, they want to send a REST API request to an internal application. An incoming invoice file is split into individual items and triggers two parallel processes: one to handle individual invoice items and another to update the databases. These jobs run faster in parallel across multiple engines. The split-merge block downscales the work into a single output message so that only one REST API request is made in the end.

Run a Dynamic Set of Workspaces in Parallel

A quality analyst wants to run a set of validation workspaces against a new dataset and generate a single report when all the tests run successfully. The Testsuite Control workspace reads a test suite spreadsheet to see which workspaces need to be run. It outputs a set of features from the Automation Writer to dynamically tell the Dynamic Workspace action which workspaces to run. When the test suite spreadsheet is edited, FME Flow authors don’t need to manually add or remove test suite workspaces from the automation. If any of the tests fail, an email is sent to the QA team.

Tip: For a guide on Dynamic Workspaces, please read our article on Dynamic Workspaces: Data Driven Parallel Processing.

Advanced Functionality

Connect to Actions Outside the Split-Merge Block

It’s possible to send event and Automation Writer messages out of a split-merge block, but you cannot send messages that originated outside a block into it. Outer connections can be made instead of, or in addition to, connections from the success and failure ports on the block itself.

Outer Action.png

Tip: automations don’t have feature caching, so make use of the “Log a Message” action. This will help with troubleshooting and seeing exactly what message content is being passed between workspaces.

Nested Split-Merge Blocks

Any number of blocks can be nested within each other. This is particularly useful when you want to gradually scale down jobs or external actions. For example, consider the scenario below where a nested block is required to make the Downstream Work workspace run five times while only sending one email at the end.

Note: nested split-merge blocks are only supported in FME Flow 2022.0 and newer. To receive a single email if Process Block fails, a workspace action is required between the two blocks' failure ports since they cannot connect directly to each other.

Nested SMB Corrected.png
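The message counts in a nested layout can be checked with simple arithmetic. The sketch below is a conceptual Python model, not FME Flow code. It reuses the 5 counties and 10 blocks per county from the earlier use case, and it assumes that a split-merge block emits one message for each message that entered it. The function names `fan_out` and `merge` and the exact placement of the blocks are assumptions made for illustration.

```python
# Message-count model only; names and topology are assumptions.

def fan_out(messages_in, features_per_message):
    """Automation Writer output: each incoming message yields several features."""
    return messages_in * features_per_message

def merge(messages_into_block):
    """Split-merge block: assumed to emit one message per message that entered it."""
    return messages_into_block

state_runs = 1
county_messages = fan_out(state_runs, 5)     # one message per county
block_jobs = fan_out(county_messages, 10)    # Process Block jobs

# Inner block wraps Process Block and receives one message per county.
downstream_runs = merge(county_messages)
# Outer block wraps the rest and receives the single trigger message.
emails = merge(state_runs)

print(block_jobs, downstream_runs, emails)
```

Under these assumptions the model gives 50 Process Block jobs, five runs of Downstream Work, and one email, matching the scenario described above.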

Chained Split-Merge Blocks

Automations can work with chains of split-merge blocks. However, you need an action or external action connecting the two because the output of one split-merge block cannot directly connect to the input of another. Chained split-merge blocks are useful if you have multiple automations to run in sequence and you wish to perform an action in response to the completion of the entire workflow.

Tip: If you need to chain two split-merge blocks without running a workspace or external action, use the log action.

Chained SMB.png

Parallel Split-Merge Blocks

It is also possible to run multiple split-merge blocks in parallel. The next downstream component should be a merge action. This way, the automation waits for all blocks to finish before processing the rest of the workflow.

Parallel SMB.png
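The waiting behavior of a merge action placed after parallel blocks resembles a join over concurrent tasks. The sketch below is a conceptual Python model using `asyncio`, not FME Flow code. The names `split_merge_block` and `merge_action`, the block names, and the job counts are all hypothetical.

```python
import asyncio

# Conceptual model only; all names and job counts are hypothetical.

async def split_merge_block(name, job_count):
    """Stand-in for one block: run its jobs concurrently, emit one message."""
    async def job(i):
        await asyncio.sleep(0)  # yield control to simulate work
        return i
    results = await asyncio.gather(*(job(i) for i in range(job_count)))
    return {"block": name, "jobs_run": len(results)}

async def merge_action(*blocks):
    """Wait for every parallel block before releasing one downstream message."""
    outputs = await asyncio.gather(*blocks)
    return {"merged_from": [o["block"] for o in outputs]}

merged = asyncio.run(merge_action(
    split_merge_block("Block A", 10),
    split_merge_block("Block B", 5),
))
print(merged)
```

Because `merge_action` only returns after both blocks have finished, anything that consumes `merged` runs once, after all of the parallel work is complete.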

Conclusion

For a deeper dive into the split-merge block, please read the next article, Understanding the Split-Merge Block | Troubleshooting & FAQs, to learn about workflow design considerations, output port behavior, and troubleshooting tips.

Additional Resources

Getting Started with Automations

Building Integrations with the FME Server Automation Writer

Job Orchestration with Automations

Getting Started with Enterprise Integration Patterns