FME Flow Administration: Job Scalability and Management

Introduction

Optimizing FME Flow is an iterative process that requires continuously analyzing your existing setup. Some of the deciding factors for system improvement are:

Job processing time
System resource usage
Number of engines available
Business requirements

This document provides various techniques to improve performance on FME Flow.

Performance

In this section, we will be looking at different avenues to improve FME Flow performance.

Workspace Authoring for Performance Tuning

While authoring and designing your workspaces on FME Form, there are various pointers that you can keep in mind to help improve job performance. Since the underlying technology processing the workspace is the same for FME Form and FME Flow, any workspace improvement made on FME Form will also translate to Flow. We have an article that focuses on Performance Tuning during Workspace Authoring.

Parallelism

Often we deploy a single workspace to FME Flow that carries out too many separate tasks. These workspaces can over- or underutilize resources and cause troubleshooting headaches, along with a variety of other performance and organizational challenges. Parallel processing in FME could be the answer.

Parallelism in FME Flow uses Automations to distribute processes across engines by creating a workflow design with multiple jobs. Combined with queue control, this also lets you decide which engine runs each job and when jobs are submitted.

Typical workspaces process every single feature one by one in classic pipeline processing, as shown in the first workflow.
The workflow runs for as long as it takes to process every feature in sequence. On FME Flow, a single engine is responsible for this job.

However, in parallel processing, we split the data into jobs spread across the available engines, which work in parallel to produce results much faster. Runtime is then only as long as the longest task, as shown in the second workflow.
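To make the difference concrete, here is a minimal sketch of the arithmetic. The task runtimes are hypothetical, and the model assumes there is one free engine per task and ignores any overhead from splitting or merging the data.

```python
def sequential_runtime(task_seconds):
    """One engine processes every task in turn, so the runtimes add up."""
    return sum(task_seconds)


def parallel_runtime(task_seconds):
    """One engine per task, all starting together: the longest task decides."""
    return max(task_seconds)


# Hypothetical runtimes, in seconds, for four independent chunks of work.
tasks = [120, 95, 210, 60]

print(sequential_runtime(tasks))  # 485
print(parallel_runtime(tasks))    # 210
```

With fewer engines than tasks, the real runtime falls somewhere between these two figures.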

For more details about parallelism in FME Flow, please see the Level Up With Parallelism in FME Flow presentation and the Job Orchestration with Automations article. These resources use advanced automation tools like the Automations Writer and Split-Merge Block that can optimize large and small processing tasks.

Adding Engines to an Existing Engine Machine

You can scale FME Flow to support a higher volume of jobs by adding FME Engines on the same machine as FME Flow Core.

The FME Flow Core contains a queueing system that distributes jobs to the FME Engines. Each FME Engine can process one job at any one time. So if you have ten engines, you can run ten jobs simultaneously. Consider adding engines if you have many simultaneous job requests, with jobs consistently in the queue.

Please note that adding engines to the same machine does not reduce the time it takes to run a single translation. This time depends on the underlying server hardware and the design of the workspace. Complex workspaces, big data manipulation, and workspaces with large datasets take more time to run.
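The effect of adding engines can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: all jobs are submitted at the same moment, they are served in submission order, and the runtimes are hypothetical. It is not how FME Flow Core is implemented.

```python
import heapq


def simulate_fifo(job_seconds, engine_count):
    """Return (start, finish) times for jobs served first-come, first-served.

    All jobs are assumed to be submitted at time 0, and each engine runs
    exactly one job at a time.
    """
    # Min-heap of the times at which each engine next becomes free.
    free_at = [0] * engine_count
    heapq.heapify(free_at)
    schedule = []
    for seconds in job_seconds:
        start = heapq.heappop(free_at)  # earliest available engine
        finish = start + seconds
        heapq.heappush(free_at, finish)
        schedule.append((start, finish))
    return schedule


jobs = [60, 60, 60, 60]  # hypothetical runtimes in seconds

for engines in (1, 2, 4):
    schedule = simulate_fifo(jobs, engines)
    last_finish = max(finish for _, finish in schedule)
    longest_wait = max(start for start, _ in schedule)
    print(engines, last_finish, longest_wait)
# 1 240 180
# 2 120 60
# 4 60 0
```

More engines shorten the time jobs spend waiting in the queue, but every individual job still takes 60 seconds to run.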

Having multiple engines on the same machine also helps with Job Recovery. If your license allows, you can increase the number of engines on a particular host through the Web Interface. An FME Flow-Hosted machine even allows unlimited engines. However, the hardware must be suitable to handle the increase in engines. As a general rule of thumb, one CPU core per FME Engine is optimal.

Job Queues

Job queues are a mechanism for sending specific jobs to specific FME Engines. The reasons for using job queues include:

Controlling the priority of job requests.
Sending a job to an FME Engine in close proximity to a data source.
Reserving FME Engines for scheduled tasks.
Reserving CPU-Usage Engines for specific jobs.
Reserving some FME Engines for quick jobs, and others for high-load jobs.
Sending a job to an FME Engine that supports a particular format.

In FME Flow 2021.0 and newer, the job queueing mechanism was expanded to provide much greater control and flexibility in how FME Flow processes jobs and uses the available engines. Now, instead of simply creating queues and assigning engines directly to those queues, you can create engine assignment and job routing rules to dynamically assign engines to queues and send jobs to specific queues based on criteria you set.

Terminology

In FME Flow, Queue Control is made up of the following three components that work together to ensure jobs are routed to the correct engines for processing:

Job Routing Rules: These rules determine which queue a submitted job is sent to. Two types of Job Routing Rules can be configured:
- Properties: rule(s) based on one or a combination of these job-specific properties: processing time, system CPU usage, peak memory usage, job repository, related workspace, source type, source name, user name, or user role.
- Repository: rule(s) based on the repository that a workspace is stored in.

Engine Assignment Rules: These rules control which engine(s) are assigned to each queue. Two types of Engine Assignment Rules can be configured:
- Property: Properties fall into three categories: Custom, Engine, or Queue. Custom properties are user-defined values assigned to engines, commonly used in Docker/Kubernetes deployments. Engine properties are pre-populated based on the specifications of the engine and attributes of the engine host; these include engine type, operating system, hostname, physical memory, and processor count. Queue properties hold values about the jobs already waiting in that queue, for example, the number of queued jobs and the time for all queued jobs to finish.
- Name: rule(s) based on the name of the engine.

Queues: Queues hold jobs sent to them by job routing rules and direct them to engines based on engine assignment rules. Every queue has a priority from 1 to 10. In versions before FME Flow 2023.0, 1 is the highest priority; as of FME Flow 2023.0, 10 is the highest priority. During installation, a “Default” queue is created automatically with a priority of 5. Because this is a middle value, you can create queues with either higher or lower priority than the default.
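The way these three components fit together can be sketched as a toy model. This is a conceptual illustration only, not FME Flow's implementation or API: the class, function, queue, and engine names are hypothetical, the "first matching rule wins" behavior is an assumption, and the priority direction follows the FME Flow 2023.0 convention (10 is highest).

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Queue:
    name: str
    priority: int  # 1-10; 10 is highest as of FME Flow 2023.0
    engines: set = field(default_factory=set)   # result of engine assignment rules
    jobs: deque = field(default_factory=deque)  # first in, first out within a queue


def route(job, routing_rules, queues):
    """Job routing: the first matching rule picks the queue, else 'Default'."""
    for matches, queue_name in routing_rules:
        if matches(job):
            queues[queue_name].jobs.append(job)
            return queue_name
    queues["Default"].jobs.append(job)
    return "Default"


def next_job(engine, queues):
    """Give an idle engine the oldest job from its highest-priority non-empty queue."""
    candidates = [q for q in queues.values() if engine in q.engines and q.jobs]
    if not candidates:
        return None
    best = max(candidates, key=lambda q: q.priority)
    return best.jobs.popleft()


queues = {
    "Default": Queue("Default", priority=5, engines={"engine1", "engine2"}),
    "Urgent": Queue("Urgent", priority=8, engines={"engine2"}),
}
# A repository-style routing rule: jobs from "Emergency" go to "Urgent".
routing_rules = [(lambda job: job["repository"] == "Emergency", "Urgent")]

route({"id": 1, "repository": "Samples"}, routing_rules, queues)
route({"id": 2, "repository": "Emergency"}, routing_rules, queues)

print(next_job("engine2", queues)["id"])  # 2
print(next_job("engine1", queues)["id"])  # 1
```

Here engine2 serves both queues and takes the job from the higher-priority "Urgent" queue first, even though the other job was submitted earlier.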

FME Flow Job Processing without Queue Control

By default, the system creates no engine assignment rules or job routing rules. All jobs are routed to the Default queue, which has a priority of 5. The jobs are treated as equal and are run by any available engine in the order in which they were submitted.

See the documentation for more information about Queue Control, managing queues/rules, and reference examples. For more examples, see the Getting Started with Queue Control article series.

Advanced Tips and Tricks

When importing from older versions of FME Flow using the Backup and Restore command, any historic priority will be restored. You should review the adjusted priority upon restoring a backup. If a Queue with the same priority does not exist when a new job runs, it will be automatically created and added to the Engines.

Explicitly defining a queue, such as on the Run Workspace page, a Run a Workspace action, in Schedules, or through Job Directives, effectively bypasses job routing rules.
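The bypass behavior can be pictured with a short sketch. The function, rule, and queue names below are hypothetical and only model the decision order described above; they are not part of any FME Flow API.

```python
def resolve_queue(job, routing_rules, explicit_queue=None):
    """Return the name of the queue a job ends up in."""
    if explicit_queue is not None:
        # An explicitly chosen queue wins; routing rules are never consulted.
        return explicit_queue
    for matches, queue_name in routing_rules:
        if matches(job):
            return queue_name
    return "Default"


rules = [(lambda job: job["repository"] == "Emergency", "Urgent")]
job = {"repository": "Emergency"}

print(resolve_queue(job, rules))                            # Urgent
print(resolve_queue(job, rules, explicit_queue="Nightly"))  # Nightly
```

If jobs are landing in an unexpected queue, it is worth checking whether the submitting schedule, action, or request names a queue explicitly.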

Adding FME Engines on a Separate Machine

You can add processing capacity to your FME Flow by installing additional FME Engines on a separate computer from the FME Flow Core.

When adding FME Engines, please consider the host’s CPU and memory resources, which constrain the maximum concurrent request throughput.

The additional FME Engines can be installed on any supported operating system (Windows or Linux). They do not have to match the specifications of the FME Flow Core. However, it is important to note that adding FME Engines that do not match the primary release version of the FME Flow Core is not supported.

We recommend installing all FME Engines on systems that are synchronized to the same time zone as other FME Engines and the FME Flow Core. If time zones differ, unexpected issues may arise, including difficulty accessing the FME Flow Web User Interface, improper timing of FME Flow Schedule triggers, and inconsistent or misleading timestamps in log files (accessed from Resources).

Advanced Tips and Tricks

In a fault-tolerant environment, we recommend assigning unique names to FME Engines. However, if FME Engines on multiple hosts share the same name, the queue server configuration applies to all of them in the same way, regardless of which host they reside on.

See the documentation for more information on how to implement this.

Keep your FME Engines Close to the Data

One of the main reasons to add engines on a separate machine is to have an FME Engine closer to the data. For example, if data is located on a server in a remote office, it makes sense to add a local engine to that server to process the data. This avoids long-distance transfers and network latency.

Another reason to add separate engines is to gain access to third-party formats that may not be installed on the FME Flow Core system.

CPU-Usage Engines

FME Flow supports two different engine types: Standard and CPU-Usage Engines. The two engine types are identical from a technical standpoint. The difference lies in their pricing model. This is an important consideration when you are scaling your deployment to ensure you only pay for what is needed.

Standard Engines follow the traditional pricing model, whereby you purchase a fixed engine that is permanently available to process jobs. If you regularly exceed capacity, this is likely the route to take, as your overall FME Flow demand has probably increased.

However, CPU-Usage Engines follow a credit pricing model, which makes them ideal for varied and unpredictable workloads. You can spin up as many engines as you’d like for a period of time. CPU-Usage Engines run on credits, and one credit equals one hour of engine CPU time. You pay only for the CPU time an engine spends running a job; while the engines are idle, you are not paying for anything. Please follow this article for Getting Started with CPU-Usage Engines.
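As a rough illustration of the credit arithmetic, the sketch below converts job CPU time into credits. The per-job figures are hypothetical, and any rounding or billing granularity that may apply in practice is not modeled.

```python
SECONDS_PER_CREDIT = 3600  # one credit equals one hour of engine CPU time


def credits_used(cpu_seconds_per_job):
    """Convert the CPU time spent running jobs into credits."""
    return sum(cpu_seconds_per_job) / SECONDS_PER_CREDIT


# Hypothetical CPU seconds consumed by four jobs.
jobs = [900, 1800, 2700, 1800]

print(credits_used(jobs))  # 2.0
```

Only time spent running jobs is counted, so the number of engines and the time they sit idle do not appear in the calculation.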

FME Core and Web Server

In most cases, FME Engines are the limiting factor for performance. The FME Flow Core and database have no issues with a high number of requests. Technical specifications are available online and should be consulted during server provisioning.

Additionally, FME Flow's Web Application Server can process in excess of 100,000 HTTP requests per hour. In environments where an extremely large number of requests is expected, we recommend keeping things simple by using a single Web Application Server, as the FME Engines will remain the bottleneck. We see no performance benefit in adding FME Flow Web Application Servers, though you could add a Core and Web Application Server for fault tolerance purposes.
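A quick back-of-the-envelope comparison shows why the engines tend to be the bottleneck. The engine count and average job time below are hypothetical; only the 100,000 requests per hour figure comes from the text above.

```python
WEB_REQUESTS_PER_HOUR = 100_000  # figure quoted above


def engine_jobs_per_hour(engine_count, average_job_seconds):
    """Upper bound on jobs per hour when every engine is always busy."""
    return engine_count * 3600 / average_job_seconds


print(round(WEB_REQUESTS_PER_HOUR / 3600, 1))  # 27.8 requests per second
print(engine_jobs_per_hour(10, 30))            # 1200.0 jobs per hour
```

Under these assumptions, ten engines running 30-second jobs complete at most 1,200 jobs per hour, far below what a single Web Application Server can accept.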

Workspace Versioning

Version control allows you to access previous versions of your published workspaces and related files.

Optionally, when you configure version control with a remote Git repository, you can access previous versions of files from all members of your team who commit to the same repository. Version control is not configured with GitHub by default. Instead, all commits are stored in a repository on the local FME Flow system. You do not need to configure a remote GitHub repository to use Version Control.

Please note that version control does not, by itself, enable you to update your local working copy of repositories' files. Instead, version control allows you to download previous versions. Once downloaded, you can update your working copy by republishing it to FME Flow.

You can enable Version Control under the “System Configurations” section of the FME Flow Web User Interface. Once you enable version control, any time you upload (from FME Flow) or publish/republish (from FME Form to FME Flow) a new or existing file, you have the option to commit a version of the file to your local system. This option is provided on the Commit dialog when you upload a workspace directly from FME Flow, and from the Publish... dialog of the Publish to FME Flow wizard in FME Form.

If you perform any type of Backup & Restore operation of your FME Flow configuration, such as when upgrading your installation, the restored FME Flow does not maintain version history. However, if you push your file versions to a remote Git repository on GitHub, you maintain backups of them outside of FME Flow.

Using FME Form to Version

When publishing a workspace to FME Flow with Version Control enabled, the user will see a “Commit” button on the publishing wizard dialog.

Using FME Flow to Version

It is also possible to create a version of an existing workspace that has already been published to FME Flow. This is useful when a workspace exists and has been tested in the FME Flow environment, and you want to record a version of it.

Using FME Flow with a Remote Git Repository

For more information on using FME Flow with a remote repository, please review the FME Flow Admin Version Control documentation.

What's Next?

As an administrator, learn how to configure FME Flow to improve performance, enhance security, and streamline data management. For further guidance, see FME Flow Administration: Customization and Monitoring.
