A Guide to Choosing Your FME Flow Deployment Architecture

Sanae Mendoza

Introduction

FME Flow administrators should become familiar with FME Flow (formerly FME Server) architecture and deployment options before performing an installation. This article expands on the FME Flow installation options documented in Choosing a Deployment Architecture.

 

Architecture

FME Flow consists of multiple components that may be installed together or separately. Understanding each component’s function is critical for planning an effective deployment. 

The five primary components of FME Flow are: 

Core: Coordinates and distributes job requests (queuing, request routing, scheduling) and automation processes.
Engines: Process job requests by running FME Workspaces.
Web Application Server: Runs the FME Flow Web User Interface, FME Flow Web Services, and any other web clients.
Database: Stores critical configuration and metadata for FME Flow, including jobs, repositories, automations, users, and other data.
System Share: A central directory that stores workspace files, log files, and other data uploaded to FME Flow.

 

FME Flow Engines

FME Flow Engines are responsible for processing jobs. Engines may be installed on the same machine as the rest of the FME Flow components, on separate machines, or both. For best performance, give careful consideration to how many Engines are active and to their proximity to the data. 

  • To increase job throughput, increase the number of engines. 
  • To improve performance, consider installing engines closer to data sources and optimizing workflows.

Review the documentation on Planning for Scalability and Performance for an overview of each approach.
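As a back-of-envelope illustration of the first point (the numbers below are hypothetical, not benchmarks), job throughput scales roughly linearly with engine count when jobs are queued continuously:

```python
# Back-of-envelope estimate: with jobs queued continuously, throughput scales
# roughly linearly with the number of engines (numbers here are hypothetical).
def jobs_per_hour(engines: int, avg_job_seconds: float) -> float:
    """Approximate hourly throughput, assuming engines stay fully busy."""
    return engines * (3600 / avg_job_seconds)

# e.g. 4 engines running 2-minute jobs vs. 8 engines
print(jobs_per_hour(4, 120))  # -> 120.0
print(jobs_per_hour(8, 120))  # -> 240.0
```

Real throughput also depends on data locality and workspace design, which is why the second bullet matters as much as engine count.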

 

Engine Licensing Type

There are two license types for FME Flow Engines: Standard and CPU Usage. Both Engine types process jobs the same way, but they differ in billing and installation. To decide which Engine is right for your deployment, consider your job duration, frequency, and how the jobs will be triggered.  

Standard
  Recommended usage: Predictable and consistent workloads, including but not limited to:
    • Scheduled jobs
    • Automations
  Billing: Standard licenses are fixed and purchased per Engine, allowing an Engine to run jobs for the entire license period.
  Scalability: Purchase any number of Engines with your FME Flow license. To add Engines, purchase additional licenses.

CPU Usage (Dynamic)
  Recommended usage: Unpredictable or variable workloads, including but not limited to:
    • Infrequent but intensive jobs
    • Irregular but critical jobs
    • FME Apps
    • Streams
    • Automations
  Billing: CPU Usage licenses are credit-based. Purchase credits for processing; Engines consume credits while active and processing jobs.
  Scalability: Credits enable an unlimited number of Engines.

Read more about FME Engine types. 

 

Engine Deployment Type

FME Flow Engines may be installed separately from the primary FME Flow installation for any one of several reasons: 

  • Improved Performance and Scalability: Distributing processing across multiple machines lets the system handle larger workloads and run jobs in parallel, reducing overall execution time. Distributed engines can also be included in a cloud scale set for dynamic scalability.
  • Load Balancing: Jobs are distributed across available engines, optimizing resource usage and preventing bottlenecks that can occur with a single-engine setup.
  • Fault Tolerance: If one engine fails, other distributed engines can continue processing, improving system reliability and reducing downtime.
  • Flexible Resource Allocation: Distributed engines allow you to scale processing power as needed, adding more engines to accommodate growing or fluctuating job demands without affecting overall performance.
  • Cost Efficiency: The ability to distribute workloads dynamically means resources can be used more efficiently, potentially lowering operational costs by reducing the need for constant high-capacity infrastructure.

Two Engine deployment types facilitate a separate installation: Distributed and Remote.

Distributed Engines are installed on the same network as the rest of FME Flow and balance the workload across multiple machines, optimizing performance for high job volumes. In contrast, Remote Engines may be installed on separate networks and process jobs independently; they are typically used to extend processing power to other locations or environments without direct resource sharing.

Either deployment can use a Standard or CPU Usage license. 

Distributed
  Recommended usage:
    • Additional processing power and resources
    • Unique workflow environment requirements:
      • Mixed OS 
      • Access to 3rd-party dependencies
  Installation:
    • Must be within the same network as the rest of the FME Flow components
    • Requires numerous bidirectional ports and continuous network traffic for web and other component communication

Remote
  Recommended usage:
    • Additional processing power and resources
    • Operations in multiple networks (on-premises, on-cloud, hybrid/multi-cloud)
    • Unique workflow environment requirements:
      • Increased security
      • Mixed geographic locations
      • Mixed OS 
      • Access to 3rd-party dependencies
  Installation:
    • Can be in the same network or a different network
    • Web-based communication between components
    • Port 80/443 must be opened for web traffic

 

Leverage different Engine types for different processes in your deployment for maximum flexibility.

With either deployment type, your FME Flow Engines should be as close to the data as possible for optimal performance. This reduces the effect of network latency and wait times. 

 

FME Flow Deployment Types

Express 

An Express deployment installs all default FME Flow components on a single host machine (Figure 1). 
Figure 1: An Express installation with FME Flow components installed on a single host. 

 

For most FME customers, the Express deployment type is recommended. Its simple installation process minimizes potential issues while delivering strong performance.

When provided with sufficient resources, an Express installation can perform on par with other deployment options. Performance can be scaled by increasing machine resources and the number of available FME Flow Engines.

The single-machine design also helps reduce or eliminate potential issues common in distributed deployments, such as networking and communication challenges.

In the event of a system failure, Job Recovery automatically resubmits any running jobs to the queue once the system is back online. 

Express installations do not support fault tolerance, high availability, or custom components. If any of these are required now or in the future, a Distributed installation is recommended. 

Scaling an Express Installation 

Express deployments can be scaled by adding more engines to the host machine or to a separate host machine. However, converting an Express deployment to a Distributed or Fault-Tolerant deployment requires reinstalling FME Flow.

 

Distributed  

A Distributed deployment installs FME Flow components across multiple host machines (Figure 2). Customers can choose both the number of machines and where each component is installed. 


Figure 2: A Distributed installation with FME Flow components installed across four hosts. 

 

Distributed installations provide the highest levels of flexibility and scalability, making them ideal for customers with specific architectural needs, such as integrating a custom database or ensuring fault tolerance. However, implementing and managing a Distributed installation involves more complexity, requiring greater expertise compared to simpler setups.

See Distributing FME Flow Components for more information.

There are many ways to distribute FME Flow components (Figure 3). Simple is better: fewer hosts mean fewer points of failure. The documentation includes two examples of distributed deployments. 

Figure 3: Example Architectures for Distributed FME Flow Deployments

 

A Distributed installation also offers the option to replace default components with custom components. For example, customers may replace the default FME Flow Database (PostgreSQL) or Web Application Server (Apache Tomcat) with their own applications or versions. 

A Distributed installation can be made fault tolerant. Fault tolerance reduces potential downtime and data loss: in the event of a system failure, a redundant node can accept and resume processing jobs. 

 

Scaling a Distributed Installation 

Distributed installations can be scaled by installing more engines on any existing host machine or on a separate host machine.

 

Fault-Tolerance

A Distributed deployment can be configured for fault tolerance to ensure high availability (Figure 4). Fault tolerance helps keep FME Flow online even if hardware fails, reducing potential downtime and data loss. In the event of a system failure, a redundant node can take over and resume processing jobs.

Fault tolerance enables high availability by installing redundant FME Flow components (the Web Application Server, Core, and Engines) on isolated machines. These components work with a load balancer and share a common System Share and Database, both of which can also be configured for redundancy.

Figure 4: A Fault-tolerant FME Flow Deployment installed across four hosts.

See Planning for Fault Tolerance for more information.
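Fault tolerance as described above places a load balancer in front of the redundant Web Application Servers. Below is a minimal sketch of such a front end using open-source nginx with passive health checks; the host names, ports, and failure thresholds are placeholder assumptions, not FME requirements:

```nginx
# Illustrative only: round-robin across two redundant FME Flow web hosts.
# max_fails/fail_timeout mark a host as unavailable after repeated failures.
upstream fmeflow {
    server hostA.example.com:443 max_fails=3 fail_timeout=30s;
    server hostB.example.com:443 max_fails=3 fail_timeout=30s;
}

server {
    listen 443 ssl;
    # ssl_certificate / ssl_certificate_key directives omitted for brevity
    location / {
        proxy_pass https://fmeflow;
        proxy_set_header Host $host;
    }
}
```

Commercial load balancers and cloud equivalents work equally well; the key requirement is that traffic fails over to the surviving Web Application Server.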

 

Example Fault-Tolerant Deployments 

These examples demonstrate distributed deployments configured for fault tolerance. Fault tolerance relies on multiple, duplicate FME Flow hosts so that jobs keep running in the event of a host failure. 

In a fault-tolerant architecture, FME Flow Engines are designed to continue processing jobs even if they are temporarily disconnected from the FME Flow Core: the engines can finish their assigned tasks without reconnecting immediately. Because of this, the default Job Recovery settings, which automatically resubmit failed or incomplete jobs, become unnecessary. Worse, Job Recovery may resubmit jobs that are still running on a disconnected engine, resulting in duplicate job executions and the inefficiencies or errors that follow. We therefore recommend disabling Job Recovery in fault-tolerant environments. 

 

Example 1: Shared Host for FME Flow Engines, Web Application Server, and Core 

In the Example 1 scenario (Figure 5), Engines are hosted on the same machine as the Web Application Server and Core (Host A). This environment is duplicated on a second host (Host B).


Figure 5: Host A and B are redundant FME Flow Engines, Core, and Web Application Servers.

 

If Host A fails, its FME Flow Engines go offline, any running jobs are canceled, and Host A cannot run jobs until it is recovered. Any jobs that were running or queued on Host A are resubmitted to the queue on Host B (Figure 6). 

 


Figure 6: Host A has gone offline, Host B is still online and running jobs. 

 

Example 2: Separate Hosts for FME Flow Engines, Web Application Server, and Core 

In the Example 2 scenario (Figure 7), Engines are installed on a dedicated machine (Host C), separate from the Web Application Server and Core (Host A). This deployment is duplicated on other machines (Hosts B/D). There can be as many Engine hosts as needed. 


Figure 7: Host A and B have redundant FME Flow Core and Web Application Servers. Host C and D are distributed Engine machines. 

 

The advantage of this architecture is that jobs will continue to run on distributed Engines (Hosts C/D), even when the Web Application/Core (Hosts A/B) fails (Figure 8).

For example, if Host A goes offline, the Engines on Host C will attempt to reconnect to Host A. If Host C fails to reconnect to Host A, Host C will connect to Host B instead. If an Engine on Host C is running a job at the time of disconnection, the Engine will continue to be online and run the job (even though it doesn’t appear to be running in the Web UI). 


Figure 8: Host A has gone offline. Host C continues running jobs and connects to Host B. Jobs on Host D are unaffected.

 

Example 3: Shared and Separate Hosts for FME Flow Engines, Web Application Server, and Core 

In the Example 3 scenario (Figure 9), Engines are installed both on isolated machines (Host C) and on machines shared with the Web Application Server and Core (Host A). This deployment is duplicated on other machines (Hosts B/D). There can be as many Engine hosts as needed. 


Figure 9: Host A and B have redundant FME Flow Cores, Web Application Servers, and Engines. Host C and D have additional, isolated Engines. 

 

The advantage of this architecture is that it provides even greater stability in the event of failure by increasing the number and type of Engine hosts (Figure 10). Jobs continue to run on the isolated Engine hosts even when one of the Web Application/Core/Engine hosts goes offline, and jobs on the remaining online Web Application/Core/Engine host are unaffected. 

For example, if Host A goes offline, jobs will continue to run on Hosts B, C, and D. If Host A is not recovered, the Engines on Host C will reconnect to Host B instead. Host B keeps FME Flow operational (albeit without the Engines on Host A) until Host A is recovered.

Figure 10: Host A has gone offline. Jobs on Host A fail and are resubmitted to Host B. Jobs running on Host C are uninterrupted while Host C reconnects to Host B. Jobs on Host B and D are unaffected.

 

Maintaining Job Throughput After Failure 

In a fault-tolerant environment, if an FME Flow Engine host goes offline for an extended period, the number of available engines decreases, reducing job throughput. To maintain throughput, you can temporarily increase the number of engines on the remaining active host to run all licensed engines at once. However, this puts additional strain on the remaining host, so ensure it has sufficient resources to handle the extra load before proceeding.

You can scale engines either through the Web UI (Engine Management page) or programmatically using the FME Flow REST API V4 via the POST /enginehosts/{name}/engines/scale endpoint.

The FME Flow REST API V4 is still in Tech Preview. If you have FME 2022+, the documentation can be accessed via http://<FME Flow Host>/fmeapiv4/swagger-ui/index.html. Until its official release, care should be taken in production environments because the endpoints are subject to change.
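As a hedged sketch (not official client code), the scale call could be issued with Python's standard library. The /fmeapiv4 base path is inferred from the Swagger URL above, and the payload field name and auth scheme are assumptions to verify against the Swagger UI:

```python
# Hedged sketch: scale engines on an engine host via the FME Flow REST API V4
# (Tech Preview). The /fmeapiv4 base path, payload field name, and auth scheme
# are assumptions -- verify against the Swagger UI before use.
import json
import urllib.request

def scale_request(flow_host: str, engine_host: str,
                  engines: int, token: str) -> urllib.request.Request:
    """Build the POST /enginehosts/{name}/engines/scale request."""
    url = f"https://{flow_host}/fmeapiv4/enginehosts/{engine_host}/engines/scale"
    body = json.dumps({"engines": engines}).encode()  # field name is an assumption
    req = urllib.request.Request(url, data=body, method="POST")
    req.add_header("Content-Type", "application/json")
    req.add_header("Authorization", f"Bearer {token}")
    return req

# Example: ask the surviving host to run 4 engines after a failover
req = scale_request("fmeflow.example.com", "hostB", 4, "MY_TOKEN")
# urllib.request.urlopen(req)  # uncomment to send
```

Confirm the remaining host has sufficient CPU and memory before issuing the call, as noted above.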

 

Disaster Recovery

Disaster recovery in FME Flow is essential for maintaining system availability and meeting Service Level Agreements (SLAs). A comprehensive plan includes regular backups of the FME Flow database (containing job history, schedules, and configurations) and the System Share (which holds workspaces and temporary files).
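A minimal backup sketch for the two items above, assuming the default PostgreSQL database and a file-system System Share; the paths, host names, credentials, and database name are placeholder assumptions:

```python
# Hedged sketch: back up the FME Flow database and System Share.
# Paths, host names, credentials, and the database name are placeholder
# assumptions -- adapt them to your deployment.
import datetime
import pathlib
import shutil
import subprocess

def backup_paths(base_dir: str, stamp: str) -> tuple:
    """Return (database_dump_path, system_share_archive_base) for a timestamp."""
    base = pathlib.Path(base_dir)
    return (str(base / f"fmeflow-db-{stamp}.dump"),
            str(base / f"fmeflow-share-{stamp}"))

def run_backup(base_dir: str = "/backups",
               system_share: str = "/data/fmeflowdata") -> None:
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    db_dump, share_archive = backup_paths(base_dir, stamp)
    # Dump the FME Flow database (PostgreSQL by default)
    subprocess.run(["pg_dump", "-h", "db-host", "-U", "fmeflow",
                    "-F", "c", "-f", db_dump, "fmeflow"], check=True)
    # Archive the System Share (workspaces, logs, uploaded data)
    shutil.make_archive(share_archive, "gztar", system_share)
```

Schedule such a job regularly and store the artifacts off-host so they survive the failures the plan is meant to cover.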

To meet SLAs, high availability strategies should be implemented, such as load balancing across multiple FME Engines and redundant setups for critical components like the Core and Database. These measures ensure minimal downtime and data loss, allowing FME Flow to recover quickly from failures and continue meeting the uptime and performance commitments outlined in SLAs.

See Planning for Disaster Recovery.

 

Comparison and Summary 

For the vast majority of users, Express is the recommended FME Flow deployment type. For better scalability, the next recommended option is an Express installation with distributed Engines.

Review the documentation on Planning for Scalability and Performance for an overview of options for increasing job throughput for an Express deployment.

Express
  Pros:
    • Fast deployment: Easy installation
    • Lower cost: Fewer infrastructure requirements (e.g. a single server) mean reduced hardware and operational costs
    • Minimal maintenance: Easier to maintain since all components reside on a single machine
    • (Limited) scalability: Performance can be improved by adding Engines on the same or distributed machines
  Cons:
    • FME Flow must be completely reinstalled to become Distributed
    • Not suitable for high availability or fault tolerance
    • Single point of failure (single server)
    • Performance limitations: System performance can be constrained by the resources available on the single server

Distributed
  Pros:
    • Separate components can be managed by dedicated expert teams
    • Scalability: Separating components gives each server dedicated resources for increased processing requirements
    • Flexibility: Easier to expand and reconfigure as processing needs or user requirements change
    • Customization: Users can bring their own Database or Web Application Server
  Cons:
    • Involvement from other teams is needed to complete the configuration and any ongoing maintenance
    • Higher complexity: More complex to set up and maintain due to multiple servers, network configurations, and management of separate components
    • Increased cost: Requires more infrastructure, such as additional host machines, and potentially higher operating costs for that hardware
    • Moderate fault tolerance: Although components are distributed across servers, the system may still have points of failure (e.g. an Engine goes down)

Fault-Tolerant
  Pros:
    • All the pros of the Distributed deployment
    • High availability: Designed for critical, production-grade environments where downtime must be minimized; redundancy across components keeps the system operational during failures
    • Improved disaster recovery: If a component fails, the system can automatically fail over to backup components
    • Load balancing: With multiple nodes and redundancy, traffic and workloads are balanced, improving reliability and performance
  Cons:
    • All the cons of the Distributed deployment
    • High complexity: The most complex deployment to set up and maintain due to load balancers, clustering, and failover configurations
    • Higher cost: Requires more infrastructure (e.g. multiple servers, redundant databases, load balancers) and administrative overhead
    • Advanced knowledge required: Deeper technical expertise is needed for setup, configuration, and ongoing maintenance

 

Multiple FME Flow Environments

Consider using multiple FME Flow environments to ensure that changes or updates are thoroughly tested before being deployed to production. Separate environments, such as development, testing, and production, allow for safer experimentation and troubleshooting without impacting live data or workflows. 

It is critical to keep these environments identical or nearly so in terms of configurations, resources, and versions to ensure consistent behavior. Discrepancies between environments can lead to issues that are difficult to detect during testing, resulting in unexpected failures or performance issues when workflows are promoted to production. 

 

Conclusion

In summary, choosing the right FME Flow deployment architecture depends on your organization's performance, scalability, and reliability needs. Express deployments are ideal for simplicity and cost efficiency, while Distributed deployments offer greater flexibility and scalability. Fault-tolerant configurations provide high availability for critical environments. Assessing your workload, future growth, and fault tolerance requirements will help determine the best deployment option to support your operational goals.
