Using a Load Balancer with a Fault-Tolerant FME Flow

Introduction

As described in the possible architectures for a fault-tolerant FME Flow deployment, a load balancer is required to send traffic to one of the redundant FME Flow web application servers. This article explains how load balancers work in conjunction with a fault-tolerant FME Flow deployment, outlines considerations for selecting a load balancer, and provides instructions on configuring your load balancer.

FME Flow Linux installations include an NGINX reverse proxy that helps with SSL and port configuration. This component can be removed if required. If you want to disable NGINX, please see this article.

Traffic Routing

To walk you through how load balancers work with a fault-tolerant FME Flow deployment, we'll go through a simplified example showing how traffic is routed between the client, the load balancer, and FME Flow.

This example of a fault-tolerant deployment illustrates how client requests are initially sent to the load balancer. Then, the load balancer forwards the traffic to one of FME Flow’s web application servers. In this example, it chose Host 2, but it could have picked Host 1’s web application server.

Round robin load balancing makes the load balancer cycle requests between the web application servers, so that no single FME Flow machine receives all of the traffic. See the Round Robin Load Balancing section below for more detail.

We recommend that the web application servers for each FME Flow installation in a fault-tolerant deployment be installed on the same machine as their cores. When the web application server receives traffic from the load balancer, it routes that traffic to the core it is installed with.

However, when the core is ready to send a job to an engine, FME Flow uses job queues to determine which engine receives the job. The engine can be on either machine, as long as it is assigned to the job's queue and is available. In this example, Host 2's core sent the job to an available engine on Host 1.

Job queues can be customized in FME Flow’s web UI so that you can choose which engines run which jobs, if desired. The load balancer does not control which engines receive which jobs.
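To make the distinction concrete, here is a deliberately simplified Python model of the behaviour described above. It is not FME Flow's actual implementation: the `Engine` class, the `pick_engine` function, and the queue and host names are hypothetical. It only illustrates that a job's destination depends on queue assignment and engine availability, not on which host's core received the request or which server the load balancer picked.

```python
from dataclasses import dataclass


@dataclass
class Engine:
    """Hypothetical stand-in for an FME Flow engine."""
    name: str
    host: str
    queues: set[str]   # queues this engine is assigned to
    busy: bool = False  # True while the engine is running a job


def pick_engine(engines: list[Engine], queue: str):
    """Return the first engine that is assigned to `queue` and not busy.

    The engine's host is never consulted: a job from Host 2's core can land
    on Host 1. Returns None if no suitable engine is free.
    """
    for engine in engines:
        if queue in engine.queues and not engine.busy:
            return engine
    return None


# Simplified version of the example: Host 2's engine is busy,
# so the job goes to the free engine on Host 1.
engines = [
    Engine("engine-1", "Host 1", {"Default"}),
    Engine("engine-2", "Host 2", {"Default"}, busy=True),
]
chosen = pick_engine(engines, "Default")
print(chosen.host)  # Host 1
```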

When responding, FME Flow always returns traffic to the load balancer through the same web application server that the load balancer originally routed the request to. This eliminates the need for sticky sessions, and we recommend against using them. Refer to the Sticky Sessions section below for more information.

Load Balancer Selection

You will have to select and configure your own third-party load balancer to use with FME Flow. FME Flow is able to work with many different load balancers, allowing you to choose the one that works best for you. Your cloud provider or networking team may have recommendations and options to help you make a selection.

When evaluating your options, it is helpful to consider the different types of load balancers and the communication layer on which each type operates. Here are two common types of load balancers that are suitable for use with FME Flow.

Application Load Balancer

Application load balancers operate at Layer 7 (the application layer) of the OSI model. They can make routing decisions based on attributes of an HTTP request, such as the URI path or host header. Application load balancers also support TLS termination and can easily be combined with web application firewalls to protect the application.

Common examples of application load balancers include Azure Application Gateway, AWS Application Load Balancer, and F5.

For more information on TLS passthrough, TLS termination, or TLS bridging with FME Flow, see Enabling FME Flow for Public Access.

Network Load Balancer

Network or traditional load balancers, such as Azure Load Balancer and AWS Network Load Balancer, operate at the transport layer (OSI Layer 4: TCP and UDP) and route traffic from a source IP address and port to a destination IP address and port.

Load Balancer Configuration

Once you select a load balancer, you will need to configure it. We have included important configuration considerations below, and we recommend working with your cloud provider or networking team to design the best solution.

In a fault-tolerant FME Flow installation, both cores are online and communicating. This means all servers and processes in your fault-tolerant FME Flow deployment should be running at the same time, unless there is a failover event.

If you want one of the FME Flow installations to remain offline/passive until a datacenter goes down, please review our documentation on disaster recovery.

Routing Traffic to FME Flow’s Web Application Server

For more information on how to route traffic from the load balancer to FME Flow's web application server, including information on SSL, please see this article.

Round Robin Load Balancing

Round robin load balancing, where the load balancer cycles requests through Flow’s web application servers, is recommended to avoid high traffic volumes to the same web application server.
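For illustration, the following Python sketch shows what round robin selection means. Your load balancer implements this internally, so you would normally just enable it in the load balancer's settings rather than write code; the `RoundRobin` class and the hostnames are hypothetical.

```python
from itertools import cycle
from typing import Iterable


class RoundRobin:
    """Hand out backend servers in a fixed, repeating order: A, B, A, B, ..."""

    def __init__(self, servers: Iterable[str]) -> None:
        servers = list(servers)
        if not servers:
            raise ValueError("at least one web application server is required")
        # cycle() repeats the list endlessly, one element per call to next().
        self._cycle = cycle(servers)

    def next_server(self) -> str:
        return next(self._cycle)


# Hypothetical hostnames for the two FME Flow web application servers.
balancer = RoundRobin(["flow-host1.example.com", "flow-host2.example.com"])
for request_id in range(4):
    print(request_id, balancer.next_server())
```

Round robin spreads requests evenly without considering how busy each server is. That is acceptable here because, as the next paragraph explains, the heavy work done by engines is distributed by job queues inside FME Flow, not by the load balancer.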

However, it is important to note that the most resource-intensive components of FME Flow are its engines while they are running jobs, and FME Flow uses job queues to route jobs to different engines. As shown in the Traffic Routing example above, this is not done at the load-balancing level; it happens internally amongst FME Flow's components. To avoid overwhelming the engines, you can use queue control to choose which engines run which jobs.

Sticky Sessions

Do NOT use sticky sessions (session affinity, where the load balancer keeps sending a given client to the same web server) with your FME Flow load balancer. This can lead to unexpected behavior. Responses are automatically routed back through the web application server to which the request was originally sent, and FME Flow itself decides which core and engine each request is sent to. Neither needs to be managed by the load balancer.

Failover Events

We recommend using liveness health checks to detect failover events and prevent the load balancer from sending traffic to a server that has gone down. Performing a health check every five seconds may be a good starting point, but it may not meet all SLA requirements.
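The sketch below illustrates how a liveness check interacts with round robin routing: a server that fails a set number of consecutive checks is skipped until it passes again. It is a minimal sketch under stated assumptions, not a replacement for your load balancer's built-in health probes. `HEALTH_URL` is a placeholder, so substitute the health-check endpoint documented for your FME Flow version; the class and function names and the two-failure threshold are hypothetical.

```python
import http.client
import time
import urllib.request
from typing import Callable, Optional

# Hypothetical placeholder: use the health-check URL documented for your FME Flow version.
HEALTH_URL = "https://{host}/health"


def probe_http(host: str, timeout: float = 2.0) -> bool:
    """Liveness probe: True only if the host answers HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(HEALTH_URL.format(host=host), timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (4xx/5xx), refused connections and timeouts are OSError subclasses.
        return False


class HealthCheckedPool:
    """Round robin over web application servers, skipping any marked unhealthy."""

    def __init__(self, servers: list[str], unhealthy_threshold: int = 2) -> None:
        self.servers = list(servers)
        # Consecutive failed checks before a server leaves the rotation (assumed policy).
        self.unhealthy_threshold = unhealthy_threshold
        self._failures = {s: 0 for s in self.servers}
        self._index = 0

    def record_check(self, server: str, ok: bool) -> None:
        # One successful check puts the server straight back into rotation.
        self._failures[server] = 0 if ok else self._failures[server] + 1

    def is_healthy(self, server: str) -> bool:
        return self._failures[server] < self.unhealthy_threshold

    def next_server(self) -> Optional[str]:
        """Return the next healthy server in round-robin order, or None if all are down."""
        for _ in range(len(self.servers)):
            server = self.servers[self._index]
            self._index = (self._index + 1) % len(self.servers)
            if self.is_healthy(server):
                return server
        return None


def check_once(pool: HealthCheckedPool, probe: Callable[[str], bool]) -> None:
    """Probe every server once and record the result."""
    for server in pool.servers:
        pool.record_check(server, probe(server))


def check_forever(pool: HealthCheckedPool, probe: Callable[[str], bool] = probe_http,
                  interval: float = 5.0) -> None:
    """Probe all servers every `interval` seconds (five seconds as a starting point)."""
    while True:
        check_once(pool, probe)
        time.sleep(interval)
```

Shortening the interval or lowering the threshold detects a failed server sooner at the cost of more probe traffic and a higher chance of removing a server after a single slow response; tune both against your SLA.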

Using FME Flow’s REST API to route jobs to other engines during failover events is also recommended.

Load Balancers and Engines

We do not recommend placing a load balancer between the engines and the core for on-premises installations or basic cloud installations that you set up yourself. However, if the FME Flow cores are deployed in Azure Virtual Machine Scale Sets or AWS Auto Scaling groups, we recommend implementing a network load balancer so that the engines can reconnect with both cores in case of a failure. This is because machines in these groups do not have a reliable hostname or IP address that an engine can use to register with a core.

Here is an example of how this would look.

The enable_registration_response_transactionhost value in fmeflowconfig.txt must be set to "true" for this to work. This parameter is set to "false" by default.
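If you want to verify the setting before relying on the load balancer, a small script can read the configuration file. The Python sketch below assumes fmeflowconfig.txt uses simple `key=value` lines with `#` comments; check your file's actual layout and location. The function names are hypothetical, and a missing parameter is treated as "false" to match the documented default.

```python
from pathlib import Path

PARAM = "enable_registration_response_transactionhost"


def parse_config(text: str) -> dict[str, str]:
    """Parse key=value lines, skipping blank lines and '#' comments (assumed file format)."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def registration_host_enabled(text: str) -> bool:
    """True only if the parameter is explicitly set to true; defaults to false."""
    value = parse_config(text).get(PARAM, "false")
    return value.strip('"').lower() == "true"


def check_config_file(path: str) -> bool:
    """Read the configuration file at `path` and report whether the parameter is enabled."""
    return registration_host_enabled(Path(path).read_text(encoding="utf-8"))
```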

Infrastructure as Code (IaC) for Fault-Tolerant Deployments

For more information on using IaC templates for distributed or fault-tolerant deployments in the cloud, please see this article.

Additional Resources

A Guide to Choosing Your FME Flow Deployment Architecture

Enabling FME Flow for Public Access

Planning for Fault Tolerance

Set up the Load Balancer and Configure with FME Flow

FME Flow Deployment Articles

Use a Reverse Proxy with FME Flow

Getting Started With Queue Control

Planning for Disaster Recovery

Please consider posting to the FME Community Q&A if you have further questions or issues that are not addressed in this article. There are also different support channels available.

For issues or limitations specific to your cloud provider or IaC tool of choice, please refer to their documentation.

The FME Flow documentation has some guidance on setting up load balancers. You may also find it useful to browse other resources or forum questions about load balancers in case they address your concerns.
