Introduction
This article expands on the FME Server installation options documented in Choosing a Deployment Architecture. It provides a high-level overview of each deployment, describes how FME Server behaves in the event of a system failure, and points to the documentation to follow for installation.
Express
An Express architecture installs all the required components of FME Server on a single host machine and is the quickest and easiest way to get started with FME Server.
If you are new to FME Server and are not concerned with planning for a Distributed/Fault-Tolerant architecture in the near future, then we’d recommend performing an Express installation.
In this environment, if the system goes down, all jobs that were running at the time of failure will be resubmitted to the queue, and each translation will start over once FME Server is back up. If you would rather have jobs report a failure status than be resubmitted when the system goes down, disable Job Recovery.
To perform this installation follow the documentation appropriate for your chosen operating system:
This deployment can be scaled by Adding FME Engines on a Separate Machine. However, to move from an Express to a Distributed/Fault-Tolerant architecture, you'd need to perform the upgrade process.
Distributed
A Distributed installation offers options for spreading components across a network that you provide and maintain. See Distributing FME Server Components for more information. A distributed installation can be made fault-tolerant at any time. By choosing a distributed option, you can use your organization's database and file share, which you can configure for redundancy. This can reduce potential downtime or data loss in the event of a failure.
With this architecture model, if the Core host goes offline, the Engines on this host will also go down, and any jobs that were running will be resubmitted to the queue on restart. If you have set up distributed Engines, jobs on these will continue to run, but the Core will not be aware of this and will resubmit them as well. With distributed Engines, we therefore recommend disabling Job Recovery; otherwise, you can end up with duplicate job submissions during a failure event.
If the system database goes down, any running jobs will continue, but the Web UI will be unresponsive.
If the system share goes down while a job is running, the job will complete, but the job log may not be recorded.
To perform this installation follow the documentation appropriate for your chosen deployment:
- 2-tier distribution with Web/Core on the same host (Recommended)
- 3-tier distribution with Web and Core separated
This deployment can be scaled by Adding FME Engines on a Separate Machine. This deployment can be moved to a fault-tolerant architecture by repeating the Web/Core/Engine install on a second machine pointing to the same System Share and Database, and then configuring a load balancer.
Fault-Tolerant
A fault-tolerant deployment extends the distributed model. In addition to spreading components across a network for you to maintain, it is composed of redundant FME Server Web/Core services spread across separate host machines. This architecture ensures that if a hardware component fails, FME Server remains online, providing high availability. See Planning for Fault Tolerance for more information.
Below we have provided three examples of possible deployment options. Note: these diagrams demonstrate the configuration with 2 Web/Core hosts; however, we have seen users successfully deploy this model with 3+ Web/Cores.
To perform this installation, follow the documentation. To configure Example 2 or 3, install the distributed Engines following Adding FME Engines on a Separate Machine.
Example 1: FME Server Engines Hosted on the Same Machine as the Web/Core Services
In this scenario, the FME Server Engines are hosted on the same machine as the Web/Core Services.
If one of your hosts goes offline, the FME Server Engines on this host will also go down and be unavailable to run jobs.
Any running jobs will be resubmitted to the queue. If you would rather have jobs report a failure status than be resubmitted when the system goes down, disable Job Recovery.
If the outage is going to be for an extended period of time, you can manually change the engine count in the Web UI so that all engines are available on the host that remained online. You should ensure each host meets the technical specifications to be able to run all engines in this case.
Alternatively, the change to engine count can be automated using the REST API V4 endpoint:
POST /fmeapiv4/enginehosts/{name}/engines/scale
Note: REST API V4 is in tech preview for FME 2022. To see the documentation, go to http://<FMEServerHost>/fmeapiv4/swagger-ui/index.html. This endpoint is unavailable in earlier versions of FME. Tech preview means endpoints are still subject to change, so take care when using them in a production environment.
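As a rough illustration of automating this call, the sketch below only builds the POST request for the endpoint above, using the Python standard library. The host name, engine host name, token, the `Authorization` header format, and especially the request body field are assumptions made for illustration; the real request schema is defined in the Swagger documentation and may change while the API is in tech preview.

```python
import json
import urllib.parse
import urllib.request


def build_scale_request(base_url, host_name, token, payload):
    """Build (but do not send) a POST request for the engine scale endpoint."""
    # Escape the engine host name so it is safe to place in the URL path.
    path = "/fmeapiv4/enginehosts/{}/engines/scale".format(
        urllib.parse.quote(host_name, safe="")
    )
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # ASSUMPTION: token header format; confirm it in the Swagger UI.
            "Authorization": "fmetoken token=" + token,
        },
    )


# HYPOTHETICAL body: the real field names are defined in the Swagger documentation.
example_payload = {"numEngines": 4}

request = build_scale_request(
    "http://fmeserver.example.com",  # placeholder for http://<FMEServerHost>
    "core-host-2",                   # placeholder engine host name
    "my-token",                      # placeholder API token
    example_payload,
)
print(request.get_method(), request.full_url)

# To actually send the request: urllib.request.urlopen(request)
```

Keeping the request construction separate from sending makes it easy to inspect the URL and body against the Swagger documentation before running it against a live system.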
Example 2: FME Server Engines are Distributed and Hosted on a Separate Machine from the Web/Core Services
In this scenario, the FME Server Engines are distributed and hosted on a separate machine from the Web/Core Services. There can be any number of distributed Engine hosts associated with the Core.
In this scenario, if one of your Web/Core hosts goes offline, the FME Server Engines will attempt to reconnect to that host before attempting to connect to the second host.
If an Engine is actively running a job, it will be ‘missing’ from the Engines page of the Web UI until this job is completed, at which point it will then go through the process of attempting to reconnect to the core.
Although jobs will continue to run, since the Engine has lost communication with the Core, the remaining Core assumes the job also failed and will resubmit it. Therefore with distributed engines, we recommend disabling Job Recovery; otherwise, you can end up with duplicate job submissions during a failure event.
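The reasoning behind this recommendation can be sketched as a toy model. The code below is not how FME Server is implemented; it simply encodes the behaviour described in this article, with hypothetical names (`RunningJob`, `outcome_after_core_failure`), to show when a job ends up running twice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunningJob:
    job_id: int
    # True if the Engine shared a machine with the Web/Core host that failed,
    # False if the Engine runs on a separate (distributed) host.
    engine_on_failed_host: bool


def outcome_after_core_failure(jobs, job_recovery_enabled):
    """Classify each running job after a Web/Core host goes offline (toy model)."""
    outcomes = {}
    for job in jobs:
        if job.engine_on_failed_host:
            # The Engine went down with the host, so the original run is lost.
            outcomes[job.job_id] = (
                "resubmitted" if job_recovery_enabled else "reported as failed"
            )
        else:
            # A distributed Engine keeps running the job, but the Core cannot see it.
            outcomes[job.job_id] = (
                "runs twice (original run plus resubmission)"
                if job_recovery_enabled
                else "keeps running, not resubmitted"
            )
    return outcomes


jobs = [RunningJob(1, engine_on_failed_host=True), RunningJob(2, engine_on_failed_host=False)]
for recovery in (True, False):
    print("Job Recovery enabled:", recovery, outcome_after_core_failure(jobs, recovery))
```

In this example every Engine is distributed, so only the second kind of job applies; the first kind is included because it matches the behaviour described for Engines hosted alongside the Web/Core services.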
Example 3: FME Server Engines are Hosted on the Same Machine as the Web/Core Services and Some Engines are on Distributed Hosts
In this scenario, the FME Server Engines are hosted on the same machine as the Web/Core Services and there are Engines running on distributed hosts.
In this scenario, if one of your Web/Core hosts goes offline, the FME Server Engines on this host will also go down and be unavailable to run jobs unless you manually change the engine count to move them to the remaining active host or one of the distributed hosts.
Alternatively, the change to engine count can be automated using the REST API V4 endpoint:
POST /fmeapiv4/enginehosts/{name}/engines/scale
Note: REST API V4 is in tech preview for FME 2022. To see the documentation, go to http://<FMEServerHost>/fmeapiv4/swagger-ui/index.html. This endpoint is unavailable in earlier versions of FME. Tech preview means endpoints are still subject to change, so take care when using them in a production environment.
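Whether the engine count is changed manually or through the API, you first need to decide how many Engines each surviving host should run. The sketch below shows one possible way to compute that: it spreads the offline host's Engines evenly across the remaining hosts. The even split, the function name, and the host names are assumptions for illustration; in practice the split should reflect what each host's specifications allow.

```python
def redistribute_engines(engine_counts, offline_host):
    """Return new per-host engine counts with the offline host's engines moved elsewhere."""
    if offline_host not in engine_counts:
        raise KeyError(offline_host)
    # Sort the surviving hosts so the result is deterministic.
    remaining = sorted(host for host in engine_counts if host != offline_host)
    if not remaining:
        raise ValueError("no remaining host to take over the engines")

    to_move = engine_counts[offline_host]
    share, extra = divmod(to_move, len(remaining))

    plan = {offline_host: 0}
    for index, host in enumerate(remaining):
        # The first `extra` hosts take one additional engine when the split is uneven.
        plan[host] = engine_counts[host] + share + (1 if index < extra else 0)
    return plan


# Hypothetical layout: two Web/Core hosts with local Engines plus one distributed host.
current = {"core-a": 4, "core-b": 4, "engine-host-1": 2}
print(redistribute_engines(current, "core-a"))
# {'core-a': 0, 'core-b': 6, 'engine-host-1': 4}
```

The total number of Engines stays the same before and after, which is a convenient sanity check before applying the new counts.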
The distributed Engines will attempt to reconnect to the offline host before connecting to the second host.
If the distributed Engine is actively running a job, it will be ‘missing’ from the Engines page of the Web UI until this job is completed, at which point it will then go through the process of attempting to reconnect to the core.
Although jobs will continue to run, since the distributed Engine has lost communication with the Core, the remaining Core assumes the job also failed and will resubmit it. Therefore, with distributed Engines, we recommend disabling Job Recovery; otherwise, you can end up with duplicate job submissions during a failure event.
Summary (Pros & Cons)
Installation | Pros | Cons |
---|---|---|
Express | | |
Distributed | | |
Fault-Tolerant | | |