Monitoring FME Flow Availability with REST API Health Checks

Liz Sanderson

Introduction

The FME Flow REST API V3 /healthcheck request returns the operational status of FME Flow's critical components. A request can be made to check for either “Liveness” or “Readiness”.

This article provides details on which component checks are performed for each of these requests and tips for investigating failures. 

 

Why Use Health Checks?

A health check request can be used by any person, service, or application that needs to verify that FME Flow is available and operational. Some examples include: 

  • An FME Flow Administrator, for automated monitoring and notifications. 
  • A load balancer, to determine where jobs should be routed.
  • A webhook service, before sending event data.
     

Liveness

FME Flow is considered live when all of its critical components are running and responsive.

A liveness check is performed by default when no value is set for the “ready” parameter, or when the “ready” parameter is set to false. For example:

http://<hostname>/fmerest/v3/healthcheck

or

http://<hostname>/fmerest/v3/healthcheck?textResponse=false&ready=false

 

A liveness check verifies that the following components are active and responsive: 

Component     | Process                       | Log File
Configuration | FMEConfiguration.exe          | core > fmeconfiguration.log
Resources     | FMEMountPoint.exe             | core > fmesharedresource.log
Connections   | FMEConnection.exe             | core > fmeconnection.log
Scheduler     | FMEScheduler.exe              | core > fmescheduler.log
Notifier      | FMENotifier.exe               | core > subscribers > fmesubscribers.log
Relayer       | FMERelayer.exe                | core > publishers > fmepublishers.log
Cleanup       | FMECleanup.exe                | service > fmeprocessmonitorcleanup.log
Tomcat        | FMEFlow_ApplicationServer.exe | tomcat > catalina.log


If all listed components are responsive, the liveness check will return a 200 status code and message:

OK

If one or more of the listed components are not responsive, the liveness check will return a 500 status code and message:

An FME Flow component is unavailable. 
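
The liveness request described above can be scripted for automated monitoring. The following is a minimal Python sketch using only the standard library, not an official client. The function names (healthcheck_url, is_healthy, check_liveness) and the example hostname are hypothetical, and the sketch assumes the endpoint is reachable over plain HTTP from the machine running the script.

```python
import urllib.error
import urllib.parse
import urllib.request


def healthcheck_url(hostname, ready=False, scheme="http"):
    """Build the V3 health check URL; ready=False requests a liveness check."""
    query = urllib.parse.urlencode(
        {"textResponse": "false", "ready": "true" if ready else "false"}
    )
    return f"{scheme}://{hostname}/fmerest/v3/healthcheck?{query}"


def is_healthy(status_code):
    """Per this article, 200 means the check passed and 500 means it failed."""
    return status_code == 200


def check_liveness(hostname, timeout=5.0):
    """Return True if the liveness check passes, False otherwise."""
    url = healthcheck_url(hostname, ready=False)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return is_healthy(response.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses, including the 500 case.
        return is_healthy(err.code)
    except OSError:
        # Unreachable host or timeout: treated as a failed check (an assumption).
        return False
```

Calling check_liveness("fme.example.com") would then return True or False, which a scheduler or notification script can act on.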

 

Readiness

FME Flow is considered ready when all of its critical components are running and responsive and the system is accepting jobs.

A readiness check is performed when the “ready” parameter is set to true. For example:

http://<hostname>/fmerest/v3/healthcheck?textResponse=false&ready=true


A readiness check verifies that the following components are active and responsive:

Component     | Process                         | Log File
Configuration | FMEConfiguration.exe            | core > fmeconfiguration.log
Resources     | FMEMountPoint.exe               | core > fmesharedresource.log
Connections   | FMEConnection.exe               | core > fmeconnection.log
Scheduler     | FMEScheduler.exe                | core > fmescheduler.log
Notifier      | FMENotifier.exe                 | core > subscribers > fmesubscribers.log
Relayer       | FMERelayer.exe                  | core > publishers > fmepublishers.log
Cleanup       | FMECleanup.exe                  | service > fmeprocessmonitorcleanup.log
Tomcat        | FMEFlow_ApplicationServer.exe   | tomcat > catalina.log
Queue         | memurai.exe or redis-server.exe * | queue > localhost_queue.log
Post-install scripts have successfully executed | N/A | installation

* Note: memurai.exe is used by FME Flow 2023+ on 64-bit Windows (win64). redis-server.exe is used by all other FME Flow versions.

If all listed components are responsive and the queue is active, a successful readiness check will return a 200 status code and message:

OK


If one or more of the listed components are not responsive and/or the queue is not active, a readiness check will return a 500 status code and message:

FME Flow is Not Ready. 
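
A service that should only send work once FME Flow is accepting jobs could poll the readiness check until it passes. The sketch below is one possible approach in Python, under stated assumptions: the function names (readiness_probe, wait_until_ready), the retry count, and the delay are all hypothetical choices, not values recommended by FME Flow.

```python
import time
import urllib.request


def readiness_probe(hostname, timeout=5.0):
    """Return True when the V3 readiness check answers with a 200 status code."""
    url = f"http://{hostname}/fmerest/v3/healthcheck?textResponse=false&ready=true"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers HTTP 500 responses (HTTPError), unreachable hosts, and timeouts.
        return False


def wait_until_ready(probe, attempts=10, delay_seconds=5.0, sleep=time.sleep):
    """Call probe() until it returns True or the attempts run out.

    Returns the number of attempts it took, or None if never ready.
    """
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(delay_seconds)
    return None
```

For example, wait_until_ready(lambda: readiness_probe("fme.example.com")) would block until the readiness check passes or ten attempts have failed. Passing the probe as a callable keeps the retry logic easy to test without a live server.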

 

How to Investigate a Health Check Failure (REST API V3)

In the FME Flow REST API V3, health check requests are only a pass or fail test to alert us to a problem. When a failed status is returned, the response does not tell us which component(s) are causing the failure. When we receive a failure message, we must do further investigation to identify which component is inactive or unresponsive. 
 

Log Files

Start a log file investigation by looking for ERROR or FATAL messages in the fmeserver.log and fmeprocessmonitorcore.log (core folder). Proceed to check each log file associated with each listed component process. Note that log folders are divided into current and old folders. If a process is crashing, the error messages may be found in the old folder.

The FME Flow Debugging Toolbox: Log Files contains instructions for using FME Flow log files as a troubleshooting tool.
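
The first pass over the log files can be automated. The following Python sketch searches every .log file under a folder for lines that mention ERROR or FATAL. It assumes a simple substring match is good enough (the exact log line layout is not described in this article) and that searching recursively will cover both the current and old folders. The function names and the folder path passed in are hypothetical.

```python
from pathlib import Path

# Severity keywords named in this article.
SEVERITIES = ("ERROR", "FATAL")


def find_problem_lines(lines):
    """Return (line_number, text) pairs for lines mentioning ERROR or FATAL."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if any(severity in line for severity in SEVERITIES)
    ]


def scan_log_folder(folder):
    """Scan every .log file under folder (recursively); map path -> problem lines."""
    findings = {}
    for path in sorted(Path(folder).rglob("*.log")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            hits = find_problem_lines(handle)
        if hits:
            findings[str(path)] = hits
    return findings
```

Running scan_log_folder on the FME Flow logs directory would return a dictionary keyed by log file path, which makes it easier to see which component's log to read first.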

 

Processes

Ensure that each FME Flow component process is running.

Instructions for investigating FME Flow Processes can be found in the FME Flow Debugging Toolbox: System Tools.
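
On a Windows host, the process check can also be scripted. The Python sketch below compares the process names from the component tables in this article against the processes currently running, using the Windows tasklist command. The function names are hypothetical, the queue process is left out because its name depends on the FME Flow version, and the sketch assumes a single-machine Windows installation.

```python
import csv
import io
import subprocess

# Process names taken from the component tables in this article (Windows).
EXPECTED_PROCESSES = [
    "FMEConfiguration.exe",
    "FMEMountPoint.exe",
    "FMEConnection.exe",
    "FMEScheduler.exe",
    "FMENotifier.exe",
    "FMERelayer.exe",
    "FMECleanup.exe",
    "FMEFlow_ApplicationServer.exe",
]


def missing_processes(running_names, expected=EXPECTED_PROCESSES):
    """Return the expected process names that are not in running_names."""
    running = {name.lower() for name in running_names}  # case-insensitive match
    return [name for name in expected if name.lower() not in running]


def running_process_names():
    """List running image names via `tasklist` (Windows only)."""
    output = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    # Each CSV row starts with the image name, e.g. "FMEScheduler.exe".
    return [row[0] for row in csv.reader(io.StringIO(output)) if row]
```

Calling missing_processes(running_process_names()) returns the component processes that are not running. An empty list means all eight were found.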

 

How to Investigate a Health Check Failure (REST API V4)

NOTE: FME Flow REST API V4 is still in technical preview. We do not recommend using V4 in production workflows until it is officially released. 

In the FME Flow REST API V4, health check requests can return more detailed information by using the includeDetails parameter. This is useful after a general health check fails. 

Health check request in V4: 

http://<hostname>/fmeapiv4/healthcheck/liveness

Health check request in V4 with the includeDetails parameter: 

https://<hostname>/fmeapiv4/healthcheck/liveness?includeDetails=true

The response will include a list of all FME Flow components and their respective statuses:

{
  "status": "ok",
  "message": "FME Flow is healthy.",
  "components": [
    { "component": "resources", "status": "up" },
    { "component": "relayer", "status": "up" },
    { "component": "notifier", "status": "up" },
    { "component": "cleanup", "status": "up" },
    { "component": "configuration", "status": "up" },
    { "component": "scheduler", "status": "up" },
    { "component": "core", "status": "up" },
    { "component": "connections", "status": "up" }
  ]
}

Any components listed as "down" should be further investigated.  
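
Picking out the components that need attention from a detailed V4 response is straightforward to script. The Python sketch below parses a response body shaped like the example above and returns the names of any components whose status is "down". The function name is hypothetical, and because V4 is in technical preview, the response shape is assumed from the example in this article and may change.

```python
import json


def down_components(response_text):
    """Return the names of components reported as "down" in a V4 health check body."""
    payload = json.loads(response_text)
    return [
        entry["component"]
        for entry in payload.get("components", [])
        if entry.get("status") == "down"
    ]
```

Given a body in which only the scheduler entry has "status": "down", the function returns a list containing just "scheduler". That points to the log file and process to check first.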

 
