How to Read the FME Flow Hosted Metrics

Introduction

FME Flow Hosted (formerly FME Cloud) includes tools for setting up alerts on conditions that can affect the uptime and performance of an FME Flow Hosted instance. Before we create alerts and configure notifications, we need to understand the metrics those alerts are based on. These metrics are visible when creating an alert, or under the Monitoring tab on the instances page with an instance selected.

Memory Usage

The memory consumption of FME Flow (formerly FME Server) depends heavily on the underlying workspace and the transformers used. The Memory Usage metric is the first metric to look at if an FME Flow Hosted instance seems to be in trouble. Out-of-memory conditions have been among the most common causes of unresponsive FME Flow Hosted instances. Some jobs fail abruptly, and the log file might not contain the information you are looking for, so this metric can also help when investigating job failures. When you are running low on memory, you might also want to check the Temporary Disk Usage, because some translations write to the temporary disk when the instance runs out of memory.

Disk Usage

Primary Disk Usage

The Primary Disk contains the FME Flow install, data you publish to FME Flow, and the PostgreSQL database. When the primary disk is full, the Web Application server might shut down and fail to start up correctly even after an instance reboot. Often, the only way to recover is a rollback to a previous backup. That's why the primary disk usage alert (90% usage over 10 minutes) is enabled for all instances by default.
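To make the "90% usage over 10 minutes" condition concrete, here is a minimal sketch of one way such a sustained-threshold rule could be evaluated over exported metric samples. How FME Flow Hosted evaluates the alert internally is not described here, so the rule used below (every sample in the trailing window must be at or above the threshold) is an assumption, and the function name `sustained_breach` and the sample format are hypothetical.

```python
from datetime import datetime, timedelta


def sustained_breach(samples, threshold=90.0, window=timedelta(minutes=10)):
    """Return True if usage stayed at or above `threshold` for the whole window.

    `samples` is a list of (timestamp, percent_used) tuples (hypothetical format).
    Assumption: the alert fires only when every sample in the trailing window
    is at or above the threshold and the data covers the full window.
    """
    if not samples:
        return False
    ordered = sorted(samples)
    window_start = ordered[-1][0] - window
    # Without a sample at or before the window start, we cannot claim
    # that the condition held for the full duration.
    if ordered[0][0] > window_start:
        return False
    recent = [value for stamp, value in ordered if stamp >= window_start]
    return all(value >= threshold for value in recent)


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 12, 0)
    readings = [(start + timedelta(minutes=i), 92.0) for i in range(11)]
    print(sustained_breach(readings))
```

A single reading below the threshold inside the window resets the condition in this sketch, which avoids alerting on short spikes.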

Another useful tool to prevent running out of disk space is the FME Flow System Cleanup.

Temporary Disk Usage

This disk maps to the Temp resource folder on FME Flow. It is wiped when the instance is paused, and it is not backed up. Temporary disk usage can also increase when the instance runs low on memory and starts to write out temporary data, so it is recommended to always check the memory usage as well when you see an unusual pattern in the temporary disk usage. For more information, see FME Flow Hosted: How to Speed Up Workflows with the Temporary Disk.

FME Flow Hosted Engine Count

The FME Flow Engine count can be set via the web interface of FME Flow. An engine count higher than the one initially set can have different causes. Depending on your workflow, additional engines can be started by jobs using the FMEFlowJobSubmitter. If the metrics show more engines than you expect, there might be a problem. Pay particular attention to a pattern of engines starting and never shutting down, which needs to be investigated. If a constant engine count is expected, it is very useful to set up an alert that notifies you as soon as the metric changes.
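The two patterns described above (any change in the count, and engines that start and never shut down) can be sketched as simple checks over a series of engine-count readings. This is only an illustration: the function names, the list-of-integers input, and the `min_samples` cutoff are assumptions, not part of FME Flow Hosted.

```python
def change_points(counts):
    """Return the indices where the engine count differs from the previous reading."""
    return [i for i in range(1, len(counts)) if counts[i] != counts[i - 1]]


def engines_stuck_above(counts, expected, min_samples=3):
    """Return True if the last `min_samples` readings are all above `expected`.

    Assumption: a run of consecutive readings above the configured engine
    count suggests engines that were started and have not shut down.
    """
    if len(counts) < min_samples:
        return False
    return all(count > expected for count in counts[-min_samples:])


if __name__ == "__main__":
    readings = [2, 2, 3, 2, 2, 4, 4, 4]  # hypothetical engine-count samples
    print(change_points(readings))
    print(engines_stuck_above(readings, expected=2))
```

A brief rise that returns to the expected count shows up only in `change_points`, while a rise that persists also trips `engines_stuck_above`.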

Network Throughput

The network throughput metric lets you monitor the inbound and outbound traffic of your FME Flow Hosted instance in kilobytes per second. If you implement a solution on FME Flow Hosted that allows clients to upload and download data, this metric can be very useful for detecting unusual behavior.

Response Time

This metric measures the response time of the FME Flow health check page. A high number might indicate that the server is under heavy load and is responding more slowly than usual to requests. If you see a response time longer than 500 ms over a period of 10 minutes or more, take a look at the instance and check other metrics, such as the server load or the memory usage, to see if the instance is struggling.

Server Load

A high server load often comes in combination with high memory utilization. Also, the more engines you run, the higher the server load will be. To interpret the server load correctly and to set a suitable threshold for your alert, it is important to understand how the server load relates to the number of cores of your FME Flow Hosted instance. A load of 1.0 means 100% utilization of 1 core. Our FME Flow Hosted Standard instances come with 2 cores, so a load of 2.0 indicates full utilization of both cores.
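The relationship between load and core count can be expressed as a small calculation. The sketch below divides the load by the number of cores to get a utilization percentage. The function name is hypothetical, and the core count for any instance type other than Standard would need to be checked against your own instance.

```python
def utilization_percent(load, cores):
    """Convert a server load value into percent utilization of all cores.

    A load of 1.0 corresponds to 100% of one core, so the overall
    utilization is the load divided by the core count.
    """
    if cores <= 0:
        raise ValueError("cores must be a positive number")
    return 100.0 * load / cores


if __name__ == "__main__":
    # A Standard instance has 2 cores, so a load of 2.0 is full utilization.
    print(utilization_percent(2.0, cores=2))
    print(utilization_percent(1.0, cores=2))
```

A result above 100 means more work is queued than the cores can process at once, which is a reasonable starting point when choosing an alert threshold.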
