Troubleshooting FME Kubernetes Deployment

Dami Obasa

Overview

This article helps you diagnose and resolve common issues when deploying and operating FME Flow on Kubernetes. For each issue, it lists the symptoms, likely causes, and resolutions. For full deployment guidance and other insights into the deployment process, please review this article.

Prerequisites

Before troubleshooting, confirm the following are in place. See Prerequisites and Considerations.

  • A working Kubernetes cluster with adequate CPU and memory
  • Helm installed and the Safe Software chart repo added (helm repo add safesoftware https://safesoftware.github.io/helm-charts)
  • An ingress controller (NGINX) is available in the cluster
  • Network access to any external dependencies such as image registries, license server, and database

Common issues

Can’t reach Web UI after install
  • Symptoms
    • 404 Not Found
    • 502 Bad Gateway or 504 Gateway Timeout
    • The browser shows the ingress hostname, but times out
  • Likely causes
    • Ingress hostname or DNS is not configured to point at the ingress address
    • TLS secret name mismatch on the Ingress
  • Resolution
    • Set deployment.hostname and, if using DNS, set deployment.useHostnameIngress=true
    • Ensure DNS resolves to the ingress address
    • If using an existing certificate, make sure deployment.tlsSecretName matches the secret bound to the Ingress, then run helm upgrade
    • Alternatively, configure cert-manager via deployment.certManager.* values
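The hostname and TLS values above can be applied in a single Helm upgrade. The following is a minimal sketch rather than an official procedure: the function name set_fme_hostname and its arguments are hypothetical, and it assumes the release was installed from safesoftware/fmeflow with a values.yaml in the current directory. Only value keys named in this article are used.

```shell
# Hypothetical helper: apply the hostname and TLS secret values in one upgrade.
# Arguments: <release> <namespace> <hostname> <tls-secret-name>
set_fme_hostname() {
  local release="$1" ns="$2" host="$3" tls_secret="$4"
  helm upgrade "$release" safesoftware/fmeflow \
    -n "$ns" \
    -f values.yaml \
    --set deployment.hostname="$host" \
    --set deployment.useHostnameIngress=true \
    --set deployment.tlsSecretName="$tls_secret"
}
```

For example, set_fme_hostname <release> <ns> fme.example.com <tls-secret> applies the values; kubectl get ingress -n <ns> can then be used to confirm the host and address that DNS should resolve to.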
Engines don’t scale from the Web UI
  • Symptoms
    • UI accepts changes, but the engine count in Kubernetes does not change
    • No new engine Pods after saving in UI
  • Likely cause
    • In Kubernetes, engine definitions are controlled by the Helm values, not the Web UI
  • Resolution
    • In values.yaml, set fmeflow.engines[].engines and apply with helm upgrade
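As an illustration of changing the engine count through Helm instead of the Web UI, here is a small sketch. The function name set_engine_count is hypothetical, and it assumes values.yaml already defines an fmeflow.engines list with an entry at the given index. Editing the number directly in values.yaml and running helm upgrade is equivalent and keeps the file as the single source of truth.

```shell
# Hypothetical helper: change the engine count of one engine group via Helm.
# Arguments: <release> <namespace> <engine-group-index> <engine-count>
set_engine_count() {
  local release="$1" ns="$2" index="$3" count="$4"
  # Assumes values.yaml already contains fmeflow.engines[<index>].
  helm upgrade "$release" safesoftware/fmeflow \
    -n "$ns" \
    -f values.yaml \
    --set "fmeflow.engines[$index].engines=$count"
}
```

After the upgrade, kubectl get pods -n <ns> should show the engine Pods converging on the new count.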
Wrong FME version pulled
  • Symptoms
    • ImagePull error: manifest for safesoftware/fmeflow:<tag> not found
  • Likely cause
    • Incorrect image tag
  • Resolution
    • Set fmeflow.image.tag to the intended release (for example, 2025.1) and run helm upgrade
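To confirm which image reference a Pod is actually trying to pull, before or after changing the tag, a sketch like the following may help. The function name pod_images is hypothetical; the kubectl call only reads the Pod spec.

```shell
# Hypothetical helper: print the image reference(s) a Pod is configured to run.
# Arguments: <pod> <namespace>
pod_images() {
  local pod="$1" ns="$2"
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.spec.containers[*].image}'
}
```

If the printed tag is not the release you intended, correct fmeflow.image.tag and upgrade again.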
Helm repo/values not initialized
  • Symptoms
    • Error: repository name (safesoftware) not found
    • Error: chart "fmeflow" matching not found in safesoftware index
  • Likely cause
    • Missing chart repo or not using a values file
  • Resolution
    • Add the Safe repo, refresh the index, and fetch the default values, then proceed with the installation or upgrade

      helm repo add safesoftware https://safesoftware.github.io/helm-charts

      helm repo update

      helm show values safesoftware/fmeflow > values.yaml
502/504 through Ingress or UI timeouts
  • Symptoms
    • 502 Bad Gateway from NGINX ingress
    • 504 Gateway Timeout
    • Upstream connect error or disconnect/reset before headers
  • Likely causes
    • Backend Service has no ready endpoints
    • Readiness/liveness probes failing
  • Resolution
    • Run kubectl get endpoints <svc> -n <ns>; if the result is empty, fix the probes and ensure the Service selector matches the Pod labels
    • Increase initialDelaySeconds and timeoutSeconds for slow starts
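The endpoint and selector checks above can be run together. This is a sketch under the assumption that the Service and its Pods live in the same namespace; the function name check_service_backends is hypothetical.

```shell
# Hypothetical helper: compare a Service's selector with the Pod labels.
# Arguments: <service> <namespace>
check_service_backends() {
  local svc="$1" ns="$2"
  # An empty ENDPOINTS column means no Pod is both selected and Ready.
  kubectl get endpoints "$svc" -n "$ns"
  # Print the label selector the Service uses to find its Pods.
  kubectl get svc "$svc" -n "$ns" -o jsonpath='{.spec.selector}'
  echo
  # List Pods with their labels and READY state for comparison.
  kubectl get pods -n "$ns" --show-labels
}
```

If the selector matches the labels but the Pods are not Ready, the probes are the more likely cause.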
Pods stuck in Pending due to storage
  • Symptoms
    • pod has unbound immediate PersistentVolumeClaims
    • The pods are deployed but are not running
    • Error: PVC <name> is Pending
    • Error: 0/3 nodes available: volume node affinity conflict
    • Error: Warning FailedMount kubelet MountVolume.SetUp failed for volume <volume_name>
  • Likely causes
    • storageClassName mismatch or missing provisioner
    • Access mode mismatch (RWO vs RWX)
    • Capacity or quota exhausted
  • Resolution
    • If using AKS (Azure Kubernetes Service), use the correct storage class. The "Setup Shared Storage using Azure Files" section in this article provides more details on the process
    • Confirm that the default PV is bound to the default PVC; kubectl get pv,pvc -n <ns> should report a STATUS of Bound for both
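To narrow down which of the storage causes above applies, the following sketch gathers the relevant state in one pass. The function name check_storage is hypothetical, and the commands only read cluster state.

```shell
# Hypothetical helper: show why PersistentVolumeClaims are not binding.
# Arguments: <namespace>
check_storage() {
  local ns="$1"
  # STATUS should be Bound; Pending claims need a matching PV or provisioner.
  kubectl get pvc -n "$ns"
  # The Events section of each claim explains provisioning failures.
  kubectl describe pvc -n "$ns"
  # Confirm the storage class named in the claims exists in the cluster.
  kubectl get storageclass
}
```

Compare the STORAGECLASS and ACCESS MODES columns of the claims with what the cluster actually offers.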
CrashLoopBackOff on startup
  • Symptoms
    • CrashLoopBackOff, Back-off restarting failed container
    • Exit Code: 1 when you describe the pod (kubectl describe pod <pod> -n <ns>)
  • Likely causes
    • Misconfiguration or missing secrets
    • Database or license server unreachable
    • Init container failing
  • Resolution
    • Verify env and secret values, and network access to DB and license server
    • Restarting all of the pods can also help, because services sometimes do not start in the correct sequence
ImagePullBackOff
  • Symptoms
    • ImagePullBackOff, ErrImagePull
    • Error: failed to authorize: authentication required
  • Likely causes
    • Bad image reference or tag
    • Missing imagePullSecret
    • Blocked egress to registry
  • Resolution
    • Verify repository, tag, and credentials
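The events recorded for a Pod usually state the exact pull error, which helps tell a bad tag from missing credentials or blocked egress. A minimal sketch, with the hypothetical function name pod_events:

```shell
# Hypothetical helper: list events for one Pod, oldest first.
# Arguments: <pod> <namespace>
pod_events() {
  local pod="$1" ns="$2"
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.name=$pod" \
    --sort-by=.lastTimestamp
}
```

Look for "not found" (bad reference or tag), "authentication required" (missing or wrong imagePullSecret), or timeouts (blocked egress) in the MESSAGE column.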
OOMKilled or slow performance
  • Symptoms
    • Reason: OOMKilled in container status
    • Memory cgroup out of memory in node logs
    • High latency or queue growth in UI
  • Likely cause
    • Memory limits are too low, or workload spikes
  • Resolution
    • Increase container memory requests and limits
    • Check node pressure and add capacity if needed
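Before raising limits, it can be worth confirming that the last restart really was an out-of-memory kill. The sketch below reads the previous termination reason from the Pod status; the function name last_termination_reason is hypothetical.

```shell
# Hypothetical helper: print why each container in a Pod last terminated.
# Arguments: <pod> <namespace>
last_termination_reason() {
  local pod="$1" ns="$2"
  # Reports OOMKilled when the container exceeded its memory limit.
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
}
```

An empty result means the containers have not been restarted, so slow performance is more likely due to node pressure or workload than to the memory limit.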
Startup/readiness probe 503
  • Symptoms
    • Readiness probe failed: HTTP 503
    • Error: Liveness probe failed: connection refused
  • Likely causes
    • Slow first start
    • Wrong health endpoint or port
  • Resolution
    • Increase initialDelaySeconds and timeoutSeconds
    • Confirm probe path and port match the container’s health endpoint. 
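To check the probe path, port, and timing values that are actually in effect, the Pod spec can be queried directly. This is a sketch with a hypothetical function name, show_probes; it prints each container name followed by its readiness and liveness probe definitions.

```shell
# Hypothetical helper: print the probes configured on each container of a Pod.
# Arguments: <pod> <namespace>
show_probes() {
  local pod="$1" ns="$2"
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\n"}{.readinessProbe}{"\n"}{.livenessProbe}{"\n"}{end}'
}
```

Compare the reported path and port with the container's health endpoint, and the initialDelaySeconds value with the observed startup time.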
Database connection errors
  • Symptoms
    • Connection refused to <db-host>:<port>
    • SSL: certificate verify failed
    • Authentication failed for the user
  • Likely causes
    • Wrong host, port, credentials, or TLS
    • Network blocks between the cluster and the database
  • Resolution
    • Update values and secrets
    • From a Pod, test connectivity: nc -vz <db-host> <port>
    • Open firewall or NetworkPolicy where needed
    • If using external Postgres, follow Deploying with an External Database to cross-check for misconfigurations.
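The connectivity test above can be run without opening an interactive shell. The following sketch assumes that nc is present in the container image, which may not be the case for every image; the function name test_db_connectivity is hypothetical.

```shell
# Hypothetical helper: test TCP reachability of the database from inside a Pod.
# Arguments: <pod> <namespace> <db-host> <db-port>
test_db_connectivity() {
  local pod="$1" ns="$2" db_host="$3" db_port="$4"
  # Assumes the container image ships nc.
  kubectl exec "$pod" -n "$ns" -- nc -vz "$db_host" "$db_port"
}
```

A refused or timed-out connection from the Pod points to a firewall or NetworkPolicy issue, while a successful connection shifts attention to credentials or TLS settings.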

References and Additional Resources

Commands and Snippets

# Check endpoints for a Service
kubectl get endpoints <svc>

# View current and previous container logs
kubectl logs <pod> -n <ns>
kubectl logs --previous <pod> -n <ns>

# Upgrade with a values file
helm upgrade <release> safesoftware/fmeflow -f values.yaml

# Get a quick cluster-wide view of pod health
kubectl get pods -A -o wide

# Describe a specific pod to inspect events, probes, and container state
kubectl describe pod <pod> -n <ns>

# Stream logs from a running container
kubectl logs -f <pod> -n <ns>

# Verify Services have ready endpoints
kubectl get svc,endpoints -n <ns>

# Exec into a running container for on-box checks (curl, nc, env)
kubectl exec -it <pod> -n <ns> -c <container> -- /bin/sh
