FME Server on Kubernetes: Utilizing Engine Assignment and Job Routing

Liz Sanderson

Introduction

FME Server 2021 introduced new mechanisms for managing FME Server engines and queues.
Engines can now be assigned to queues based on either engine properties or engine names. By default, engine properties include the operating system, the FME Server build, and the license type; users can also add their own engine properties.
Jobs are routed to queues based on user-defined rules. The traditional method of routing by repository is still available, with the additional option to route based on workspace statistics. Workspace statistics track runtime information such as peak memory usage, % CPU utilization, and more.
See our blog for an introduction to Analyzing Job Statistics in FME Server.

This is beneficial for containerized deployments of FME Server, such as Docker and Kubernetes. Engine names are more likely to change in these deployments, so assigning engines to queues based on engine properties is more useful.

Prior to 2021, FME Server queues could be defined in the values.yaml file. Going forward, we recommend defining the engineProperties parameter in the values.yaml and managing queues through the FME Server Web UI, as described in this article.
For information on all of the parameters available, please refer to our GitHub.
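
For a quick preview, the field that drives engine assignment in an engine deployment group is engineProperties; below is a minimal values.yaml sketch (the complete group definitions used in this walkthrough appear later in the article):

engines:
   - name: "standard-group-1"
     engines: 1
     type: "STANDARD"
     engineProperties: "generalPurpose"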

This article will walk through an example that combines FME Server Engine Management with Kubernetes Node Selectors to make the best use of the cluster infrastructure so that FME Server jobs are processed faster. 
Engine deployment groups will be assigned to nodes of different types (general purpose, compute optimized, and memory optimized instances). Then FME Server job routing rules will be set up to make sure that workspaces are processed on the correct engines based on their statistics.

This article assumes existing knowledge of how to deploy and manage an FME Server Kubernetes Cluster.
 

Configure Nodes

Kubernetes allows pods to be constrained to run on a particular node or set of nodes.
You may want to do this to take advantage of different node instance types and sizes. For example, this would allow you to run FME Server jobs that are more memory intensive on a node type that is suited to those workloads.

The recommended approach is to attach labels to your nodes and use node selectors when scheduling pods.
You can follow the Kubernetes documentation for details on how to do this: Attach label to the node

In this example I have 3 nodes, labelled according to their VM type:

[Screenshot: the three labelled nodes]

The label key is property, and the label value is generalPurpose, memoryOptimized, or computeOptimized. The key and value are case-sensitive and must match the nodeSelector used in the values.yaml below.
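
As a minimal sketch of the labelling step, the label can be applied with kubectl and will then appear in the node's metadata (the node name below is hypothetical):

# kubectl label node aks-general-12345678-vmss000000 property=generalPurpose
apiVersion: v1
kind: Node
metadata:
  name: aks-general-12345678-vmss000000   # hypothetical AKS node name
  labels:
    property: generalPurpose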

Here are the nodes as shown in the Azure Portal:

[Screenshot: the Azure Portal node list showing the three VM sizes]

There are three different node sizes:

- The Dv2- and DSv2-series feature a powerful CPU and an optimal CPU-to-memory ratio, making them suitable for most production workloads.
- The Eav4-series sizes are ideal for memory-intensive enterprise applications.
- The Fsv2-series has a high CPU-to-memory ratio, making it well suited to compute-intensive workloads.
 

Configure the FME Server Deployment

Once the nodes are labelled, Kubernetes needs to know which node(s) to schedule engine pods onto. This is done using a nodeSelector in the values.yaml.

In order to set up Engine Assignment Rules in FME Server, the engineProperties parameter needs to be set.
The container resources can also be configured for each engine group. Kubernetes will use this information to decide where to place a pod; it will not place a pod on a node that does not have enough available resources. For more information, see Managing Resources for Containers.

Below is an example of 3 different engine deployment groups, designed to run on different nodes and process different jobs.
You can see how the engineProperties, nodeSelector and resources are configured differently for each engine deployment group.

engines:
   - name: "standard-group-1"
     engines: 1
     type: "STANDARD"
     engineProperties: "generalPurpose"
     labels: {}
     affinity: {}
     nodeSelector:
       property: generalPurpose
     tolerations: []
     resources:
       requests:
         memory: 2Gi
         cpu: 500m
   - name: "standard-group-2"
     engines: 1
     engineProperties: "memoryOptimized"
     type: "STANDARD"
     labels: {}
     affinity: {}
     nodeSelector:
       property: memoryOptimized
     tolerations: []
     resources:
       requests:
         memory: 4Gi
         cpu: 500m
   - name: "standard-group-3"
     engines: 1
     engineProperties: "computeOptimized"
     type: "STANDARD"
     labels: {}
     affinity: {}
     nodeSelector:
       property: computeOptimized
     tolerations: []
     resources:
       requests:
         memory: 1Gi
         cpu: 1000m
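
The example above only sets resource requests, which is what the scheduler uses for placement. Kubernetes resources can also carry limits to cap what a container may consume; whether the FME Server chart passes a limits block through to the engine containers is an assumption here, so check the available parameters on GitHub before relying on it. A sketch for one group:

     resources:
       requests:
         memory: 4Gi
         cpu: 500m
       limits:          # assumption: limits are passed through by the chart
         memory: 8Gi
         cpu: 1000m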


Applying this values.yaml file to the FME Server deployment results in the engine pods being scheduled onto the correct nodes:

[Screenshot: engine pods scheduled onto their matching nodes]
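
To verify this from the command line, you can check which node each engine pod landed on; the nodeSelector defined in values.yaml is copied into each engine pod's spec. A sketch (the namespace, pod name, and hash are hypothetical):

# List the engine pods together with the node they were scheduled onto:
#   kubectl get pods -n fmeserver -o wide
# Inspect one engine pod; the nodeSelector from values.yaml should be present:
#   kubectl get pod engine-standard-group-2-<hash> -n fmeserver -o yaml
spec:
  nodeSelector:
    property: memoryOptimized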
 

Configure Engine Assignment Rules

Once FME Server has been deployed, you will see the newly defined engine properties on the engines page in the FME Server Web UI:

[Screenshot: the Engines page listing the generalPurpose, memoryOptimized, and computeOptimized engine properties]

Next, create queues that correspond to the engine properties and how you’d like to route your jobs. In this example, I’m using the Default queue for general purpose FME Server jobs, and have created two new queues for compute- and memory-intensive workflows:

[Screenshot: the Queues page showing the Default, Compute Intensive, and Memory Intensive queues]

Assign engines to the newly created queues on the Engine Assignment Rules tab.

Add a property that matches the engine properties defined in the values.yaml (you can refer back to the Engines tab to check this). Assign this property to a Queue:

[Screenshot: creating an Engine Assignment Rule that assigns an engine property to a queue]

For this example we have 3 Engine Assignment Rules:

[Screenshot: the three Engine Assignment Rules, assigning the generalPurpose, computeOptimized, and memoryOptimized properties to the Default, Compute Intensive, and Memory Intensive queues]
 

Configure Job Routing Rules

At the moment, any jobs run on FME Server are sent to the Default queue and processed on the general purpose nodes.

We will configure Job Routing Rules based on workspace statistics, so that workspaces get processed on the most suitable node. Ideally, the workspaces that get close to 100% CPU utilization will be processed on the compute optimized nodes, and the workspaces with the highest peak memory usage will be processed on the memory optimized nodes.

Here we can see the metrics of the workspaces that have been run on the general purpose node:

[Screenshot: workspace statistics for the workspaces run on the general purpose node]

Job Routing Rules can be created based on workspace repository or workspace statistics, and will be evaluated top down.

On this FME Server, three rules have been created:

[Screenshot: the three Job Routing Rules]

If a workspace’s statistics report that its % CPU use exceeds 85%, it will get routed to the Compute Intensive queue.
If a workspace’s statistics report that its peak memory use is equal to or greater than 100 MB, it will get routed to the Memory Intensive queue. Workspaces from the RasterProcessing repository will also get routed to this queue, on the assumption that they will likely require more memory to process.
Any workspaces that do not meet these criteria will be routed to the Default queue.

Running the workspaces again shows that they have been processed by the correct queues:

[Screenshot: job history showing each workspace processed by the correct queue]

The workspaces highlighted in red are workspaces that were routed to a different queue. Each of these workspaces processed faster on the optimized node.
 

Additional Resources:

Documentation: Defining FME Engines and Queue Control Properties 


Comments

martinkoch

    I have FME Flow running on a default AKS cluster, with a userpool and an agentpool.
    Adding a second node pool with two engines fails. I think I followed the above steps, but the engine pods on the new node pool remain Pending with the message

    'node(s) didn't match pod affinity rules'.

    I use the nodeSelector described above, which seems to select the right pool. The other pool is rejecting these pods based on both an affinity and a selector mismatch.

    Doing a kubectl get pod -n fmeflow -o yaml [podname] reveals only one affinity that I did not specify in my install yaml.

    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: safe.k8s.fmeflow.component
                operator: In
                values:
                - core
            topologyKey: kubernetes.io/hostname

    The pods which do run on the AKS default userpool have the same affinity.

    Is there something missing in my setup, like a secret key or label in the new nodepool?
     

    Kind regards,
    Martin 

     

