FME Flow - Host System Sizing

SteveatSafe

Introduction

When it comes to hardware sizing for FME Flow (formerly FME Server), there is no hard-and-fast rule and no accurate way to determine the right size before deployment. It all depends on how FME Flow will be used. There are, however, recommended requirements. These will be sufficient for basic use of FME Flow, but if you are going to truly utilize the power of FME Flow, you'll want to dive into the hardware specifics a bit more.

 

FME Flow Engine

The FME Flow Engines are the biggest consumers of system resources. If the engines are going to be distributed on a different system from the Core, then most of the sizing considerations will center on the types of workspaces and the resources they use. The FME Flow Core and FME Flow Application Server can function well enough with 2 GB of RAM and 1 or 2 cores (2 cores is better, so that the FME Flow Core and the FME Flow Web Application each get a single core).

You should also carefully consider the other system processes and the memory and CPU resources they will require to keep the system running smoothly. This is especially important when Engines run on the Core system (for example, in an Express Install, or in Distributed/Fault-Tolerant Installs where the Core and Engines are on the same system).

Ultimately, though, the amount of system resources an engine uses to run a workspace depends greatly on workspace design.
 

Questions

  • Are there any optimal hardware setups that would affect the speed at which the FME Flow Engines perform?
  • Since we will be running some fairly processing-intensive scripts, is there a suggested setup?

 

Answers

When determining the optimal hardware setup, it is best to do some research with your current setup and workspaces to see how you are utilizing resources. The best place to start is to review your FME Form (formerly FME Desktop) workspace logs.
 
At the bottom of each workspace log, you'll see a couple of lines like these:

INFORM|FME Session Duration: 9 minutes 9.9 seconds. (CPU: 484.1s user, 4.0s system)
INFORM|END - ProcessID: 14356, peak process memory usage: 1164584 kB, current process memory usage: 857028 kB
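When you are reviewing many job logs, it can help to pull these figures out automatically. The sketch below is a minimal parser written against the two sample lines above only; the regular expressions, the `ResourceSummary` class, and `parse_log_summary` are hypothetical helpers, and other FME versions or longer jobs (for example, ones reporting hours) may word these lines differently.

```python
import re
from dataclasses import dataclass

# Patterns written against the sample log lines in this article (an assumption:
# other FME versions may format these summary lines differently).
SESSION_RE = re.compile(
    r"FME Session Duration: (?:(?P<min>\d+) minutes? )?(?P<sec>[\d.]+) seconds\. "
    r"\(CPU: (?P<user>[\d.]+)s user, (?P<system>[\d.]+)s system\)"
)
MEMORY_RE = re.compile(
    r"peak process memory usage: (?P<peak>\d+) kB, "
    r"current process memory usage: (?P<current>\d+) kB"
)


@dataclass
class ResourceSummary:
    wall_seconds: float        # total session duration
    user_cpu_seconds: float    # 'user' CPU time (FME processing)
    system_cpu_seconds: float  # 'system' CPU time (often I/O related)
    peak_memory_kb: int        # peak process memory usage


def parse_log_summary(log_text: str) -> ResourceSummary:
    """Extract the resource usage figures from the end of a workspace log."""
    session = SESSION_RE.search(log_text)
    memory = MEMORY_RE.search(log_text)
    if session is None or memory is None:
        raise ValueError("resource summary lines not found in log")
    minutes = int(session.group("min") or 0)
    return ResourceSummary(
        wall_seconds=minutes * 60 + float(session.group("sec")),
        user_cpu_seconds=float(session.group("user")),
        system_cpu_seconds=float(session.group("system")),
        peak_memory_kb=int(memory.group("peak")),
    )


sample = (
    "INFORM|FME Session Duration: 9 minutes 9.9 seconds. (CPU: 484.1s user, 4.0s system)\n"
    "INFORM|END - ProcessID: 14356, peak process memory usage: 1164584 kB, "
    "current process memory usage: 857028 kB"
)
summary = parse_log_summary(sample)
print(summary)
```

Collecting `peak_memory_kb` across many job logs gives you the "average workspace peak process memory usage" input used later in this article.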

 
Note: The specific log information in this article is unique to the system it was written on. For example, if you only have a 1 CPU (4 core) system with 8 GB of RAM and 20 GB of disk space, the FME Engine will operate within those limits as best it can. If you run the same workspace on a system with more RAM, the log will likely report different resource usage.

 

Breaking Down the Resource Usage Parameters

CPU

A lot of 'user' time means a lot of FME.exe processing. 'System' time can be related to I/O tasks, so if you see a lot of 'system' time, you'll want to ensure you have fast disks and good network bandwidth for reading and writing files over the network. There are two main reasons for I/O: reading and writing data (over the network or on local disk), and caching. If FME runs out of memory, it will start to cache data to disk; the more memory available, the easier it is to avoid this situation. Workspaces that take data in and perform group-by operations, or that need to process one dataset against another entire dataset, can use a lot of memory, and if memory is limited, caching will occur.
 
If many of your workspaces show a lot of system time, this may also indicate I/O from reading and writing data. Investigate whether it is necessary to read and write files in their current location, and whether FME could be placed closer to the data to help with this.
 
If you see a lot of user CPU time, it comes from the workspace design and its transformers. Some things to consider:

  • Is there one workspace that behaves worse than others? 
  • Can it be improved (how-to link)? 
  • Can the workspace design be reviewed?
  • Can the Transformers, Readers, or Writers be updated?
  • Let Safe Software know about the high CPU usage and ask whether the work could be done differently in FME. There may not be another way, but if you think the CPU usage is unreasonably high, it is worth questioning.
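The user/system reasoning above can be turned into a quick triage step when you are going through many job logs. This is a minimal sketch: `describe_cpu_profile` is a hypothetical helper, and both thresholds are illustrative assumptions rather than Safe Software guidance. It also relies on the general operating-system fact that time spent blocked waiting on I/O is counted as neither user nor system CPU time, so a low CPU-to-wall-clock ratio can hint at waiting on data.

```python
def describe_cpu_profile(user_s: float, system_s: float, wall_s: float,
                         system_share_threshold: float = 0.2,
                         busy_threshold: float = 0.5) -> list:
    """Return rough hints from one job's CPU figures.

    Both thresholds are hypothetical starting points; tune them for your jobs.
    """
    cpu_total = user_s + system_s
    if cpu_total <= 0 or wall_s <= 0:
        return ["no CPU or wall-clock time recorded"]
    # Share of CPU time spent in the OS kernel (often I/O related).
    system_share = system_s / cpu_total
    # Fraction of the run during which the engine process was on a CPU at all.
    busy_fraction = cpu_total / wall_s
    hints = []
    if system_share >= system_share_threshold:
        hints.append("high system time: check disk speed, network bandwidth and caching")
    elif busy_fraction >= busy_threshold:
        hints.append("mostly user time: review workspace design and transformers")
    if busy_fraction < busy_threshold:
        hints.append("CPU idle for much of the run: the job may be waiting on I/O")
    return hints


# Figures from the sample log earlier in this article.
print(describe_cpu_profile(user_s=484.1, system_s=4.0, wall_s=549.9))
```

For the sample log, almost all CPU time is 'user' time and the process was on a CPU for most of the run, so the sketch points at workspace design rather than I/O.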

If you use FME Flow for a lot of data uploads, downloads, or streaming services, also consider where the FME Engine will write its output files: the FME Flow System Share should be located as close as possible to both the FME Flow Web Application (usually on the same system as the FME Flow Core, which is recommended) and the FME Flow Engine. The engine writes the data out, and the Web Application then shares it via the Data Download or Data Streaming services. If you mostly run scheduled jobs and do not use the Web UI or REST API, you can ignore this.

 

Memory

Keep an eye on the peak process memory usage for all workspaces; this gives you a good indication of how much memory an FME Engine might want to use. For example, say you have a 32 GB RAM system with 2 engines. If the system uses 4 GB without any FME Engines running, and you know you have 2 workspaces that could run at the same time, each using up to ~30 GB of RAM, then you need a minimum of 64 GB of RAM (4 GB + 2 × 30 GB) to avoid caching, so the 32 GB system falls short. If FME can keep everything in memory and avoid caching data, processing will generally be much faster.
 
Note: FME Engines are single-threaded, so there is no advantage to buying a large number of CPUs/cores if you are running, for example, 2 engines. It is better to invest in memory and to look at what you can do to improve disk performance.
 
In summary, plan for a healthy amount of memory. The right amount is workspace-dependent: each workspace has its own resource usage footprint, so it is hard to recommend a formula that would always work.
 
An example would be:

Peak System Memory Requirement = (MaxEngines + MaxDynamicEngines + MaxChildEngines) * average workspace peak process memory usage + max system memory usage + 3rd party software peak memory usage
 

  • MaxEngines: How many licensed engines are running at any given time?
  • MaxDynamicEngines: How many dynamic engines are running at any given time?
  • MaxChildEngines: Do your workspaces launch child jobs, and how many could be running at any given time?
  • Average workspace peak process memory usage: The average of the peak process memory usage values reported in the job logs of your workspaces. Finding this information involves reviewing the job logs for individual workspaces.
  • Max system memory usage: What is the maximum expected memory usage of the OS with no other software running?
  • 3rd party software peak memory usage: What other applications are running on the system or used by FME Flow, and what impact on memory might they have?
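The memory formula above is simple arithmetic, but writing it down as a function makes it easy to rerun as your engine counts or workspaces change. This is a minimal sketch; `peak_memory_gb` and its parameter names are hypothetical and simply mirror the terms of the formula. The usage line reproduces the worked example from the Memory section.

```python
def peak_memory_gb(max_engines: int, max_dynamic_engines: int, max_child_engines: int,
                   avg_workspace_peak_gb: float, max_system_gb: float,
                   third_party_peak_gb: float) -> float:
    """Peak System Memory Requirement, following the example formula above."""
    # Every engine that can be busy at the same time may reach the average peak.
    concurrent_engines = max_engines + max_dynamic_engines + max_child_engines
    return (concurrent_engines * avg_workspace_peak_gb
            + max_system_gb
            + third_party_peak_gb)


# Worked example from the Memory section: 2 engines, ~30 GB peak each,
# 4 GB used by the OS, no other software.
required = peak_memory_gb(2, 0, 0, 30, 4, 0)
print(f"{required:.0f} GB required")  # prints "64 GB required"
```

Because the formula multiplies the average peak by the number of concurrent engines, it is a worst case; scheduling jobs so that heavy workspaces do not overlap lowers the real requirement.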


For disk space, the same approach applies if you know FME uses a lot of temporary space (caching), or if you write out temporary files on purpose (for example, with a FeatureWriter, or by using Python or the SystemCaller to create local files with third-party tools): you will need to estimate disk usage in the same way.
 
A formula like the one below might get you close to the maximum disk usage at any given time, when all engines are running the most disk-space-intensive workspaces. Again, if available memory is below what a workspace requires, temporary storage (caching) may also be happening; this can only be seen while the workspace is running, by monitoring free disk space on the system during the run.

Peak Disk Space Requirement = (MaxEngines + MaxDynamicEngines + MaxChildEngines) * cached/temp file space usage per engine/workspace combo + max system disk space usage + 3rd party software disk space needs
 

  • Cached/temp file space usage per engine/workspace combo: During the workspace run, the engines may use temporary disk storage. This is typically cleared when the job completes or when the engine restarts. If available memory is not sufficient, caching to disk will also occur.
  • Max system disk space usage: How much disk space does the OS require, and how much more might future OS updates need?
  • 3rd party software disk space needs: How much disk space does the other software on the computer require?
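The disk formula has the same shape as the memory formula, and its hardest input (temporary/cache space per engine) can only be observed while a job runs. The sketch below pairs the formula with a simple free-space sampler built on Python's standard `shutil.disk_usage`. Both functions are hypothetical helpers, and the path you poll (the volume holding your engines' temporary files) is an assumption about your setup.

```python
import shutil
import time


def peak_disk_gb(max_engines: int, max_dynamic_engines: int, max_child_engines: int,
                 temp_per_engine_gb: float, max_system_gb: float,
                 third_party_gb: float) -> float:
    """Peak Disk Space Requirement, following the example formula above."""
    concurrent_engines = max_engines + max_dynamic_engines + max_child_engines
    return concurrent_engines * temp_per_engine_gb + max_system_gb + third_party_gb


def lowest_free_space_gb(path: str, samples: int, interval_s: float) -> float:
    """Poll free space on the volume holding `path`; return the lowest value seen.

    Run this while a workspace is running to catch temporary (cache) files
    that are cleared once the job finishes.
    """
    if samples < 1:
        raise ValueError("samples must be at least 1")
    lowest = float("inf")
    for i in range(samples):
        free_gb = shutil.disk_usage(path).free / 1024 ** 3
        lowest = min(lowest, free_gb)
        if i < samples - 1:  # no need to sleep after the last sample
            time.sleep(interval_s)
    return lowest


# Hypothetical inputs: 3 engines, 15 GB of temp space each, 40 GB for the OS,
# 10 GB for other software.
print(peak_disk_gb(3, 0, 0, 15, 40, 10))  # prints 95
print(lowest_free_space_gb(".", samples=3, interval_s=0.5))
```

Comparing the free space before a run with the lowest value seen during it gives a rough per-job figure for the temporary/cache space term.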


Keep in mind that the two formulas above are examples of how to size a system. They give the maximum needed for both memory and disk space, which is not usually required when starting out.
 
However, sizing to these maximums allows for the "perfect storm", where jobs collide and use the maximum system resources. This scenario doesn't have to play out, as you can control it by scheduling jobs appropriately.
 

Recommendations

CPU

How many CPUs do you need? We recommend 1 core per FME Flow Engine. For example, you could get away with 1 CPU with 4 cores for a system with 2 FME Flow Engines. If you had 10 engines, you would probably be looking at 3 CPUs with 4 cores each. You get the idea.
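The one-core-per-engine rule of thumb can be written as a small calculation that rounds up to whole CPUs. This is a minimal sketch: `cpus_needed` is a hypothetical helper, and the default of 4 cores per CPU is an assumption taken from the examples above. The optional `reserved_cores` parameter (also an assumption) lets you set aside cores for the OS and FME Flow Core; setting it to 2 still reproduces both examples.

```python
import math


def cpus_needed(engines: int, cores_per_cpu: int = 4,
                cores_per_engine: int = 1, reserved_cores: int = 0) -> int:
    """Number of whole CPUs needed under the 1-core-per-engine rule of thumb."""
    if engines < 0 or cores_per_cpu < 1:
        raise ValueError("invalid engine or core count")
    cores = engines * cores_per_engine + reserved_cores
    # Round up: you cannot buy part of a CPU.
    return max(1, math.ceil(cores / cores_per_cpu))


print(cpus_needed(2))   # prints 1 (4 cores for 2 engines)
print(cpus_needed(10))  # prints 3 (12 cores for 10 engines)
```

Because FME Engines are single-threaded, extra cores beyond this count mainly help the OS and other processes rather than an individual job.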
 

Engines

The engine count is the easiest part of the FME Flow environment to change: simply purchase additional engine licenses to increase it. The question when first getting into FME Flow often becomes: how many engines will be required? This again comes down to the throughput you need. If you are scheduling jobs, they can be spaced out, and one engine may be enough. If you have long-running jobs alongside time-sensitive ones, you may require additional engines. With FME Flow's Job Queues, workspaces can be assigned to different engines, separating slow jobs from time-sensitive or quick jobs, or simply assigning certain jobs to certain engines.
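One way to reason about throughput is to ask how many engines a batch of jobs needs to finish within a time window (for example, overnight). The sketch below uses the well-known Longest Processing Time first (LPT) greedy scheduling heuristic; it is not how FME Flow itself schedules jobs. The functions and job durations are hypothetical, and the model assumes all jobs are ready at the start of the window and any engine can run any job. Because LPT is a heuristic, it can occasionally suggest one more engine than an optimal schedule would need.

```python
import heapq


def lpt_makespan(durations_min, engines: int) -> float:
    """Assign each job, longest first, to the engine that becomes free earliest.

    Returns the time at which the last job finishes.
    """
    if engines < 1:
        raise ValueError("need at least one engine")
    free_at = [0.0] * engines  # min-heap of times when each engine becomes free
    for duration in sorted(durations_min, reverse=True):
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + duration)
    return max(free_at)


def engines_for_window(durations_min, window_min: float, max_engines: int = 32):
    """Smallest engine count whose LPT schedule fits in the window, or None."""
    if durations_min and max(durations_min) > window_min:
        return None  # one job alone exceeds the window; more engines won't help
    for engines in range(1, max_engines + 1):
        if lpt_makespan(durations_min, engines) <= window_min:
            return engines
    return None


nightly_jobs = [60, 30, 30, 20, 10]  # hypothetical job durations in minutes
print(engines_for_window(nightly_jobs, window_min=90))  # prints 2
```

If you separate jobs with Job Queues, you could run the same calculation per queue, using only the jobs assigned to that queue.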
 

Final Thoughts

Ideally, if hardware provisions permit and you expect the demands on the FME Flow Engines to be heavy, I would consider distributing the FME Flow Engines onto their own system with the minimum software installed: just what you need, such as ArcGIS software. You can then enhance this system to meet the resource demands with little impact from other influences. This would be the "heavy lifting" system. The FME Flow Core system is fairly lightweight: its Java processes are each limited to about 1 GB of RAM and rarely reach that level, so 1 GB to 4 GB is a fairly safe allocation for that part of FME Flow.
 
Whether you are specifying a virtual machine or a physical rack-mounted system, keep in mind that a VM environment gives you the freedom (depending on the VM host) to modify the available system resources later.
 
