Known Issue: FME Server Engine continuously restarts with ERROR 'Missing or expired workspace chaining context'

Liz Sanderson
Known Issue ID FMEFLOW-17099
Discovered 2020.1
Affects Versions < 2022.0
Resolved 2022.0

Symptom

Your workspace contains an FMEServerJobSubmitter with Submission Mode = In Sequence and Wait for Jobs to Complete = Yes. Running a job with this combination launches an additional (child) engine process.

The child job may or may not complete successfully, but fmeserver.log now continuously logs errors referencing a missing or expired workspace chaining context, e.g.:

INFORM   RequestHandler-Thread   401937 : Accepted new FME Engine connection.
INFORM   RequestHandler-Thread   401950 : Registering FME Engine...
ERROR    RequestHandler-Thread   401980 : Missing or expired workspace chaining context `context-id-873eac3c-1c81-4fc7-ab4d-896fd9a07412`. 
ERROR    RequestHandler-Thread   401939 : Failed to register FME Engine.
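To gauge how often the error repeats, and for which context, the log can be scanned with a short script. The following is a minimal Python sketch, not an official tool: the function name `count_context_errors` is hypothetical, and the pattern is derived only from the sample line above, so real log lines may need a looser match.

```python
import re
from collections import Counter

# Pattern taken from the sample log line in this article. Only the message
# text and the back-quoted context ID are matched, not the line prefix.
CONTEXT_ERROR = re.compile(
    r"Missing or expired workspace chaining context `(?P<context_id>[^`]+)`"
)


def count_context_errors(lines):
    """Count chaining-context errors per context ID (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = CONTEXT_ERROR.search(line)
        if match:
            counts[match.group("context_id")] += 1
    return counts


if __name__ == "__main__":
    # The log path is an assumption; point it at your own fmeserver.log.
    with open("fmeserver.log", encoding="utf-8", errors="replace") as handle:
        for context_id, total in count_context_errors(handle).most_common():
            print(f"{context_id}: {total} occurrence(s)")
```

A single context ID with a steadily growing count is consistent with one orphaned child engine repeatedly trying to register.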


Before the translation completes, you may also see an error for the child engine in fmeprocessmonitorengine.log, e.g.:

Could not read from socket after a period of time; connection may have been lost

 

Cause

When a child engine is launched, a workspace chaining context is started. The context associates the child engine process with the parent engine process.

When the parent job completes, FME Server normally ends the workspace chaining context and permanently shuts down the child engine. In this scenario, however, the child engine unexpectedly disconnects from the parent and cannot recreate that association, leaving the child engine process orphaned.

There are two known reasons this can occur: 

  1. The child or parent job crashed mid-translation. 
  2. The timeout setting is incorrect (too short for the job). In this case, fmeprocessmonitorengine.log contains a "Could not read from socket after a period of time" error. The child engine shuts down before the parent job can complete; this premature shutdown breaks the context link, and the child engine restarts endlessly. In this scenario, the RECEIVE_TIMEOUT parameter has likely been configured in FMEEngineConfig.txt. 
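As a first triage step, the two reasons above can be told apart by checking fmeprocessmonitorengine.log for the socket message. This is a minimal Python sketch under that assumption; `likely_cause` and its return labels are hypothetical names, and a missing message does not prove a crash, it only rules out the clearest sign of reason 2.

```python
# Message text quoted in this article for the timeout case (reason 2).
SOCKET_TIMEOUT_MESSAGE = "Could not read from socket after a period of time"


def likely_cause(process_monitor_log_text):
    """Return a rough label for the probable cause (hypothetical helper).

    "timeout" means the socket message was found, pointing at reason 2.
    "crash-or-unknown" means it was not found, so reason 1 or something
    else needs to be investigated.
    """
    if SOCKET_TIMEOUT_MESSAGE in process_monitor_log_text:
        return "timeout"
    return "crash-or-unknown"


if __name__ == "__main__":
    # The log path is an assumption; adjust it to your installation.
    with open("fmeprocessmonitorengine.log", encoding="utf-8", errors="replace") as handle:
        print(likely_cause(handle.read()))
```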

 

Resolution

1. Restart FME Server to stop excessive logging in the fmeserver.log. This will stop all engine processes. Upon startup, the child engine will no longer exist.  

2. Next steps depend on what is causing the root issue. 

If you believe the cause is a child job or engine crashing (reason 1 above), turn off Job Recovery. This stops the job from being resubmitted. Ultimately, however, you will need to investigate why the job is crashing and resolve the root cause. 

If you believe you've run into the timeout error (reason 2 above), you should update the RECEIVE_TIMEOUT configuration:

  1. Run a text editor as Administrator and open FMEEngineConfig.txt, located in <InstallDir>\Server\
  2. Locate the RECEIVE_TIMEOUT parameter and set its value to zero. This is the default and means the FME Engine will not shut down when it receives no translation requests. Note: If you need to set this parameter to a non-zero value, it must be longer than the expected duration of any workspace containing FMEServerJobSubmitter transformers (the timeout is measured in milliseconds). 
  3. Restart FME Server to apply these changes.
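The rule in step 2 can be checked with a little arithmetic: a non-zero timeout in milliseconds must exceed the longest expected job duration. Below is a minimal Python sketch of that check. The function names are hypothetical, and the parsing assumes the parameter appears on its own line as `RECEIVE_TIMEOUT <value>` (whitespace or `=` separated); verify this against your own FMEEngineConfig.txt before relying on it.

```python
import re

# Assumed line layout: "RECEIVE_TIMEOUT 600000" or "RECEIVE_TIMEOUT=600000".
RECEIVE_TIMEOUT_LINE = re.compile(r"RECEIVE_TIMEOUT\s*[=\s]\s*(\d+)\s*$")


def read_receive_timeout(config_text):
    """Return RECEIVE_TIMEOUT in milliseconds, or None if it is not set."""
    for raw_line in config_text.splitlines():
        match = RECEIVE_TIMEOUT_LINE.match(raw_line.strip())
        if match:
            return int(match.group(1))
    return None


def timeout_is_safe(timeout_ms, longest_job_minutes):
    """Check the timeout against the longest expected chained job.

    Zero disables the timeout. A missing parameter is treated as the
    default of zero, which is an assumption of this sketch.
    """
    if timeout_ms is None or timeout_ms == 0:
        return True
    return timeout_ms > longest_job_minutes * 60 * 1000


if __name__ == "__main__":
    # Example: a 10-minute timeout against a job expected to run 30 minutes.
    example = "RECEIVE_TIMEOUT 600000"
    print(timeout_is_safe(read_receive_timeout(example), 30))
```

In the example, 600000 ms is 10 minutes, which is shorter than the 30-minute job, so the check reports the value as unsafe.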
