Known Issue ID | FMEFLOW-17099 |
---|---|
Discovered | 2020.1 |
Affects | Versions < 2022.0 |
Resolved | 2022.0 |
Symptom
Your workspace contains an FMEServerJobSubmitter with Submission Mode = In Sequence and Wait for Jobs to Complete = Yes. Running a job with this combination launches an additional (child) engine process.
The child job may or may not complete successfully, but fmeserver.log now continuously logs errors referencing a missing or expired workspace chaining context, e.g.
INFORM RequestHandler-Thread 401937 : Accepted new FME Engine connection.
INFORM RequestHandler-Thread 401950 : Registering FME Engine...
ERROR RequestHandler-Thread 401980 : Missing or expired workspace chaining context `context-id-873eac3c-1c81-4fc7-ab4d-896fd9a07412`.
ERROR RequestHandler-Thread 401939 : Failed to register FME Engine.
Before the translation is completed you may also see an error for the child engine reported in the fmeprocessmonitorengine.log, e.g.
Could not read from socket after a period of time; connection may have been lost
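To see how many distinct chaining contexts are producing this error, and how often, the log can be scanned with a short script. The following is a minimal Python sketch, not an FME utility: the function name `count_orphaned_contexts` is hypothetical, and the pattern is assumed from the sample log line above, so other versions may format the message differently.

```python
import re
from collections import Counter

# Assumption: the error text and context ID format match the sample log line above.
CONTEXT_ERROR = re.compile(
    r"Missing or expired workspace chaining context `?(context-id-[0-9a-fA-F-]+)`?"
)

def count_orphaned_contexts(log_lines):
    """Return a Counter mapping each chaining context ID to its error count."""
    counts = Counter()
    for line in log_lines:
        for context_id in CONTEXT_ERROR.findall(line):
            counts[context_id] += 1
    return counts

# Sample lines taken from the log excerpt above.
sample = [
    "INFORM RequestHandler-Thread 401950 : Registering FME Engine...",
    "ERROR RequestHandler-Thread 401980 : Missing or expired workspace chaining "
    "context `context-id-873eac3c-1c81-4fc7-ab4d-896fd9a07412`.",
    "ERROR RequestHandler-Thread 401939 : Failed to register FME Engine.",
]
for context_id, count in count_orphaned_contexts(sample).items():
    print(context_id, count)
```

To run it against a real log, pass an open file object instead of `sample`, for example `count_orphaned_contexts(open(path, errors="replace"))`, where `path` points to your fmeserver.log. A count that keeps growing for the same context ID suggests an orphaned child engine that is repeatedly trying to register.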
Cause
When a child engine is launched, a workspace chaining context is started. The context associates the child engine process with the parent engine process.
When the parent job completes, FME Server normally ends the workspace chaining context and permanently shuts down the child engine. In this scenario, however, the child engine unexpectedly disconnects from the parent and cannot recreate that association. The child engine process is now orphaned.
There are two known reasons this can occur:
- The child or parent job crashed mid-translation.
- The job timeout setting is incorrect. This usually means the RECEIVE_TIMEOUT parameter has been configured in FMEEngineConfig.txt. With an incorrect timeout, the child engine shuts down before the parent job can complete. The premature shutdown breaks the context link, and the child engine then restarts endlessly. If this is the cause, you will see a "Could not read from socket after a period of time" error in the fmeprocessmonitorengine.log.
Resolution
1. Restart FME Server to stop excessive logging in the fmeserver.log. This will stop all engine processes. Upon startup, the child engine will no longer exist.
2. Next steps depend on what is causing the root issue.
If you believe the cause is a child job or engine crashing (the first reason above), turn off Job Recovery. This stops the job from being resubmitted. Ultimately, however, you will need to investigate why the job is crashing and resolve the root cause.
If you believe you've run into the timeout error (reason 2 above), you should update the RECEIVE_TIMEOUT configuration:
- Run a text editor as Administrator and open FMEEngineConfig.txt, located in <InstallDir>\Server\
- Locate the RECEIVE_TIMEOUT parameter and set its value to zero. This is the default value and means the FME Engine will not shut down when it doesn't receive any translation requests. Note: This timeout is measured in milliseconds. If you need to set it to a value other than zero, that value must be longer than the expected duration of any workspace containing FMEServerJobSubmitter transformers.
- Restart FME Server to apply these changes.
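Before restarting, it can help to confirm what the file actually contains. The following is a minimal Python sketch of that check, not an official tool. It assumes the parameter appears on its own line as `RECEIVE_TIMEOUT <value>` or `RECEIVE_TIMEOUT=<value>`, which may not match your file exactly. The function names `read_receive_timeout` and `timeout_is_safe` are hypothetical.

```python
import re

# Assumption: the setting is written as "RECEIVE_TIMEOUT <value>" or
# "RECEIVE_TIMEOUT=<value>" on a line of its own. Verify against your file.
SETTING = re.compile(r"^\s*RECEIVE_TIMEOUT\s*[=\s]\s*(\d+)\s*$")

def read_receive_timeout(config_text):
    """Return the last RECEIVE_TIMEOUT value found (milliseconds), or None if absent."""
    value = None
    for line in config_text.splitlines():
        match = SETTING.match(line)
        if match:
            value = int(match.group(1))
    return value

def timeout_is_safe(timeout_ms, longest_job_seconds):
    """Check the timeout against the longest expected FMEServerJobSubmitter workspace.

    Zero (the default) means the engine never shuts down for lack of requests.
    An absent setting is treated as the default here, which is an assumption.
    """
    if timeout_ms is None or timeout_ms == 0:
        return True
    return timeout_ms > longest_job_seconds * 1000

# Example: a 60-second timeout is too short for a workspace that runs for an hour.
example_config = "RECEIVE_TIMEOUT 60000\n"
timeout = read_receive_timeout(example_config)
print(timeout, timeout_is_safe(timeout, longest_job_seconds=3600))
```

The comparison converts the job duration to milliseconds because the timeout is expressed in milliseconds, while job durations are usually estimated in seconds or minutes.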