What are Automated Retries?
Automated retries are a feature of FME Flow automations that allows workspace actions and external actions to be retried automatically when they fail. For workspace actions, a job that fails because of a translation error or an engine problem follows the automated retry configuration: it is resubmitted to the job queue with a new job ID until it either succeeds or has failed every retry attempt. For external actions, a failure to connect to the external service or to complete the action triggers a retry until the action either succeeds or has failed every retry attempt.
This new feature is part of our ongoing work to bring a complete Enterprise Integration Solution to fruition and is an initial step to offering ‘Guaranteed Delivery’, a key Enterprise Integration Pattern. For more information on Enterprise Integration Patterns in the FME Platform, visit our article on Getting Started with Enterprise Integration Patterns.
Why Should I Use Automated Retries?
Automated retries enable FME Flow authors and admins to “guarantee” that an action will occur, at least within the scope of intermittent network or connectivity issues and component failures. This removes the manual oversight and intervention currently required for many kinds of job and external action failures that are intermittent in nature. For workspace actions or external actions in FME Flow automations that may fail because of external server-side errors, network instability, or timeouts, this feature adds robustness and extra automation to your workflows.
What are Some Use Cases for Automated Retries?
Email External Actions
A typical FME Flow automation might end by sending an email notification to stakeholders or the FME Flow admin with the results of the workflow. Configuring automated retries on this external action can cover potential network or email server hiccups that prevent the email from reaching its destination despite the data translation itself completing successfully.
Workspace Actions with Web Service Integrations
Workspaces that reach out to external web services or web APIs, for example through source data URLs, HTTPCallers, or web-based formats (like ArcGIS Online), are at the mercy of the external web service and can fail intermittently due to server-side issues, network instability, or timeouts. With automated retries set up on the workspace action, the FME Flow automation will automatically try to overcome these occasional issues by resubmitting the job. The advantage here is two-fold: either the job is retried and succeeds, or the job eventually fails every attempt, which indicates a persistent workflow issue that needs to be investigated.
Workspace Actions with Database Integrations
Workspaces that interact with databases can be prone to the same issues as workspaces that integrate with web services, with the added complexity of database session limits, processing loads, and fine-grained timeouts. Automated retries help push past these intermittent issues by re-submitting the job, which may succeed on a repeat attempt or uncover a workflow issue if it eventually fails.
Where Can I Find Automated Retries in FME Flow Automations?
Automated retries can be found only in FME Flow automations for workspace actions and external actions in FME 2021 or newer. In the action parameters, there is a Retry tab where you can turn the feature on and off as well as set how the retries will behave.
How Do I Configure Automated Retries?
Default Settings
In the Retry tab, enabling the Retry on failure checkbox will turn on the default retry settings. You’ll also notice that the icon for the action’s failure port changes to indicate that retries have been set.
For workspace actions, the default is to retry the action 5 times in the event of a failure (Number of attempts), with each retry being submitted immediately after a failure (Wait between attempts).
For external actions, the default is to retry the action 5 times in the event of a failure (Number of attempts). The interval between the failure and the first retry is 5 seconds (Wait between attempts), and the Backoff multiplier doubles the interval for each subsequent retry. The Randomization factor varies each interval by up to 25% in either direction, meaning that the second retry interval of 10 seconds will be randomized to between 7.5 and 12.5 seconds, the third retry interval of 20 seconds will be randomized to between 15 and 25 seconds, and so on until the final retry. Because the Maximum wait between attempts is capped at 1 minute, the interval between retries will never exceed 1 minute.
Default Retry Settings for External Action
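The arithmetic behind these defaults can be sketched in a few lines of Python. This is only an illustration of the numbers quoted above, not FME Flow's implementation: the function names are hypothetical, and it assumes that the randomization applies to every retry (including the first) and that the randomized value is also held to the 1 minute maximum.

```python
def retry_intervals(initial_wait, multiplier, max_wait, retries):
    """Return the nominal wait (seconds) before each retry, capped at max_wait."""
    intervals = []
    wait = initial_wait
    for _ in range(retries):
        intervals.append(min(wait, max_wait))
        wait *= multiplier
    return intervals


def jitter_range(interval, factor, max_wait):
    """Return (low, high) bounds when the interval varies by +/- factor.

    Assumption: the upper bound is clamped to max_wait so that the wait
    never exceeds the configured maximum.
    """
    low = interval * (1 - factor)
    high = min(interval * (1 + factor), max_wait)
    return (low, high)


# Defaults described above: 5 s wait, multiplier 2, 60 s maximum, 5 retries.
nominal = retry_intervals(5, 2, 60, 5)
ranges = [jitter_range(i, 0.25, 60) for i in nominal]

for number, (interval, (low, high)) in enumerate(zip(nominal, ranges), start=1):
    print(f"retry {number}: nominal {interval} s, randomized between {low} and {high} s")
```

Under these assumptions the nominal intervals are 5, 10, 20, 40, and 60 seconds, with the last one already limited by the maximum wait.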
Custom Settings
In the Retry tab, enabling the Use custom retry settings checkbox will turn on retries and allow you to customize each parameter of the configuration. This lets you configure the best retry behavior for the given workflow or particular action, especially in cases where you may want FME Flow to wait longer before retrying the action. For example, in a custom retry configuration with a Wait between attempts of 10 seconds, a Backoff multiplier of 10, and a Maximum wait between attempts of 3 hours, the first retry of the workspace action will occur after 10 seconds, and subsequent retries will occur after 100 seconds, then ~17 minutes, then ~2 hours and 45 minutes, with all further retries capped at the 3 hour maximum. Spreading out the retries in this way could help guarantee the success of a job that connects to a database or web service that experiences heavy loads at certain times of the day.
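To check a custom configuration before applying it, the wait schedule can be worked out by hand or with a short script. The sketch below reproduces the example figures; the function names are hypothetical, the multiplier of 10 is inferred from the 10 second to 100 second step, the choice of six retries is arbitrary, and randomization is ignored for simplicity.

```python
def capped_backoff(initial_wait, multiplier, max_wait, retries):
    """Return the wait (seconds) before each retry, never exceeding max_wait."""
    waits = []
    wait = initial_wait
    for _ in range(retries):
        waits.append(min(wait, max_wait))
        wait *= multiplier
    return waits


def humanize(seconds):
    """Format a whole number of seconds as hours, minutes, and seconds."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


# 10 s first wait, multiplier 10 (inferred), 3 hour maximum, 6 retries (arbitrary).
schedule = capped_backoff(10, 10, 3 * 60 * 60, 6)

for number, wait in enumerate(schedule, start=1):
    print(f"retry {number}: wait {humanize(wait)}")
```

The fifth and later waits all equal the 3 hour maximum, because the uncapped value (100,000 seconds) is larger than the cap.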
Automated Retry Behavior
When a workspace action that is configured for automated retries fails, information about the retry will be visible in the Automation Log. The resubmittal number is always logged; the time at which the job will be resubmitted is logged only if Wait between attempts is greater than 0.
If all FME Flow engines are busy processing jobs when a retry occurs, the retried job will be submitted to the job queue. Jobs that are resubmitted by retries always get a new job ID and a new job log.
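The resubmission behavior can be pictured with a small simulation. This is a conceptual model only and does not call any FME Flow API: `run_with_retries` and `fake_submit` are hypothetical names, the job IDs are made up, and it assumes that "5 retries" means up to five resubmissions after the initial failed job.

```python
import itertools


def run_with_retries(submit_job, max_retries):
    """Submit a job and resubmit on failure; return (succeeded, job_ids).

    submit_job is any callable returning (job_id, succeeded). Every call
    represents a fresh submission, so each attempt gets its own job ID.
    """
    job_ids = []
    for _ in range(max_retries + 1):  # initial submission plus retries
        job_id, ok = submit_job()
        job_ids.append(job_id)
        if ok:
            return True, job_ids
    return False, job_ids


# Stand-in for a job queue: IDs count up and the third submission succeeds.
_id_counter = itertools.count(101)
_outcomes = iter([False, False, True])


def fake_submit():
    """Pretend to submit a job; returns a new ID and a scripted outcome."""
    return next(_id_counter), next(_outcomes)


succeeded, ids = run_with_retries(fake_submit, max_retries=5)
print(succeeded, ids)
```

Each entry in `ids` corresponds to a separate job with its own job log, which is why a retried workflow shows several jobs for a single triggering event.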
When an external action fails, information about the retry is logged in the Automation Log, along with the time interval before the retry if that interval is greater than 0.