Details
- Type: Feature
- Status: Released
- Priority: Medium
- Resolution: Fixed
Description
Exempt jobs from restart after Subagent loss
Current Situation
Consider an Agent Cluster executing jobs with Subagents:
- If the Subagent crashes while a job is running, the related order is set to the blocked state, which indicates that the order state is unknown to the Director Agent.
- If the Subagent is restarted, the order continues and jobs that were executing at the time of the crash are restarted. Jobs are similarly restarted if the Subagent is not restarted but loss of the Subagent is confirmed by the user, see JS-2141.
- While many jobs might be restartable, some might not. A Director Agent cannot prove that the Subagent is not running: for example, in case of network issues the Subagent might be unreachable but continue to run jobs. Restartable jobs must therefore be prepared for the situation that more than one instance of a job is running at the same time.
Desired Behavior
- Users can qualify a job as non-restartable in the workflow configuration (JSON).
- If the crashed Subagent is restarted or if loss of the Subagent is confirmed, jobs qualified as non-restartable will not be restarted. The related order will be put into the failed state.
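
The desired behavior can be sketched as a small decision function. This is only an illustration of the rules stated above, not the product's implementation: the names `Job`, `OrderState`, `restartable`, `on_subagent_crash` and `on_subagent_restart_or_loss_confirmed` are hypothetical and do not reflect the actual workflow JSON attribute or any JS7 API.

```python
from dataclasses import dataclass
from enum import Enum


class OrderState(Enum):
    RUNNING = "running"
    BLOCKED = "blocked"
    FAILED = "failed"


@dataclass
class Job:
    name: str
    # Hypothetical flag standing in for the "non-restartable" qualification
    # that users would set in the workflow configuration.
    restartable: bool = True


def on_subagent_crash() -> OrderState:
    """A crash during job execution leaves the order state unknown: blocked."""
    return OrderState.BLOCKED


def on_subagent_restart_or_loss_confirmed(job: Job) -> OrderState:
    """Decide what happens to a blocked order once the Subagent is restarted
    or the user confirms its loss."""
    if job.restartable:
        # Restartable jobs are started again and the order continues.
        return OrderState.RUNNING
    # Non-restartable jobs are not started again; the order fails.
    return OrderState.FAILED


if __name__ == "__main__":
    for job in (Job("cleanup"), Job("payment_transfer", restartable=False)):
        state = on_subagent_crash()
        print(job.name, "after crash:", state.value)
        state = on_subagent_restart_or_loss_confirmed(job)
        print(job.name, "after restart or confirmed loss:", state.value)
```

In this sketch a job is restartable by default, which mirrors the current behavior described above; only jobs explicitly marked otherwise cause the order to fail.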