JS - JobScheduler / JS-2141

Confirm loss of Subagent to restart jobs

Details

    Description

      Current Situation
      Consider an Agent Cluster executing jobs with Subagents:

      • If a Subagent crashes before a job starts, the active Director Agent selects the next available Subagent to execute the job.
      • If a Subagent crashes while a job is running, the related order is put into the blocked state and no operations are available on the order. When the Subagent is restarted, the job is restarted.
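
      The current behavior can be pictured as a small state model. The following Python sketch is illustrative only: the names (Order, Subagent, start_job and the event handlers) are hypothetical and not part of the product, and "next available" is simplified to "first Subagent in the list that is alive".

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class OrderState(Enum):
    READY = "ready"
    RUNNING = "running"
    BLOCKED = "blocked"


@dataclass
class Subagent:
    name: str
    alive: bool = True


@dataclass
class Order:
    state: OrderState = OrderState.READY
    subagent: Optional[Subagent] = None


def start_job(order: Order, cluster: List[Subagent]) -> Optional[Subagent]:
    """Start the job on the first Subagent that is alive (simplified selection)."""
    for candidate in cluster:
        if candidate.alive:
            order.subagent = candidate
            order.state = OrderState.RUNNING
            return candidate
    return None  # no Subagent available, the order stays READY


def on_subagent_crash(order: Order, crashed: Subagent) -> None:
    """A crash while the job is running blocks the order."""
    crashed.alive = False
    if order.state is OrderState.RUNNING and order.subagent is crashed:
        order.state = OrderState.BLOCKED


def on_subagent_restart(order: Order, restarted: Subagent) -> None:
    """Only a restart of the same Subagent restarts the blocked job."""
    restarted.alive = True
    if order.state is OrderState.BLOCKED and order.subagent is restarted:
        order.state = OrderState.RUNNING
```

      In this model a blocked order has no way out other than the restart of the Subagent it was running on, which is the gap this issue addresses.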

      Desired Behavior

      • Restarting jobs when a crashed Subagent is restarted is the correct behavior. In addition, users would like to restart jobs on a different Subagent if the crashed Subagent is not restarted, for example when the server is out of order.
      • A Director Agent cannot know whether a Subagent has stopped running or is merely inaccessible but still executing the job, for example due to network issues. Automatically restarting such jobs could therefore result in double job execution.
      • Users who wish to restart jobs on a different Subagent can confirm the loss of the crashed Subagent (Controller command). The next Subagent is then selected based on the Subagent Cluster configuration.
        • The command is sent from JOC Cockpit to the Controller.
        • The Controller forwards the command to the Director Agent, which must be reachable from the Controller.
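
      A minimal sketch of the desired decision logic follows, assuming a fixed-priority Subagent Cluster. The class and method names (DirectorAgent, report_unreachable, confirm_loss) are hypothetical and do not reflect the actual Controller command or Agent API. The restartable flag mirrors the non-restartable job marking mentioned in this issue.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Job:
    name: str
    subagent: str
    restartable: bool = True
    blocked: bool = False


@dataclass
class DirectorAgent:
    cluster: List[str]  # Subagent ids in selection order (assumed fixed priority)
    unreachable: Set[str] = field(default_factory=set)
    confirmed_lost: Set[str] = field(default_factory=set)

    def report_unreachable(self, subagent_id: str, jobs: List[Job]) -> None:
        """Block affected jobs; never restart automatically (risk of double execution)."""
        self.unreachable.add(subagent_id)
        for job in jobs:
            if job.subagent == subagent_id:
                job.blocked = True

    def next_subagent(self) -> Optional[str]:
        """Pick the first Subagent that is neither unreachable nor confirmed lost."""
        for sid in self.cluster:
            if sid not in self.unreachable and sid not in self.confirmed_lost:
                return sid
        return None

    def confirm_loss(self, subagent_id: str, jobs: List[Job]) -> List[Job]:
        """User-confirmed loss: move restartable blocked jobs to the next Subagent."""
        self.confirmed_lost.add(subagent_id)
        target = self.next_subagent()
        restarted: List[Job] = []
        if target is None:
            return restarted  # nothing to fail over to, jobs stay blocked
        for job in jobs:
            if job.subagent == subagent_id and job.blocked and job.restartable:
                job.subagent = target
                job.blocked = False
                restarted.append(job)
        return restarted
```

      The point of the design is that failover happens only on the explicit confirmation: the Director Agent alone cannot distinguish a crashed Subagent from one that is unreachable but still running the job.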

      Maintainer Note

      • The functionality is available through the "Reset" operation offered on the "Manage Controllers/Agents" page. For a crashed Subagent, the "Reset" operation causes crashed jobs to be restarted on the next Subagent in the given Subagent Cluster, unless the job is marked as non-restartable.

          People

            jz Joacim Zschimmer
            ap Andreas Püschel