JS - JobScheduler / JS-1954

Enable Agents to execute jobs with a number of Subagents

Details

    Description

      Current Situation

      • Agents execute jobs by running child processes.
      • Agents scale vertically, i.e. any number of child processes can be executed (SOS has tested up to about 15 000 parallel processes per Agent).
      • Agents do not support clustering, e.g. for applications that should be executed from parallel jobs running on different servers.

      Desired Feature

      • Agent Components
        • The Agent is considered to consist of a number of components:
          • the Agent Director, which holds workflows and order state transitions in its journal,
          • the Subagent that executes jobs on behalf of the Agent Director.
        • Agent Director
          • knows orders and workflows and handles workflow instructions that are within the scope of a single Agent (Fork, Retry etc.),
          • is contacted by the Controller, requests jobs to be executed by Subagents and reports back execution results to the Controller.
        • Subagent
          • is operated with an Agent Director and can be operated standalone on any number of servers,
          • does not have a memory of jobs but immediately reports back to an Agent Director the execution results and log output,
          • is used for horizontal scaling of jobs running for applications that should be executed from a number of servers in parallel,
          • can be enabled and disabled by commands that are forwarded from a Controller to an Agent Director.
      • Installation
        • The Agent Director ships with a Subagent. This corresponds to the known behavior of Agents to execute jobs from a single Agent installation.
        • Subagents can be installed as standalone instances; they are lightweight and do not hold a journal.
      • Cluster
        • Director Cluster
          • The Director Cluster governs fail-over and switch-over between an active Agent Director and a standby Agent Director.
          • The active Agent Director synchronizes its journal file to the standby Agent Director. Once the journals are in sync, a fail-over or switch-over can take place.
          • The active Agent Director connects to the Subagents to request execution of jobs and to receive job execution results.
        • Subagent Cluster
          • The Subagent Cluster is a logical view of a selection of Subagents that are operated in one of the following scheduling modes:
            • fixed-priority: the first Subagent is always used; if it is unavailable, the next Subagent is used,
            • round-robin: each job is executed on the next Subagent in turn,
            • load: each job is executed on the Subagent whose server has the lowest CPU and memory load.
          • Any number of Subagent Clusters can be configured, and the same Subagent can be a member of more than one Subagent Cluster.
          • If the connection between an Agent Director and a Subagent is permanently lost, then
            • the Subagent kills the running processes of a job if the connection is not re-established within a configurable timeout,
            • the Agent Director assigns the job for execution to the next available Subagent.
      • Scope
        • Availability of the Subagent Cluster with limited scheduling modes is in scope of this issue.
        • Availability of the Director Cluster including full support of scheduling modes is in scope of JS-1955.
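
The three scheduling modes of a Subagent Cluster described above can be illustrated with a minimal sketch. It is not the product's implementation: the `Subagent` and `SubagentCluster` classes, their field names, and the `select` method are hypothetical. The sketch further assumes that round-robin skips unavailable Subagents and that the load mode ranks Subagents by the simple sum of CPU load and memory load; the issue does not specify either detail.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Subagent:
    """Hypothetical view of a Subagent as seen by the Agent Director."""
    name: str
    available: bool = True
    cpu_load: float = 0.0     # assumed range 0.0 .. 1.0
    memory_load: float = 0.0  # assumed range 0.0 .. 1.0


class SubagentCluster:
    """Illustrative selection logic for the three scheduling modes."""

    MODES = ("fixed-priority", "round-robin", "load")

    def __init__(self, subagents: Sequence[Subagent], mode: str) -> None:
        if mode not in self.MODES:
            raise ValueError(f"unknown scheduling mode: {mode}")
        self.subagents = list(subagents)  # list order is the priority order
        self.mode = mode
        self._next = 0  # round-robin cursor

    def select(self) -> Optional[Subagent]:
        """Return the Subagent for the next job, or None if none is available."""
        if self.mode == "fixed-priority":
            # The first available Subagent in the configured order wins.
            return next((s for s in self.subagents if s.available), None)

        if self.mode == "round-robin":
            # Start at the cursor; skipping unavailable Subagents is an assumption.
            n = len(self.subagents)
            for offset in range(n):
                index = (self._next + offset) % n
                if self.subagents[index].available:
                    self._next = (index + 1) % n
                    return self.subagents[index]
            return None

        # "load": ranking by cpu_load + memory_load is an assumption.
        candidates = [s for s in self.subagents if s.available]
        return min(candidates, key=lambda s: s.cpu_load + s.memory_load, default=None)
```

Because a Subagent may be a member of more than one Subagent Cluster, the same `Subagent` object can be passed to several `SubagentCluster` instances, each with its own mode. Returning `None` leaves it to the caller, here standing in for the Agent Director, to decide how to handle a job when no Subagent is available.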

            People

              Joacim Zschimmer
              Andreas Püschel
              Kanika Agrawal