Details
- Feature
- Status: Released
- Major
- Resolution: Fixed
- 2.0.0
Description
Current Situation
- Agents execute jobs by running child processes.
- Agents scale vertically, i.e. any number of child processes can be executed (tests by SOS stop at about 15 000 parallel processes per Agent).
- Agents do not support clustering, e.g. for applications that should be executed from parallel jobs running on different servers.
Desired Feature
- Agent Components
- The Agent is considered to consist of a number of components:
- the Agent Director, which holds workflows and order state transitions in its journal,
- the Subagent that executes jobs on behalf of the Agent Director.
- Agent Director
- knows orders and workflows and handles workflow instructions that are within the scope of a single Agent (Fork, Retry etc.),
- is contacted by the Controller, requests jobs to be executed by Subagents and reports back execution results to the Controller.
- Subagent
- is operated with an Agent Director and can be operated standalone on any number of servers,
- does not have a memory of jobs but immediately reports back to an Agent Director the execution results and log output,
- is used for horizontal scaling of jobs running for applications that should be executed from a number of servers in parallel,
- can be enabled and disabled by commands that are forwarded from a Controller to an Agent Director.
- Installation
- The Agent Director ships with a Subagent. This corresponds to the known behavior of Agents to execute jobs from a single Agent installation.
- Subagents can be installed as standalone instances; they are lightweight and do not hold a journal.
- Cluster
- Director Cluster
- The Director Cluster governs fail-over and switch-over between an active Agent Director and a standby Agent Director.
- The active Agent Director synchronizes its journal file to the standby Agent Director. Once the journals are in sync, a fail-over or switch-over can take place.
- The active Agent Director connects to the Subagents to request execution of jobs and to receive job execution results.
- Subagent Cluster
- The Subagent Cluster is a logical view on a selection of Subagents that are operated for a scheduling mode:
- fixed-priority: the first Subagent is always used; if it is unavailable, the next Subagent is used.
- round-robin: each successive job is executed by the next Subagent in turn.
- load: each job is executed by the Subagent whose server has the least CPU load and memory load.
- Any number of Subagent Clusters can be configured, and the same Subagent can be a member of more than one Subagent Cluster.
- If the connection between an Agent Director and the Subagent is permanently lost, then
- the Subagent will kill the running processes of a job once a configurable timeout for re-establishing the connection has expired,
- the Agent Director will assign the job to the next available Subagent for execution.
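
The three scheduling modes and the fall-back to the next available Subagent can be illustrated with a minimal Python sketch. It is not the product's implementation: the class names, the `available` flag, the load fields and the equal weighting of CPU and memory load are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Subagent:
    """Hypothetical view of a Subagent as seen by an Agent Director."""
    name: str
    available: bool = True
    cpu_load: float = 0.0     # assumed fraction 0.0..1.0 reported for the server
    memory_load: float = 0.0  # assumed fraction 0.0..1.0 reported for the server


class SubagentCluster:
    """Ordered selection of Subagents operated for one scheduling mode."""

    MODES = ("fixed-priority", "round-robin", "load")

    def __init__(self, subagents: Sequence[Subagent], mode: str):
        if mode not in self.MODES:
            raise ValueError(f"unknown scheduling mode: {mode}")
        self.subagents = list(subagents)
        self.mode = mode
        self._next_index = 0  # round-robin cursor into self.subagents

    def select(self) -> Optional[Subagent]:
        """Return the Subagent for the next job, or None if none is available."""
        candidates = [s for s in self.subagents if s.available]
        if not candidates:
            return None
        if self.mode == "fixed-priority":
            # First available Subagent in configured order.
            return candidates[0]
        if self.mode == "load":
            # Assumption: CPU load and memory load are weighted equally.
            return min(candidates, key=lambda s: s.cpu_load + s.memory_load)
        # round-robin: start at the cursor and skip unavailable Subagents.
        n = len(self.subagents)
        for offset in range(n):
            idx = (self._next_index + offset) % n
            if self.subagents[idx].available:
                self._next_index = (idx + 1) % n
                return self.subagents[idx]
        return None
```

In this sketch, marking a Subagent as unavailable (for example after a lost connection) makes `select()` skip it, which mirrors the Agent Director assigning a job to the next available Subagent. Because a cluster only holds references, the same `Subagent` object can be passed to several `SubagentCluster` instances.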
- Scope
- Availability of the Subagent Cluster with limited scheduling modes is in scope of this issue.
- Availability of the Director Cluster including full support of scheduling modes is in scope of JS-1955.