Tasks should terminate in case of JobScheduler Master crash




      Current Situation

      • A shell job is running on a JobScheduler Master.
      • The JobScheduler Master crashes (for instance, on receiving a SIGSEGV signal via kill -11) or is killed (for instance, on receiving a SIGKILL signal via kill -9).
      • The task continues and runs to completion even though the JobScheduler Master is no longer available.
      • This behavior does not apply to jobs that make frequent use of the JobScheduler API.

      Desired Behavior

      • All tasks (including those for shell jobs) are terminated immediately if the JobScheduler Master crashes.


      • The Master keeps track of running tasks with an internal process list.
      • In case of a Master crash (segmentation fault), the Master terminates any tasks from that list that are still running.
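
      The two points above can be sketched as follows. This is a minimal illustration in Python, not the JobScheduler implementation: the names running_tasks, start_task, terminate_running_tasks and install_crash_handler are hypothetical, and the sketch assumes a POSIX system where the SIGSEGV signal is delivered from outside (for instance via kill -11). A genuine memory fault inside native code may not leave the process in a state where such a handler can run reliably.

```python
import os
import signal
import subprocess
import sys

# Hypothetical stand-in for the Master's internal process list.
running_tasks = []


def start_task(argv):
    """Start a task as a child process and record it in the process list."""
    proc = subprocess.Popen(argv)
    running_tasks.append(proc)
    return proc


def terminate_running_tasks(timeout=5.0):
    """Terminate all tasks that are still running.

    Returns the number of tasks that had to be signalled.
    """
    signalled = 0
    for proc in running_tasks:
        if proc.poll() is None:      # task is still running
            proc.terminate()         # sends SIGTERM on POSIX
            signalled += 1
    for proc in running_tasks:
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()              # escalate to SIGKILL
            proc.wait()
    running_tasks.clear()
    return signalled


def on_sigsegv(signum, frame):
    """Clean up the tasks, then let the process die with SIGSEGV."""
    terminate_running_tasks()
    signal.signal(signal.SIGSEGV, signal.SIG_DFL)
    os.kill(os.getpid(), signal.SIGSEGV)


def install_crash_handler():
    """Register the cleanup handler for SIGSEGV."""
    signal.signal(signal.SIGSEGV, on_sigsegv)
```

      A caller would invoke install_crash_handler() once at startup and launch every task through start_task(). No comparable handler can be registered for SIGKILL, because the operating system does not allow that signal to be caught.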


      • This feature is intended to cope with a situation in which a SIGSEGV signal is sent to the Master, for instance via kill -11, i.e. a crash (segmentation fault).
      • This feature is not intended to cope with a situation in which a SIGKILL signal is sent to the Master, for instance via kill -9, as this does not represent a realistic operational situation.

      Maintainer Notes

      • This feature proposal responds to a theoretical problem that has not yet been reported as an issue. We are moving the JobScheduler architecture in a direction that makes more use of Agents and integrates them more thoroughly.
      • Therefore we added JS-1550 for tasks running with Agents, and we do not intend to add this feature to the Master.
      • As of today, you can run an Agent on the Master server if you are concerned that the Master could become unavailable while a task is running on that server.


