Details
- Feature
- Status: Dismissed
- Minor
- Resolution: Won't Fix
- 1.9, 1.10
Description
Current Situation
- A shell job is running on a JobScheduler Master.
- JobScheduler Master crashes (for instance with a kill -11 SIGSEGV signal) or is killed (for instance with a kill -9 SIGKILL signal).
- The task continues and will be completed even though the JobScheduler Master is not available anymore.
- This behaviour does not apply to jobs that make frequent use of the JobScheduler API.
Desired Behavior
- All tasks (including the ones for shell jobs) are terminated immediately in case of a JobScheduler Master crash.
Implementation
- The Master keeps track of running tasks with an internal process list.
- In case of a Master crash (segmentation fault) the Master will terminate any running tasks from that list.
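The proposed mechanism can be sketched as follows. This is a minimal illustration only, not JobScheduler code: the names `running_tasks`, `start_task`, `terminate_running_tasks`, `on_crash` and `install_crash_handler` are hypothetical, and the sketch assumes a POSIX system and a SIGSEGV that is sent from outside (for instance via kill -11). A native Master would have to do the equivalent in a C-level signal handler restricted to async-signal-safe calls.

```python
import os
import signal
import subprocess

# Hypothetical internal process list: PID -> handle of each task started here.
running_tasks = {}


def start_task(argv):
    """Start a task as a child process and record it in the process list."""
    proc = subprocess.Popen(argv)
    running_tasks[proc.pid] = proc
    return proc


def terminate_running_tasks():
    """Kill every task still running, empty the list, return the killed PIDs."""
    killed = []
    for pid, proc in list(running_tasks.items()):
        if proc.poll() is None:  # the task has not finished yet
            proc.kill()          # sends SIGKILL to the task on POSIX
            proc.wait()          # reap the child so no zombie is left behind
            killed.append(pid)
        del running_tasks[pid]
    return killed


def on_crash(signum, frame):
    """SIGSEGV handler: terminate the tasks, then die with the default action."""
    terminate_running_tasks()
    signal.signal(signum, signal.SIG_DFL)
    os.kill(os.getpid(), signum)


def install_crash_handler():
    """Register the crash handler; must be called from the main thread."""
    signal.signal(signal.SIGSEGV, on_crash)
```

In such a sketch the process would call `install_crash_handler()` once at startup and `start_task(...)` for each shell job. Sending kill -11 to the process would then terminate the listed tasks before the process itself ends with SIGSEGV.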
Delimitation
- This feature is intended to cope with a situation when a SIGSEGV signal is sent to the Master, for instance via kill -11, i.e. in case of a crash (a segmentation fault).
- This feature is not intended to cope with a situation when a SIGKILL signal is sent to the Master, for instance via kill -9, as this does not represent a realistic operational situation.
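Independently of the operational argument above, a process cannot install a handler for SIGKILL at all, so a cleanup step of this kind could only ever react to signals such as SIGSEGV. The following sketch illustrates the difference on a POSIX system; the helper name `can_install_handler` is hypothetical and chosen for illustration only.

```python
import signal


def can_install_handler(signum):
    """Return True if this process may register a handler for the signal."""
    try:
        previous = signal.signal(signum, lambda s, f: None)
    except (OSError, ValueError):
        # The operating system refuses handlers for SIGKILL (and SIGSTOP).
        return False
    # Restore the earlier disposition (None means it was not set from Python).
    signal.signal(signum, previous if previous is not None else signal.SIG_DFL)
    return True
```

Called from the main thread, `can_install_handler(signal.SIGSEGV)` succeeds, whereas `can_install_handler(signal.SIGKILL)` is rejected by the operating system.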
Maintainer Notes
- This feature proposal responds to a theoretical problem that has not yet been reported as an issue. We tend to move the JobScheduler architecture in a direction that will make more use of Agents and integrate Agents more thoroughly.
- Therefore we added JS-1550 for tasks with Agents, and we do not intend to add this feature to the Master.
- In fact, as of today you can run an Agent on the Master server should you have any concerns that a Master could become unavailable while a task is running on the server of the Master.