Details
Type: Feature
Status: Released
Priority: Minor
Resolution: Fixed
None
Description
Feature
A common problem with SSH sessions is that when an error on the remote system causes the process to hang, no exit code is sent to the host that started the connection.
The improvement has to ensure that a process started on the remote machine can be monitored and, if needed, terminated (monitor and cleaner).
The monitor has to recognize the following issues:
- the process stops working without returning an exit code (process hangs)
- the session to the remote machine is lost, but the process is still running on the remote host
To be able to react to these issues, additional information has to be available:
- the monitor has to know the pid of the process to be able to find that process after an error has occurred
- a timeout has to be configured to make sure that the monitor becomes active after the configured timespan
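The monitor and cleaner idea can be illustrated with a small local analogue. This is only a sketch, not the SSH Job's implementation: it watches a local child process instead of a remote one, and the function name `run_with_watchdog`, the `grace_seconds` parameter, and the terminate-then-kill escalation are assumptions made for illustration.

```python
import subprocess
import sys


def run_with_watchdog(argv, timeout_seconds, grace_seconds=2.0):
    """Run argv and clean it up if it does not exit within timeout_seconds.

    Returns a tuple (exit_code, timed_out).
    """
    process = subprocess.Popen(argv)
    try:
        # Normal case: the process ends on its own and reports an exit code.
        return process.wait(timeout=timeout_seconds), False
    except subprocess.TimeoutExpired:
        # The process is treated as hung: ask it to stop first (SIGTERM on POSIX).
        process.terminate()
        try:
            process.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            # Assumption: escalate to a forced kill if it ignores the request.
            process.kill()
            process.wait()
        return process.returncode, True
```

Calling `run_with_watchdog([sys.executable, "-c", "import time; time.sleep(60)"], timeout_seconds=1.0)` returns with `timed_out` set to `True` after roughly one second instead of blocking for a minute.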
Parameterization
New parameters for the SSH Job:
Name: runWithWatchdog
Type: SOSOptionBoolean
Description: This parameter determines whether the SSH Job uses the session management.
Default: false
Name: cleanupJobchain
Type: SOSOptionString
Description: This parameter determines the job chain used for the cleanup work.
Default: ""
Name: ssh_job_get_pid_command
Type: SOSOptionString
Description: the command to get the pid of the active shell on the remote host. The command has to write its result to stdout on the remote host.
Default: echo $$
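Because the command writes the pid to stdout, the caller has to extract a number from the captured output. The following is a minimal sketch of that step; the helper name `parse_pid` is hypothetical, and skipping non-numeric lines (for example a login banner) is an assumption, not documented behaviour of the job.

```python
def parse_pid(stdout_text):
    """Return the first line of stdout_text that is a positive integer."""
    for line in stdout_text.splitlines():
        candidate = line.strip()
        # Accept plain ASCII digits only, e.g. the output of `echo $$`.
        if candidate.isascii() and candidate.isdigit() and int(candidate) > 0:
            return int(candidate)
    raise ValueError("no pid found in command output")
```

For example, `parse_pid("4711\n")` yields the integer `4711`, which can then be stored for later lookups.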
Name: ssh_job_get_active_processes_command
Type: SOSOptionString
Description: The command to check whether the given process is still running. The placeholders ${user} and ${pid} are not mandatory but can be part of the command. The job checks whether the placeholders are present and, if so, substitutes them with the given values. The command retrieves a list of processes filtered by the given pid and the name of the user who started the process. If the command on the remote host ends with exit code = 0, the process is still running. If the command ends with exit code != 0, the process is not available anymore. The job evaluates the exit code of the command and proceeds accordingly.
Default: /bin/ps -ef | grep ${pid} | grep ${user} | grep -v grep
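The placeholder substitution and the exit code rule described above can be sketched as follows. The function names are hypothetical and plain string replacement is an assumption about how the substitution might be done; only the placeholders, the default command, and the exit code convention come from the parameter description.

```python
DEFAULT_CHECK_COMMAND = "/bin/ps -ef | grep ${pid} | grep ${user} | grep -v grep"


def build_check_command(template, pid, user):
    """Fill in ${pid} and ${user} where present; both placeholders are optional."""
    # str.replace leaves the template unchanged if a placeholder is missing.
    return template.replace("${pid}", str(pid)).replace("${user}", user)


def is_still_running(exit_code):
    """Exit code 0 means the process was found; anything else means it is gone."""
    return exit_code == 0
```

With the default template, `build_check_command(DEFAULT_CHECK_COMMAND, 4711, "scheduler")` produces `/bin/ps -ef | grep 4711 | grep scheduler | grep -v grep`. A custom command without placeholders is passed through unchanged.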
Name: ssh_job_kill_pid_command
Type: SOSOptionString
Description: The command to kill a process running on the remote host, identified by its pid
Default: kill -9
Name: ssh_job_terminate_pid_command
Type: SOSOptionString
Description: The command to terminate a process running on the remote host, identified by its pid
Default: kill -15
Name: ssh_job_get_child_processes_command
Type: SOSOptionString
Description: The command or script that determines the child processes of the given pid
Default: ps -ef | pgrep -P{pid}
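How the child process list might feed into the terminate and kill commands is sketched below. Everything here is an assumption for illustration: the helper names are hypothetical, the pid is assumed to be appended to the configured command after a space, and signalling children before the parent is a design choice of this sketch, not documented behaviour.

```python
def parse_child_pids(stdout_text):
    """Collect the pids printed by the child process command, one per token."""
    return [int(token) for token in stdout_text.split() if token.isdigit()]


def build_signal_commands(signal_command, parent_pid, child_pids):
    """Build one command line per process, children first, parent last."""
    # Assumption: the pid is appended to the command, e.g. "kill -15 4711".
    return [f"{signal_command} {pid}" for pid in [*child_pids, parent_pid]]
```

For a parent pid 100 with children 101 and 102, `build_signal_commands("kill -15", 100, [101, 102])` gives `kill -15 101`, `kill -15 102`, and `kill -15 100`, in that order.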
Issue Links
- is related to
  - JITL-124 SSH Job handles event jobscheduler_on_event(kill) (Dismissed)
  - JITL-180 SSH Session management should work for stand alone SSH Jobs (Released)
  - JITL-181 SSH Session management should also kill child processes (Released)
- requires
  - JITL-123 JITL SSH Job should switch from Trilead to JSch implementation by JCraft (Released)