  1. JITL - JobScheduler Integrated Template Library
  2. JITL-147

Improve SSH Session management to monitor and kill remote sessions


Details

    • Feature
    • Status: Released
    • Minor
    • Resolution: Fixed
    • None
    • 1.9

    Description

      Feature
      A common problem with SSH sessions is that when an error on the remote system causes the process to hang, no exit code is sent to the host that started the connection.

      The improvement has to ensure that a process started on the remote machine can be monitored and, if needed, terminated (monitor and cleaner).

      The monitor has to recognize the following issues:

      • the process stops working without returning an exit code (process hangs)
      • the session to the remote machine is lost, but the process is still running on the remote host

      To be able to react to these issues, additional information has to be available:

      • the monitor has to know the pid of the process to be able to find that process after an error occurred
      • a timeout has to be configured to make sure that the monitor becomes active after the configured time span
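
      The two failure cases and the timeout can be illustrated with a minimal monitor loop. This is a sketch under stated assumptions, not the JITL implementation: the function name `watch`, the callables `is_finished` and `is_session_alive`, and the returned status strings are hypothetical and exist only for illustration.

```python
import time
from typing import Callable


def watch(is_finished: Callable[[], bool],
          is_session_alive: Callable[[], bool],
          timeout_seconds: float,
          poll_seconds: float = 1.0,
          clock: Callable[[], float] = time.monotonic,
          sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the remote process finishes, the session is lost, or the
    timeout expires. All names here are hypothetical, not JITL API."""
    deadline = clock() + timeout_seconds
    while True:
        if is_finished():
            # The process returned an exit code; nothing to clean up.
            return "finished"
        if not is_session_alive():
            # Session lost: the process may still run on the remote host.
            return "session_lost"
        if clock() >= deadline:
            # No exit code within the configured time span: process hangs.
            return "timeout"
        sleep(poll_seconds)
```

      In this sketch, both "session_lost" and "timeout" would be the points at which a cleaner looks up the stored pid on the remote host.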

      Parameterization
      New parameters for the SSH Job:

      Name: runWithWatchdog
      Type: SOSOptionBoolean
      Description: Determines whether the SSH Job uses the session management.
      Default: false

      Name: cleanupJobchain
      Type: SOSOptionString
      Description: Determines the job chain used for the cleanup work.
      Default: ""

      Name: ssh_job_get_pid_command
      Type: SOSOptionString
      Description: The command to get the pid of the active shell on the remote host. The command has to write its result to stdout on the remote host.
      Default: echo $$

      Name: ssh_job_get_active_processes_command
      Type: SOSOptionString
      Description: The command to check whether the given process is still running. The placeholders ${user} and ${pid} are optional. If they are present in the command, the job substitutes them with the given values. The default command lists the processes filtered by the given pid and by the name of the user who started the process. If the command ends on the remote host with exit code 0, the process is still running. If the command ends with a non-zero exit code, the process is no longer available. The job evaluates the exit code of the command and proceeds accordingly.
      Default: /bin/ps -ef | grep ${pid} | grep ${user} | grep -v grep
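
      The placeholder substitution and the exit code rule described for this parameter can be sketched as follows. The helper names `substitute_placeholders` and `process_is_running` are hypothetical and are not taken from the JITL code base; only the placeholders and the default command come from the description above.

```python
DEFAULT_ACTIVE_PROCESSES_COMMAND = (
    "/bin/ps -ef | grep ${pid} | grep ${user} | grep -v grep"
)


def substitute_placeholders(command: str, user: str, pid: int) -> str:
    """Replace ${user} and ${pid} where present; a command without
    placeholders is returned unchanged."""
    return command.replace("${user}", user).replace("${pid}", str(pid))


def process_is_running(exit_code: int) -> bool:
    """Exit code 0 means the process was found; any other exit code
    means it is no longer available."""
    return exit_code == 0
```

      Note that a plain `grep ${pid}` also matches lines in which the pid appears as part of a longer number, so a custom command may be preferable where exact matching matters.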

      Name: ssh_job_kill_pid_command
      Type: SOSOptionString
      Description: The command to kill the process with the given pid on the remote host
      Default: kill -9

      Name: ssh_job_terminate_pid_command
      Type: SOSOptionString
      Description: The command to terminate the process with the given pid on the remote host
      Default: kill -15
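
      One plausible way to combine the terminate and kill commands is to escalate: terminate first, and kill only if the process is still running afterwards. The sketch below assumes this order and assumes that the pid is appended to the configured command after a space; neither is stated in this issue. `stop_process`, `run_remote`, `is_running` and `grace_checks` are hypothetical names.

```python
from typing import Callable


def stop_process(pid: int,
                 run_remote: Callable[[str], object],
                 is_running: Callable[[int], bool],
                 wait: Callable[[], None] = lambda: None,
                 terminate_command: str = "kill -15",
                 kill_command: str = "kill -9",
                 grace_checks: int = 3) -> str:
    """Send the terminate command, then escalate to the kill command if the
    process survives a few checks. Assumes the pid is appended to the command."""
    run_remote(f"{terminate_command} {pid}")
    for _ in range(grace_checks):
        if not is_running(pid):
            return "terminated"
        wait()  # hypothetical pause between checks
    run_remote(f"{kill_command} {pid}")
    return "killed"
```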

      Name: ssh_job_get_child_processes_command
      Type: SOSOptionString
      Description: The command or script that determines the child processes of the given pid
      Default: ps -ef | pgrep -P${pid}
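
      The output of the child process command can be used to walk the process tree, so that child processes are stopped before their parents. This is a sketch under assumptions: it expects one pid per line on stdout, as `pgrep -P` prints, and the helpers `parse_pids` and `collect_descendants` are hypothetical names, not JITL functions.

```python
from typing import Callable, List


def parse_pids(stdout: str) -> List[int]:
    """Turn command output with one pid per line into a list of integers,
    ignoring anything that is not purely numeric."""
    return [int(token) for token in stdout.split() if token.isdigit()]


def collect_descendants(pid: int,
                        list_children: Callable[[int], List[int]]) -> List[int]:
    """Return all descendants of pid, deepest first, so that children can be
    stopped before their parents."""
    result: List[int] = []
    for child in list_children(pid):
        result.extend(collect_descendants(child, list_children))
        result.append(child)
    return result
```

      Here `list_children` stands for whatever runs the configured command on the remote host and passes its stdout through `parse_pids`.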

            People

              Santiago Aucejo Petzoldt
              Uwe Risse