  JS - JobScheduler / JS-1524

Universal Agent supports reconciliation after connection loss

Details

    Description

      Feature

      • The Universal Agent assumes a short-term loss of its connection to the Master if two heartbeats are missing (see Heartbeat Implementation).
        • Should an HTTP POST request by the Master not be received and acknowledged by the Agent, the Master will re-send the request up to five times.
        • The Agent will handle possible duplicate requests from the Master and will acknowledge each request within 5s.
      • If the Master's attempts to re-establish the connection and to re-send the requests, up to a maximum of five times,
        • are successful, then this is considered a recoverable connection loss.
        • are unsuccessful, then this is identified as an unrecoverable error.
      • In case of a recoverable connection loss
        • the tasks are continued and completed with the Agent.
        • the Agent stores log output of tasks in local files (see JS-1521).
        • the Agent reports the log information of running and completed tasks back to the Master.
        • the Agent reports the execution history of running and completed tasks back to the Master.
        • the Master adds the information received from the re-connected Agent to its history.
        • the Master reports the running tasks of an Agent after re-connect.
      • In case of an unrecoverable connection error the Agent will kill the task (see JS-1523).
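
The re-send and duplicate-handling rules above can be sketched as follows. This is only an illustration, not the JobScheduler implementation: the names `deliver`, `Delivery`, `AgentInbox` and `request_id` are hypothetical, `send` stands in for one HTTP POST, and the ticket does not say how the Agent recognizes a duplicate, so a repeated request ID is assumed. Reading "up to five times" as one initial attempt plus five re-sends is likewise an interpretation.

```python
from enum import Enum
from typing import Callable, Set

MAX_RESENDS = 5  # from the description: the Master re-sends up to five times


class Delivery(Enum):
    OK = "acknowledged on first attempt"
    RECOVERABLE = "acknowledged after re-send"
    UNRECOVERABLE = "never acknowledged"


def deliver(send: Callable[[], bool], max_resends: int = MAX_RESENDS) -> Delivery:
    """Master side (hypothetical): send once, then re-send until acknowledged.

    `send` stands in for one HTTP POST; it returns True if the Agent
    acknowledged in time (5s in the description) and False otherwise.
    """
    if send():
        return Delivery.OK
    for _ in range(max_resends):
        if send():
            return Delivery.RECOVERABLE
    return Delivery.UNRECOVERABLE


class AgentInbox:
    """Agent side (hypothetical): acknowledge duplicates without re-running them."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def handle(self, request_id: str, start_task: Callable[[], None]) -> str:
        # Assumption: each request carries an ID that a re-send repeats.
        if request_id not in self._seen:
            self._seen.add(request_id)
            start_task()
        return "ack"  # returned for the first delivery and for duplicates alike
```

Under these assumptions a request that is acknowledged on the third POST is classified as a recoverable connection loss, while six unanswered POSTs yield an unrecoverable error, at which point the Agent-side kill described in JS-1523 would apply.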

      Heartbeat Implementation

      • The Master and Agent send heartbeats to each other.
        • The Agent receives HTTP POST requests from the Master and will respond within 5s, independently of the completion of the command that has been requested by the Master.
        • The Master keeps sending HTTP POST requests and accepting acknowledgements until the Agent sends the final response, i.e. after the task has completed.
      • If the Agent does not receive a heartbeat from the Master within twice the heartbeat period (10s), the Agent will assume the connection to be lost and will kill the task.
      • If the Master does not receive a heartbeat from the Agent, the Master will consider the task lost and will assign it an error state.
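
The Agent-side timeout can be illustrated with a small watchdog. This is a sketch under assumptions, not the actual Agent code: the class name `HeartbeatWatchdog` is hypothetical, the 5s heartbeat period is inferred from the "double period (10s)" wording, and timestamps are passed in explicitly so the logic can be checked without real waiting. Whether exactly 10s of silence already counts as lost is not specified; the sketch treats only more than 10s as lost.

```python
HEARTBEAT_PERIOD_S = 5.0  # assumption: inferred from "double period (10s)"


class HeartbeatWatchdog:
    """Tracks the last heartbeat from the Master (hypothetical helper)."""

    def __init__(self, started_at: float, period_s: float = HEARTBEAT_PERIOD_S) -> None:
        self.timeout_s = 2 * period_s  # the "double period" from the description
        self.last_seen = started_at

    def beat(self, now: float) -> None:
        # Called whenever a request from the Master arrives.
        self.last_seen = now

    def connection_lost(self, now: float) -> bool:
        # True once more than the double period has passed without a heartbeat.
        return now - self.last_seen > self.timeout_s


# Usage: timestamps in seconds, e.g. taken from time.monotonic().
watchdog = HeartbeatWatchdog(started_at=0.0)
watchdog.beat(now=4.0)
if watchdog.connection_lost(now=15.0):
    print("connection lost: the Agent would kill the task here")
```

In a running Agent the timestamps would come from a monotonic clock, so that wall-clock adjustments cannot trigger or mask a timeout.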

      Delimitation

      • This feature covers the situation of a recoverable network connection loss, not that of an ongoing network outage.
      • This feature does not cover the situation of an unrecoverable connection loss that is due to failure or restart of a Master (server).

            People

              Joacim Zschimmer
              Andreas Püschel
              Victor Garcia-Beltran (Inactive)