  1. JS - JobScheduler
  2. JS-2065

Bug: Cluster blocks when restarting after the active node's crash and a change of ClusterWatch


Details

    • Type: Fix
    • Status: Released
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.5.3
    • Fix Version/s: 2.5.4, 2.6.0

    Description

      Cluster recovery may block when
       * the active cluster node has been killed (or has crashed), and
       * the ClusterWatch changes before recovery.

      The active node's log file shows warnings like this: 

      WARN  js7.journal.JournalActor - Waiting for 12:29min for acknowledgement from passive cluster node⏎
       for 2 events (in 2 persists), last is Stamped(... ClusterWatchRegistered(ClusterWatch:joc#0))
      

      The passive node's log file shows warnings like this:

      WARN  js7.cluster.ClusterCommon - ClusterRecouple command failed with HTTP 503 Service Unavailable:⏎
       POST https://controller-2-0-primary:4443/controller/api/cluster/command => ⏎
       ClusterNodeIsNotActive: This cluster node is not (yet) active
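
      To check whether a pair of log files shows this symptom, the two warnings above can be searched for line by line. The following is a minimal sketch, not part of JS7: the function names, the tag strings and the command-line usage are hypothetical, and the patterns assume that the wording of the warnings matches the excerpts quoted in this issue.

```python
import re
import sys
from collections import Counter
from typing import Iterable, Optional

# Patterns derived from the warnings quoted in this issue. Assumption: other
# releases word these messages the same way; adjust the patterns if not.
ACTIVE_BLOCKED = re.compile(
    r"js7\.journal\.JournalActor - Waiting for \S+ for acknowledgement "
    r"from passive cluster node"
)
PASSIVE_REJECTED = re.compile(
    r"js7\.cluster\.ClusterCommon - ClusterRecouple command failed"
)


def classify(line: str) -> Optional[str]:
    """Return a tag for a log line that matches one of the two warnings."""
    if ACTIVE_BLOCKED.search(line):
        return "active_waiting_for_ack"
    if PASSIVE_REJECTED.search(line):
        return "passive_recouple_failed"
    return None


def scan(lines: Iterable[str]) -> Counter:
    """Count how often each warning occurs in the given lines."""
    return Counter(tag for tag in map(classify, lines) if tag is not None)


if __name__ == "__main__":
    # Hypothetical usage: pass the paths of the log files as arguments.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as log:
            print(path, dict(scan(log)))
```

      Both tags appearing repeatedly, one in each node's log, is consistent with the blocked recovery described here, but it does not prove it; other cluster problems may produce the same warnings.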
      

          People

            Joacim Zschimmer
            Andreas Püschel