JS - JobScheduler / JS-1340

Enable the Backup JobScheduler to pick up operation after a graceful shutdown of the Primary JobScheduler instance


Details

    • Type: Feature
    • Status: Deferred
    • Priority: Minor
    • Resolution: Unresolved

    Description

      Current Situation

      • When a Primary and a secondary Backup JobScheduler are operated and the Primary JobScheduler is shut down gracefully (equivalent to the Terminate command), the Backup JobScheduler does not automatically start operation.
      • This behavior is documented in the JobScheduler documentation.

      Desired Behavior

      • For system maintenance, some users want the secondary Backup JobScheduler to pick up operation regardless of whether the Primary JobScheduler was shut down gracefully or not.
      • In this scenario the secondary Backup JobScheduler waits for manual intervention before picking up operation. No restart of the Backup JobScheduler should be required; instead, JOC should offer an option to force the Backup JobScheduler to change its role to that of a Primary JobScheduler.
      • Use Case
        • Server A with the Primary JobScheduler is taken into maintenance. The Primary JobScheduler is shut down normally.
        • Server B with the Backup JobScheduler is in stand-by mode, i.e. the Backup JobScheduler is up and running but does not pick up operation.
        • At the end of the maintenance period, the Primary JobScheduler on Server A, which should resume operation, is not functional for some reason. In this case the Backup JobScheduler on Server B should be triggered manually to pick up operation immediately, without the need for a restart.
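      The difference between the current and the desired behavior can be summarized as a small decision rule. The following Python sketch is a hypothetical model, not JobScheduler code: the names `Shutdown`, `backup_takes_over` and `operator_forced` are illustrative assumptions, and `operator_forced` stands in for the JOC option requested above.

```python
from enum import Enum, auto


class Shutdown(Enum):
    """How the Primary JobScheduler ended (hypothetical model)."""
    GRACEFUL = auto()  # e.g. Terminate command or stopping the Windows Service
    ABNORMAL = auto()  # e.g. process killed or crashed


def backup_takes_over(primary_shutdown: Shutdown, operator_forced: bool = False) -> bool:
    """Return True if the Backup JobScheduler should pick up operation.

    operator_forced is a hypothetical flag representing the requested JOC
    option; with its default of False the function models today's behavior.
    """
    if primary_shutdown is Shutdown.ABNORMAL:
        # Fail-over situation: the Backup JobScheduler takes action immediately.
        return True
    # A graceful shutdown is not a fail-over situation: the Backup stays in
    # stand-by unless an operator explicitly forces the role change.
    return operator_forced
```

      With `operator_forced` left at `False`, a graceful shutdown keeps the Backup JobScheduler in stand-by, which is the behavior described under Current Situation.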

      Workaround

      • No immediate solution is available for handling the JobScheduler Windows Services. However, you can apply the following sequence of actions:
        • Install a new Windows Service on Server B with the start script parameterization of the Primary JobScheduler from Server A.
          • Server B now has two Windows Services: one for starting JobScheduler in the role of the Backup JobScheduler and one for starting it as a Primary JobScheduler.
          • Both Windows Services on Server B are intended for alternate use and must never be operated in parallel.
        • Terminate the Primary JobScheduler on Server A by stopping its Windows Service or by using the corresponding operation in JOC.
          • JobScheduler waits until all running tasks have terminated.
          • No new tasks are added after this command has been executed.
        • After the Primary JobScheduler has terminated, the Backup JobScheduler does not pick up operation, as this is not a fail-over situation.
        • At this point you can perform your maintenance work on the Primary JobScheduler on Server A.
      • To resume operation with the Backup JobScheduler if the Primary JobScheduler does not come up, follow these steps:
        • Terminate the Backup JobScheduler Windows Service.
        • Start the recently configured JobScheduler Windows Service on Server B that runs the Backup JobScheduler in the role of a Primary instance.
      • To switch operation back from Server B to Server A, apply the following steps:
        • Stop the recently configured JobScheduler Windows Service on Server B.
        • Start the Primary JobScheduler Windows Service on Server A.
        • Start the Backup JobScheduler Windows Service on Server B.
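      The workaround sequence above can be modeled to make its one hard rule explicit: the two Windows Services on Server B are alternates and must never run at the same time. The Python sketch below is a hypothetical model only; the service labels and the `Services`, `take_over_on_server_b` and `switch_back_to_server_a` names are assumptions for illustration and do not correspond to real Windows Service names or JobScheduler commands.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the three Windows Services used in the workaround.
A_PRIMARY = "A: Primary JobScheduler"
B_BACKUP = "B: Backup JobScheduler"
B_AS_PRIMARY = "B: JobScheduler in Primary role (new service)"
B_SERVICES = {B_BACKUP, B_AS_PRIMARY}


@dataclass
class Services:
    """Tracks which of the hypothetical services are currently running."""
    running: set[str] = field(default_factory=set)

    def start(self, name: str) -> None:
        # The two services on Server B are for alternate use only:
        # refuse to start one while the other is still running.
        if name in B_SERVICES and self.running & (B_SERVICES - {name}):
            raise RuntimeError(f"cannot start {name!r}: the other Server B service is running")
        self.running.add(name)

    def stop(self, name: str) -> None:
        self.running.discard(name)


def take_over_on_server_b(s: Services) -> None:
    """Primary on Server A did not come up after maintenance."""
    s.stop(B_BACKUP)
    s.start(B_AS_PRIMARY)


def switch_back_to_server_a(s: Services) -> None:
    """Restore the original Primary/Backup layout."""
    s.stop(B_AS_PRIMARY)
    s.start(A_PRIMARY)
    s.start(B_BACKUP)


if __name__ == "__main__":
    s = Services({A_PRIMARY, B_BACKUP})
    s.stop(A_PRIMARY)            # graceful terminate for maintenance; Backup stays in stand-by
    take_over_on_server_b(s)     # Server A did not come up
    print(sorted(s.running))
    switch_back_to_server_a(s)   # Server A is functional again
    print(sorted(s.running))
```

      Ordering matters in `take_over_on_server_b`: stopping the Backup service before starting the Primary-role service is what keeps the invariant intact.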

      Maintainer Notes

      • The current behavior works as designed: the Backup JobScheduler is intended for a fail-over situation.
        • Terminating the Primary JobScheduler gracefully does not represent a fail-over situation but a deliberate shutdown.
        • Killing the Primary JobScheduler or otherwise ending the program abnormally, e.g. by a crash, results in a fail-over situation in which the Backup JobScheduler immediately takes action.
      • We are reluctant to change this behavior because
        • it adds complexity to the decision whether a fail-over situation exists or not.
        • we provide the Active Cluster feature, which is intended for situations like this: gracefully shut down a cluster member and have the other cluster members resume the execution of jobs from the state in which they were left.
      • We offer this feature request for discussion: is it preferable to switch to the Active Cluster architecture for the given scenario, or should the Passive Cluster be modified to support the desired behavior?
      • Please vote for this issue and let us know your feedback.


          People

            • TeamEngine (sos_engine_team)
            • Andreas Püschel (ap)
            • Votes: 0
            • Watchers: 5


          Time Tracking

            • Original Estimate: 2 days
            • Remaining Estimate: 2 days
            • Time Spent: Not Specified