  1. JS - JobScheduler
  2. JS-1855

JobScheduler restarts automatically within a week and throws an IllegalStateException

Details

    • Fix
    • Status: Dismissed
    • Minor
    • Resolution: Won't Fix
    • 1.12.9
    • 1.12.10
    • None
    • None

    Description

      Current Situation
      The JobScheduler primary master restarts periodically, within a week of running, and writes the following error to ./logs/scheduler.log.

      Error message:

      .28 16:16:51.855 com.sos.scheduler.engine.cplusplus.runtime.CppProxy [ERROR] - java.lang.IllegalStateException: Not in C++ thread. This is 'Thread[C++ Heart_beat_watchdog_thread,5,main]', expected is 'Thread[main,5,main]'
      java.lang.IllegalStateException: Not in C++ thread. This is 'Thread[C++ Heart_beat_watchdog_thread,5,main]', expected is 'Thread[main,5,main]'
          at com.sos.scheduler.engine.cplusplus.runtime.CppProxy$.requireCppThread(CppProxy.scala:18)
          at com.sos.scheduler.engine.kernel.cppproxy.Timed_callCImpl.at_millis(Timed_callCImpl.java:18)
          at com.sos.scheduler.engine.kernel.async.CppCall.epochMillis$lzycompute(CppCall.scala:13)
          at com.sos.scheduler.engine.kernel.async.CppCall.epochMillis(CppCall.scala:13)
          at com.sos.scheduler.engine.common.async.StandardCallQueue.add(StandardCallQueue.scala:15)
          at com.sos.scheduler.engine.kernel.async.SchedulerThreadCallQueue.add(SchedulerThreadCallQueue.scala:16)
          at com.sos.scheduler.engine.kernel.Scheduler.enqueueCall(Scheduler.scala:231)
      28 16:16:51.892     1 116766.55401700 [xc.insert 1, "java.lang.IllegalStateException: Not in C++ thread. This is 'Thread[C++ Heart_beat_watchdog_thread,5,main]', expected is 'Thread[main,5,main]'"]
      28 16:16:51.892     0 116766.55401700 [xc.insert 2, "CallVoidMethodA"]
      28 16:16:52.056   164 116766.55401700 {scheduler} waitpid(136930)  JobScheduler restart
      28 16:16:52.056     0 116766.55401700 {scheduler} waitpid(136930)  OK
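
      To see how often this error occurs in ./logs/scheduler.log, the log can be scanned for the message quoted above. The following is a minimal Python sketch: the regular expression is derived only from the message text in this issue, and the function names are hypothetical helpers, not part of JobScheduler.

```python
import re

# Pattern built from the message quoted in this issue (assumption: other
# JobScheduler versions may word the message differently).
THREAD_MISMATCH = re.compile(
    r"IllegalStateException: Not in C\+\+ thread\. "
    r"This is '(?P<actual>[^']+)', expected is '(?P<expected>[^']+)'"
)


def find_thread_mismatches(lines):
    """Yield (line_number, actual_thread, expected_thread) for matching lines."""
    for number, line in enumerate(lines, start=1):
        match = THREAD_MISMATCH.search(line)
        if match:
            yield number, match.group("actual"), match.group("expected")


def scan_log(path):
    """Scan a log file (hypothetical helper) and return all mismatches found."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return list(find_thread_mismatches(handle))
```

      Calling scan_log("./logs/scheduler.log") returns one tuple per matching line, which makes it easy to see whether the calling thread is always the heartbeat watchdog thread.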
      

      Desired Behavior
      The JobScheduler should not restart automatically.

      Maintainer Note

      • A passive cluster was configured here, so it is by design that the JobScheduler restarts if it cannot read or write heartbeats.
      • The cause was a job that wrote to stdout continuously. When the task log of this job grew to about 2.8 GB, it blocked reading and writing heartbeats.
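
      Since the restart was traced to a task log that had grown to about 2.8 GB, a periodic size check on the log directory can flag such a job before heartbeats are blocked. The following is a minimal Python sketch. It assumes the task logs live under ./logs next to scheduler.log (the issue does not state the task log location) and uses an arbitrary 1 GiB warning threshold. find_large_logs is a hypothetical helper, not a JobScheduler API.

```python
from pathlib import Path


def find_large_logs(log_dir, threshold_bytes):
    """Return (path, size) pairs for files at or above threshold_bytes, largest first."""
    hits = []
    for path in Path(log_dir).rglob("*"):
        if path.is_file():
            size = path.stat().st_size
            if size >= threshold_bytes:
                hits.append((path, size))
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Example use (directory and threshold are assumptions, adjust to the installation):
#   for path, size in find_large_logs("./logs", 1024**3):
#       print(f"{size / 1024**3:.2f} GiB  {path}")
```

      Run from cron or from a JobScheduler job of its own, a check like this gives early warning, though the underlying fix remains limiting what the offending job writes to stdout.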

          People

            Joacim Zschimmer
            Aditi Dubey (Inactive)