Uploaded image for project: 'JS - JobScheduler'
  1. JS - JobScheduler
  2. JS-1457

Classic Agent may deadlock when starting multiple shell tasks at once

    XMLWordPrintable

Details

    • Fix
    • Status: Released (View Workflow)
    • Major
    • Resolution: Fixed
    • 1.9.2
    • 1.9.4, 1.10
    • JS Classic Agent
    • None
    • Master Windows x64 1.9.2 / Agent Linux x64 1.9.2

    Description

      Work around

      • limit the number of simultaneous job starts to 4-5.
      • start the jobs / orders with a time difference of 30 seconds e.g. Job1 to 5 are at 16:00:00 job6 - 10 at 16:00:30 etc.

      As of now situation

      • This error is sporadic and some time occurred within few minutes or sometime take bit longer.
      • If more then 10 jobs are started simultaneously on the remote agent using process class( all the jobs are using same process_class, after some time few of the jobs will start hanging
      • After some time its depend on number of simultaneous jobs using the process e.g. more the number of jobs problem will occur earlier, JobScheduler Master will also has an error [ERROR] Z-REMOTE-118 [0] [60] [zschimmer::com::object_server::Connection::Connect_operation::async_check_error_]
        2015-07-16 17:25:10.410+0200 [info]   SCHEDULER-918  state=starting (at=2015-07-16 17:25:09.873+0200)
        2015-07-16 17:26:10.001+0200 [ERROR]  Z-REMOTE-118   [0] [60] [zschimmer::com::object_server::Connection::Connect_operation::async_check_error_]
        2015-07-16 17:26:10.047+0200 [ERROR]  SCHEDULER-280  Process terminated with exit code 1 (0x1)
        
      • Also one can see the at the Agent server there are multiple instance of the Agents are started ( owned by the Parent process which is the the original process of the agent e.g. 37607), Please see the error message in scheduler.log agent
        
        [jenkins@gollum logs]$
        [jenkins@gollum logs]$
        [jenkins@gollum logs]$ ../bin/jobscheduler_agent.sh start
        Starting JobScheduler...
        
        [jenkins@gollum logs]$ ../bin/jobscheduler_agent.sh startJobSchedulerLog4JAppender (system.out):  INFO [JobScheduler-Engine-ak                                                                ka.actor.default-dispatcher-5] (Slf4jLogger.scala:80) - Slf4jLogger started
                               ps -ef | grep 4592
        jenkins  37607     1 99 15:09 ?        00:00:07 /home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592/                                                                bin/scheduler -service -id=jobscheduler_agent_1.9.2_4592 -sos.ini=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler                                                                _agent_1.9.2_4592/config/sos.ini -ini=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592/config/fac                                                                tory.ini -config=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592/config/scheduler.xml -param=/ho                                                                me/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592 -cd=/home/jenkins/sos-berlin.com/jobscheduler_agent                                                                /jobscheduler_agent_1.9.2_4592 -include-path=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592 -ro                                                                les=agent -pid-file=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592/logs/scheduler.pid
        jenkins  37645 18399  0 15:09 pts/0    00:00:00 grep 4592
        [jenkins@gollum logs]$ more scheduler.pid
        37607
        [jenkins@gollum logs]$ ps -ef | grep 4592
        jenkins  37607     1 12 15:09 ?        00:00:11 /home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592/                                                                bin/scheduler -service -id=jobscheduler_agent_1.9.2_4592 -sos.ini=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler                                                                _agent_1.9.2_4592/config/sos.ini -ini=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592/config/factory.ini -config=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592/config/scheduler.xml -param=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592 -cd=/home/jenkins/sos-berlin.com/jobscheduler_agent                                                                /jobscheduler_agent_1.9.2_4592 -include-path=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592 -ro                                                                les=agent -pid-file=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592/logs/scheduler.pid
        jenkins  37886 18399  0 15:11 pts/0    00:00:00 grep 4592
        [jenkins@gollum logs]$ ps -ef | grep 4592
        jenkins  37607     1 19 15:09 ?        00:04:03 /home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592/bin/scheduler -service -id=jobscheduler_agent_1.9.2_4592 -sos.ini=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592/config/sos.ini -ini=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592/config/factory.ini -config=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592/config/scheduler.xml -param=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592 -cd=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592 -include-path=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592 -roles=agent -pid-file=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592/logs/scheduler.pid
        jenkins  40180 37607  0 15:22 ?        00:00:00 /home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592/bin/scheduler -service -id=jobscheduler_agent_1.9.2_4592 -sos.ini=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592/config/sos.ini -ini=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592/config/factory.ini -config=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592/config/scheduler.xml -param=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592 -cd=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592 -include-path=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592 -roles=agent -pid-file=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592/logs/scheduler.pid
        jenkins  40723 37607  0 15:24 ?        00:00:00 /home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592/bin/scheduler -service -id=jobscheduler_agent_1.9.2_4592 -sos.ini=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592/config/sos.ini -ini=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592/config/factory.ini -config=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592/config/scheduler.xml -param=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592 -cd=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592 -include-path=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592 -roles=agent -pid-file=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592/logs/scheduler.pid
        jenkins  41534 37607  0 15:29 ?        00:00:00 /home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592/bin/scheduler -service -id=jobscheduler_agent_1.9.2_4592 -sos.ini=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592/config/sos.ini -ini=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592/config/factory.ini -config=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592/config/scheduler.xml -param=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592 -cd=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592 -include-path=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592 -roles=agent -pid-file=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592/logs/scheduler.pid
        jenkins  41831 37607  0 15:30 ?        00:00:00 /home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592/bin/scheduler -service -id=jobscheduler_agent_1.9.2_4592 -sos.ini=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592/config/sos.ini -ini=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592/config/factory.ini -config=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592/config/scheduler.xml -param=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592 -cd=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592 -include-path=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592 -roles=agent -pid-file=/home/jenkins/sos-berlin.com/jobscheduler_agent/jobscheduler_agent_1.9.2_4592/logs/scheduler.pid
        jenkins  41855 18399  0 15:30 pts/0    00:00:00 grep 4592
        [jenkins@gollum logs]$
        
      • One after another Jobs will hand and eventually agfter some time Agent will crash with the error
        • Screenshot
        • Error message
          .16 12:09:40.886     0 21196.B4DFA700 open("/tmp/jenkins/sos.TDoNNc")  => 201
          .16 12:09:40.886     0 21196.B4DFA700 close(201) /tmp/jenkins/sos.TDoNNc
          .16 12:09:40.886     0 21196.B4DFA700 pthread_cond_signal sos::scheduler::File_logger::File_logger_thread::File_logger_thread
          "sos::scheduler::File_logger::File_logger_thread::terminate"
          .16 12:09:40.886     0 21196.B57FB700 open("/tmp/jenkins/sos.PXOEaA")  [ERROR ERRNO-1  Operation not permitted [pthread_mutex_
          unlock] []]
          .16 12:09:40.887     1 21196.B4DFA700 [ERROR ERRNO-1  Operation not permitted [pthread_mutex_unlock] []]
          .16 12:09:40.887     0 21196.B57FB700 => 201
          .16 12:09:40.887     0 21196.B57FB700 close(201) /tmp/jenkins/sos.PXOEaA
          .16 12:09:40.887     0 21196.B57FB700 open("/tmp/jenkins/sos.TDoNNc")  => 201
          .16 12:09:40.887     0 21196.B57FB700 close(201) /tmp/jenkins/sos.TDoNNc
          .16 12:09:40.887     0 21196.B57FB700 open("/tmp/jenkins/sos.PXOEaA")  => 201
          .16 12:09:40.887     0 21196.B57FB700 close(201) /tmp/jenkins/sos.PXOEaA
          .16 12:09:40.887     0 21196.B57FB700 open("/tmp/jenkins/sos.TDoNNc")  => 201
          .16 12:09:40.887     0 21196.B57FB700 close(201) /tmp/jenkins/sos.TDoNNc
          .16 12:09:40.887     0 21196.B57FB700 Thread File_logger_thread terminated
          .16 12:09:40.887     0 21196.B75FE700 open("/tmp/jenkins/sos.Krq7QL")  => 201
          .16 12:09:40.887     0 21196.B75FE700 close(201) /tmp/jenkins/sos.Krq7QL
          [jenkins@gollum logs]$
          

      How to reproduce

      • Download 18.zip into live folder, change the remote agent settings in the process class

      Logs

      Attachments

        1. jobscheduler-agent-test.png
          176 kB
          Mahendra Patidar
        2. js-1457-job-hangs.png
          47 kB
          Mahendra Patidar

        Issue Links

          Activity

            People

              jz Joacim Zschimmer
              mp Mahendra Patidar
              Mahendra Patidar Mahendra Patidar
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: