  1. JS - JobScheduler
  2. JS-1418

Big Task log should not hang or crash JobScheduler in case of mail_on_error=yes

Details

    • Fix
    • Status: Dismissed
    • Minor
    • Resolution: Won't Fix
    • 1.7.5, 1.8.2, 1.9.2
    • 1.10
    • None
    • None
    • Linux x64 / JobScheduler 1.9.2

    Description

      Current behavior

      • A task produces a huge stdout output (50 MB+) and JobScheduler is configured with
        mail_on_error=yes.
      • If the task ends with an error, JobScheduler tries to send an alert e-mail with the task log attached (including the task's stdout).
      • If JobScheduler runs as an Active/Passive cluster, the primary JobScheduler crashes.
      • If JobScheduler is installed as a single instance, JOC and the JobScheduler engine hang and become unresponsive, with the following log output:
        .23 12:19:07.428     3 49730.E14F3700 open("/tmp/jenkins/sos.Js3HPu")  => 231
        .23 12:19:07.428     0 49730.E14F3700 close(231) /tmp/jenkins/sos.Js3HPu
        .23 12:19:07.524    96 49730.E14F3700 unlink("/tmp/jenkins/sos.ohSKjH")
        .23 12:19:07.524     0 49730.E14F3700 unlink("/tmp/jenkins/sos.mk1Oyi")
        .23 12:19:07.524     0 49730.E14F3700 unlink("/tmp/jenkins/sos.Js3HPu")
        .23 12:19:07.524     0 49730.E14F3700 unlink("/tmp/jenkins/sos.22RH45")
        .23 12:19:07.525     1 49730.E14F3700 {scheduler} sos::scheduler::database::Transaction::execute  UPDATE SCHEDULER_JOBS  set "STOPPED"=1  where "SPOOLER_ID"='jobscheduler.1.9.2_4492' and "CLUSTER_MEMBER_ID"='-' and "PATH"='Ticket#2015061110000011/job1'  (sos::scheduler::Standard_job::database_record_store)
        .23 12:19:07.525     0 49730.E14F3700 {scheduler} sos::scheduler::database::Transaction::execute  COMMIT  (sos::scheduler::Standard_job::database_record_store)
        .23 12:19:07.637   112 49730.E14F3700 JavaMail Send smtp=mail.sos-berlin.com to="JobSchedulerr@sos-berlin.com" subject="[error] Task job1 terminated with errors"
        

      How to reproduce
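
The ticket gives no reproduction steps. Going only by the conditions listed under the behavior description (50 MB+ of stdout, a task that ends with an error, mail_on_error=yes), a task script along the following lines might trigger the situation. This is a minimal sketch, not the reporter's procedure: the function names, the 60 MiB size and the exit code are illustrative assumptions, and it assumes the job is configured so that a non-zero exit code counts as a task error.

```python
import sys

# Assumption: one filler line of exactly 1 KiB (1023 characters plus newline).
CHUNK = "x" * 1023 + "\n"


def emit_large_output(total_bytes, stream=sys.stdout):
    """Write filler lines to `stream` until at least `total_bytes` characters
    have been written, then return the number of characters written."""
    written = 0
    while written < total_bytes:
        stream.write(CHUNK)
        written += len(CHUNK)
    stream.flush()
    return written


def run_task(total_bytes=60 * 1024 * 1024):
    """Produce a large stdout (hypothetical default: 60 MiB) and return a
    non-zero exit code so that the task is reported as failed."""
    emit_large_output(total_bytes)
    return 1


# In the job script itself one would end with:
#     sys.exit(run_task())
```

With mail_on_error=yes set, the failed task should then cause JobScheduler to attempt the alert e-mail with the oversized task log attached.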

      Desired behavior

      • JobScheduler should not crash or hang when a task ends with an error and has a huge task log.

      Maintainer Notes

      Although this incident might be caused by problems storing logs in the database rather than by sending mail, it still reveals an interesting behavior: if a log file cannot be sent by e-mail, it will be stored in a local folder.
      This makes little sense for unrecoverable errors.
      Currently JobScheduler cannot reliably decide whether a mail error is recoverable (e.g. mailbox full) or unrecoverable.
      This is intended as input for the future e-mail management in JS-1375, which should implement this distinction.
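
To illustrate the distinction the notes ask for, the sketch below classifies a mail failure by its SMTP reply code. It is not JobScheduler code and says nothing about how JS-1375 will be implemented. It relies on the general SMTP convention that 4yz replies are transient and 5yz replies are permanent. Treating 552 (exceeded storage allocation) as recoverable is an assumption made here to match the "mailbox full" example above. All names are hypothetical.

```python
from enum import Enum


class MailFailure(Enum):
    RECOVERABLE = "recoverable"      # retrying later may succeed
    UNRECOVERABLE = "unrecoverable"  # retrying the same mail is pointless


def classify_smtp_reply(code, recoverable_overrides=frozenset({552})):
    """Classify a negative SMTP reply code.

    4yz codes are transient failures, 5yz codes are permanent failures.
    `recoverable_overrides` lists 5yz codes that should still be treated as
    recoverable (assumption: 552, to cover a full mailbox).
    """
    if not 400 <= code <= 599:
        raise ValueError(f"not a negative SMTP reply code: {code}")
    if code < 500 or code in recoverable_overrides:
        return MailFailure.RECOVERABLE
    return MailFailure.UNRECOVERABLE
```

Under such a scheme, storing the unsent log in a local folder for a later retry would only apply to the recoverable case.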

            People

              Mahendra Patidar