Details
- Fix
- Status: Dismissed
- Minor
- Resolution: Won't Fix
- 1.7.5, 1.8.2, 1.9.2
- None
- None
- Linux x64 / JobScheduler 1.9.2
Description
Current behavior
- When a task produces huge stdout output (50 MB+) and JobScheduler is configured with mail_on_error=yes:
- If the task ends with an error, JobScheduler tries to send an alert e-mail with the task log (including the task's stdout) attached.
- If JobScheduler runs as an Active/Passive cluster, the Primary JobScheduler crashes.
- If JobScheduler is installed as a single instance, JOC and the JobScheduler engine hang and become unresponsive with the following messages:
.23 12:19:07.428 3 49730.E14F3700 open("/tmp/jenkins/sos.Js3HPu") => 231
.23 12:19:07.428 0 49730.E14F3700 close(231) /tmp/jenkins/sos.Js3HPu
.23 12:19:07.524 96 49730.E14F3700 unlink("/tmp/jenkins/sos.ohSKjH")
.23 12:19:07.524 0 49730.E14F3700 unlink("/tmp/jenkins/sos.mk1Oyi")
.23 12:19:07.524 0 49730.E14F3700 unlink("/tmp/jenkins/sos.Js3HPu")
.23 12:19:07.524 0 49730.E14F3700 unlink("/tmp/jenkins/sos.22RH45")
.23 12:19:07.525 1 49730.E14F3700 {scheduler} sos::scheduler::database::Transaction::execute UPDATE SCHEDULER_JOBS set "STOPPED"=1 where "SPOOLER_ID"='jobscheduler.1.9.2_4492' and "CLUSTER_MEMBER_ID"='-' and "PATH"='Ticket#2015061110000011/job1' (sos::scheduler::Standard_job::database_record_store)
.23 12:19:07.525 0 49730.E14F3700 {scheduler} sos::scheduler::database::Transaction::execute COMMIT (sos::scheduler::Standard_job::database_record_store)
.23 12:19:07.637 112 49730.E14F3700 JavaMail Send smtp=mail.sos-berlin.com to="JobSchedulerr@sos-berlin.com" subject="[error] Task job1 terminated with errors"
How to reproduce
- Deploy the attached job Ticket#2015061110000011.zip to the JobScheduler's live folder
- Start the job
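The contents of the attached job are not shown in this issue. As a rough stand-in, the failure condition (a task that writes a very large stdout and then ends with an error) could be sketched as follows. The function name `emit_large_output`, the line size, and the 60 MiB figure mentioned afterwards are illustrative assumptions, not taken from the attachment.

```python
import sys

# Assumption: 1 KiB per line; the real job's output format is unknown.
LINE = "x" * 1023 + "\n"


def emit_large_output(total_bytes, stream=sys.stdout):
    """Write whole lines to `stream` until at least `total_bytes`
    characters have been written; return the number written."""
    written = 0
    while written < total_bytes:
        stream.write(LINE)
        written += len(LINE)
    stream.flush()
    return written
```

A job script built on this sketch would call `emit_large_output(60 * 1024 * 1024)` and then `sys.exit(1)`, so that the task log exceeds 50 MB and the task ends with an error.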
Desired behavior
- JobScheduler should neither crash nor hang when a task ends with an error and has a huge task log.
Maintainer Notes
Although this incident may be caused by problems storing logs in the database rather than by sending mail, it still reveals a noteworthy behavior: if a log file cannot be sent by e-mail, it is stored in a local folder.
This makes little sense when dealing with unrecoverable errors.
Currently, JobScheduler cannot reliably decide whether a mail error is recoverable (e.g. mailbox full) or unrecoverable.
This is intended as input for the future e-mail management in JS-1375, which should implement this distinction.
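One possible way to draw the recoverable/unrecoverable distinction is by SMTP reply code class: 4yz replies are transient negative completions and 5yz replies are permanent ones. The following is only a sketch under that assumption and does not describe JobScheduler's code or the design of JS-1375. The names `MailFailure`, `classify_smtp_reply`, and `RECOVERABLE_OVERRIDES` are hypothetical, and treating 552 (exceeded storage allocation) as recoverable is an assumption made to match the "mailbox full" example.

```python
from enum import Enum


class MailFailure(Enum):
    RECOVERABLE = "recoverable"      # keep the mail locally and retry later
    UNRECOVERABLE = "unrecoverable"  # retrying cannot succeed


# Assumption: a full mailbox may be reported as 552, which is formally a
# permanent failure but may clear up once the recipient frees space.
RECOVERABLE_OVERRIDES = frozenset({552})


def classify_smtp_reply(code: int) -> MailFailure:
    """Classify a negative SMTP reply code (400-599)."""
    if not 400 <= code <= 599:
        raise ValueError(f"not a negative SMTP reply code: {code}")
    if code < 500 or code in RECOVERABLE_OVERRIDES:
        return MailFailure.RECOVERABLE
    return MailFailure.UNRECOVERABLE
```

With such a classification, a log would be kept in the local folder for a later retry only when the result is `MailFailure.RECOVERABLE`.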
Attachments
Issue Links
- relates to: JS-1375 New e-mail management (Dismissed)