Details
-
Feature
-
Status: Released
-
Major
-
Resolution: Fixed
-
None
Description
Current Situation
- JobScheduler does not know whether an Agent is available until a job is launched for execution on that Agent.
Desired Behavior
- Users would like to know whether any Agent in the network is unavailable, so that they can take measures before a job is executed.
Implementation
- The Perl script check_jobscheduler_agent.pl can be used for integration with Nagios, op5 and compatible System Monitors.
- The script is used with the following command arguments:
- host and port of the JobScheduler Universal Agent with the URL
http://host:port/jobscheduler/agent/api/overview
- list of attributes that are used to perform the check
- list of attributes that are added to the output of the script
- max. timeout to connect to the Agent
Operation
- Check
- The script makes use of the attributes totalTaskCount and currentTaskCount from an Agent response to check if the Agent is available.
- Output
- If the Agent is not available:
- Message:
Check JobScheduler Universal Agent - Agent is CRITICAL - Connection failed: 500 Can't connect to 192.11.0.38:4445 (connect: timeout)
- The timeout is configurable, see below "Service Parameters"
- If the Agent is available:
- Message:
Check JobScheduler Universal Agent - Agent is OK - startedAt: 2015-07-17T12:05:52.245Z, totalTaskCount: 170422, currentTaskCount: 52, isTerminating: 0
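The check and output rules above can be sketched as follows. This is a minimal Python illustration, not the source of check_jobscheduler_agent.pl: it assumes that the Agent response has already been parsed into a dictionary, that the Agent counts as available when every check attribute is present, and that booleans are rendered as 0/1 as in the sample message. The function names and the "missing attributes" wording are hypothetical. The exit codes 0 (OK) and 2 (CRITICAL) follow the usual Nagios plugin convention.

```python
# Nagios plugin exit codes (standard plugin convention).
OK, CRITICAL = 0, 2
PREFIX = "Check JobScheduler Universal Agent - Agent is"


def parse_attribute_list(spec):
    """Turn '{totalTaskCount},{currentTaskCount}' into a list of names."""
    return [part.strip().strip("{}") for part in spec.split(",") if part.strip()]


def render(value):
    # Assumption: booleans are shown as 0/1, as in the sample output.
    if isinstance(value, bool):
        return str(int(value))
    return str(value)


def evaluate(overview, check_spec, output_spec):
    """Return (exit_code, message) for a parsed Agent overview response."""
    # Assumption: the Agent is available if all check attributes are present.
    missing = [n for n in parse_attribute_list(check_spec) if n not in overview]
    if missing:
        # Hypothetical wording; the real script's message is not documented here.
        return CRITICAL, f"{PREFIX} CRITICAL - missing attributes: {', '.join(missing)}"
    details = ", ".join(
        f"{name}: {render(overview[name])}"
        for name in parse_attribute_list(output_spec)
        if name in overview
    )
    return OK, f"{PREFIX} OK - {details}"


# Sample response values taken from the OK message shown in this issue.
sample = {
    "startedAt": "2015-07-17T12:05:52.245Z",
    "totalTaskCount": 170422,
    "currentTaskCount": 52,
    "isTerminating": False,
}
code, message = evaluate(
    sample,
    "{totalTaskCount},{currentTaskCount}",
    "{startedAt},{totalTaskCount},{currentTaskCount},{isTerminating}",
)
print(code, message)
```

A System Monitor only needs the exit code and the first line of output, so keeping the evaluation in a pure function makes it easy to test without a running Agent.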
- Service Command
A Service Command has to be declared before configuring the Nagios/op5 Service that makes use of this Command. The following declaration for the Command is recommended:
define command{
    command_name check_jobscheduler_agent
    command_line /opt/plugins/check_jobscheduler_agent.pl -u $ARG1$ -a $ARG2$ -o $ARG3$ -t $ARG4$
}
- Service Parameters
When configuring the Nagios/op5 Service, the parameters have to be specified as a single string separated by "!", e.g.:
http://galadriel.sos:4455/jobscheduler/agent/api/overview!'{totalTaskCount},{currentTaskCount}'!'{startedAt},{totalTaskCount},{currentTaskCount},{isTerminating}'!20
where
- first argument is the URL for the HTTP connection
- second argument is the list of attributes that are used to check the Agent availability
- third argument is the list of attributes that are used for output of the script and that will be displayed in the System Monitor.
- fourth argument is the timeout for the connection to the Agent.
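To illustrate how the four arguments line up with the -u, -a, -o and -t options of the Service Command, the following Python sketch splits the example parameter string on "!" the way Nagios assigns $ARG1$ to $ARG4$. The function name is hypothetical, and the quote handling via shlex only approximates what the shell does when the command line is executed.

```python
import shlex


def split_service_parameters(parameters):
    """Split a '!'-separated parameter string into the four plugin options."""
    url, check_spec, output_spec, timeout = parameters.split("!")

    def unquote(text):
        # shlex.split removes the single quotes as a POSIX shell would.
        return shlex.split(text)[0]

    return {
        "-u": url,                   # $ARG1$: URL for the HTTP connection
        "-a": unquote(check_spec),   # $ARG2$: attributes used for the check
        "-o": unquote(output_spec),  # $ARG3$: attributes added to the output
        "-t": int(timeout),          # $ARG4$: connection timeout
    }


example = (
    "http://galadriel.sos:4455/jobscheduler/agent/api/overview"
    "!'{totalTaskCount},{currentTaskCount}'"
    "!'{startedAt},{totalTaskCount},{currentTaskCount},{isTerminating}'"
    "!20"
)
for option, value in split_service_parameters(example).items():
    print(option, value)
```

The single quotes around the attribute lists keep the braces and commas from being interpreted by the shell.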
Maintainer Notes
- This task is preferably performed by a System Monitor such as Nagios, HP OpenView, SCOM etc., as such monitors provide a better overview of network-related problems and escalation rules that JobScheduler cannot be aware of.
- The JobScheduler Universal Agent accepts and responds to an HTTP web service request that can be used to check the Agent status. This check can be performed either by a System Monitor or by a JobScheduler job on a cyclic basis.
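The HTTP request mentioned above could be issued along the following lines. This is a hedged Python sketch rather than the shipped Perl script: it assumes the overview web service returns a JSON object, the Accept header is an assumption, and the function name fetch_overview is hypothetical.

```python
import json
import urllib.error
import urllib.request


def fetch_overview(url, timeout):
    """Return (overview_dict, None) on success or (None, error_text) on failure."""
    # Assumption: the Agent answers with JSON when asked for it.
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read().decode("utf-8")
    except OSError as error:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return None, f"Connection failed: {error}"
    try:
        return json.loads(body), None
    except ValueError as error:
        # json.JSONDecodeError is a ValueError subclass.
        return None, f"Invalid response: {error}"
```

A call such as fetch_overview("http://host:port/jobscheduler/agent/api/overview", 20) returns either the parsed attributes or an error text that a System Monitor or a cyclic JobScheduler job can report as CRITICAL.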
Issue Links
- relates to
  - JS-1589 Agent applies token based authentication for REST web service interface (Dismissed)
  - JS-1291 JobScheduler Universal Agent (Released)
  - JS-1426 Status command for Universal Agent start script shows status information from overview web service call (Released)
- requires
  - JS-1410 JobScheduler Universal Agent web services return information about current state (Released)
  - JS-1480 JobScheduler Universal Agent web services (Released)