Details
Type: Feature
Status: Released
Priority: Minor
Resolution: Fixed
None
Description
Current Situation
- Assume a JobScheduler cluster configuration.
- If the database server is very slow, e.g. it takes more than 1 minute to respond, then a warning is logged that the Scheduler thread lock has been blocked for more than 15s:
    com.sos.scheduler.engine.cplusplus.runtime.ThreadLock [WARN ] - Waiting for Scheduler ThreadLock, currently acquired by Thread[main,5,main], current stack trace: com.sos.scheduler.engine.cplusplus.runtime.ThreadLock$LoggingLock$1
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:171)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at oracle.net.ns.Packet.receive(Packet.java:311)
        at oracle.net.ns.DataPacket.receive(DataPacket.java:105)
        at oracle.net.ano.CryptoDataPacket.receive(Unknown Source)
        at oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:305)
        at oracle.net.ns.NetInputStream.read(NetInputStream.java:249)
        at oracle.net.ns.NetInputStream.read(NetInputStream.java:171)
        at oracle.net.ns.NetInputStream.read(NetInputStream.java:89)
        at oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:123)
        at oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:79)
        at oracle.jdbc.driver.T4CMAREngineStream.unmarshalUB1(T4CMAREngineStream.java:429)
        at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:397)
        at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:257)
        at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:587)
        at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:210)
        at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:30)
        at oracle.jdbc.driver.T4CStatement.executeForRows(T4CStatement.java:931)
        at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1150)
        at oracle.jdbc.driver.OracleStatement.executeUpdateInternal(OracleStatement.java:1707)
        at oracle.jdbc.driver.OracleStatement.executeUpdate(OracleStatement.java:1670)
        at oracle.jdbc.driver.OracleStatementWrapper.executeUpdate(OracleStatementWrapper.java:310)
  In this situation the heartbeat watchdog thread, which should abort the JobScheduler if the last heartbeat is too old, does not work.
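A minimal sketch of the failure mode, assuming (this issue does not state it explicitly) that the watchdog has to acquire the same lock that the main thread holds during the slow database call. A java.util.concurrent ReentrantLock stands in for the Scheduler ThreadLock and Thread.sleep stands in for the executeUpdate() call stuck in socketRead0; the class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class BlockedWatchdogDemo {
    // Hypothetical stand-in for the Scheduler ThreadLock (assumption: the watchdog needs this lock).
    static final ReentrantLock schedulerLock = new ReentrantLock();

    /**
     * A simulated "main" thread holds the lock during a slow call while the caller,
     * acting as the watchdog, tries to get the lock within waitMillis.
     * Returns true if the watchdog could not get the lock in time.
     */
    static boolean watchdogIsBlocked(long slowCallMillis, long waitMillis) {
        CountDownLatch lockHeld = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            schedulerLock.lock();
            try {
                lockHeld.countDown();
                Thread.sleep(slowCallMillis); // simulates executeUpdate() waiting on the database
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                schedulerLock.unlock();
            }
        }, "main-sim");
        holder.start();
        try {
            lockHeld.await(); // make sure the lock is taken before the watchdog tries
            boolean acquired = schedulerLock.tryLock(waitMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                schedulerLock.unlock();
            }
            holder.join();
            return !acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
    }

    public static void main(String[] args) {
        // The lock is held for 1 s; the watchdog gives up after 200 ms.
        System.out.println("watchdog blocked: " + watchdogIsBlocked(1000, 200)); // prints "watchdog blocked: true"
    }
}
```

Under this assumption the watchdog stays blocked for as long as the database call does, so it cannot detect the stale heartbeat it is meant to guard against.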
Desired Behavior
- The heartbeat watchdog thread should abort the JobScheduler when necessary, even while the thread lock is blocked.
- To activate this behavior, the automatic restart, which is enabled by default, has to be disabled with the following setting in ./config/scheduler.xml (see JS-1035):
    <params>
        <param name="scheduler.cluster.restart_after_emergency_abort" value="false"/>
    </params>
- In a cluster configuration, if looping through pending objects takes more than 2 minutes, JobScheduler will perform the following actions:
  - Kill all child processes
  - Remove its PID file
  - Abort operation and terminate the JobScheduler instance
- If a backup instance is configured, it will then become active.
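A minimal sketch of a watchdog that cannot be blocked by the Scheduler ThreadLock, assuming the scheduler records a heartbeat timestamp after each loop through pending objects. The watchdog only reads an AtomicLong, so it never waits for the lock. The class HeartbeatWatchdog, its methods, and the exit code 99 are hypothetical and not JobScheduler's actual implementation; the restart behavior controlled by scheduler.cluster.restart_after_emergency_abort is not modeled.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical lock-free heartbeat watchdog (illustration only). */
class HeartbeatWatchdog {
    private final AtomicLong lastHeartbeatMillis;
    private final long maxAgeMillis;
    private final Runnable abortAction;
    private final AtomicBoolean aborted = new AtomicBoolean(false);

    HeartbeatWatchdog(long nowMillis, long maxAgeMillis, Runnable abortAction) {
        this.lastHeartbeatMillis = new AtomicLong(nowMillis);
        this.maxAgeMillis = maxAgeMillis;
        this.abortAction = abortAction;
    }

    /** Assumed to be called by the scheduler after each loop through pending objects. */
    void heartbeat(long nowMillis) {
        lastHeartbeatMillis.set(nowMillis);
    }

    /** Returns true if the last heartbeat is too old; runs abortAction at most once. */
    boolean check(long nowMillis) {
        boolean stale = nowMillis - lastHeartbeatMillis.get() > maxAgeMillis;
        if (stale && aborted.compareAndSet(false, true)) {
            abortAction.run();
        }
        return stale;
    }

    /** Runs check() periodically on a daemon thread; no scheduler lock is taken. */
    ScheduledExecutorService start(long periodMillis) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "heartbeat-watchdog");
            t.setDaemon(true);
            return t;
        });
        exec.scheduleAtFixedRate(() -> check(System.currentTimeMillis()),
                periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return exec;
    }

    /** Emergency abort following the listed steps: kill children, remove PID file, terminate. */
    static void emergencyAbort(Path pidFile) {
        // 1. Kill all child processes, including grandchildren (Java 9+ ProcessHandle API).
        ProcessHandle.current().descendants().forEach(ProcessHandle::destroyForcibly);
        // 2. Remove the PID file.
        try {
            Files.deleteIfExists(pidFile);
        } catch (IOException e) {
            System.err.println("Could not delete PID file: " + e);
        }
        // 3. Terminate at once; halt() skips shutdown hooks that might wait for the blocked lock.
        Runtime.getRuntime().halt(99); // hypothetical exit code
    }

    public static void main(String[] args) {
        long maxAge = TimeUnit.MINUTES.toMillis(2); // the 2-minute limit from this issue
        int[] aborts = {0};
        HeartbeatWatchdog w = new HeartbeatWatchdog(0, maxAge, () -> aborts[0]++);
        System.out.println(w.check(TimeUnit.MINUTES.toMillis(1))); // false: 1 min since start
        w.heartbeat(TimeUnit.MINUTES.toMillis(1));
        System.out.println(w.check(TimeUnit.MINUTES.toMillis(4))); // true: 3 min since last heartbeat
        System.out.println(aborts[0]); // 1
    }
}
```

In a real process the abort action would be something like `() -> HeartbeatWatchdog.emergencyAbort(pidFile)`; the demo passes a counter instead so it can run without terminating the JVM. Injecting the clock value into check() keeps the timeout logic testable without waiting two minutes.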