Details
- Feature
- Status: Dismissed
- Minor
- Resolution: Won't Do
- 1.12.9
- None
- None
Description
Current Situation
- When a JobScheduler Passive Cluster is running with PostgreSQL and an Order is renamed,
- JobScheduler executes a DELETE statement on the table SCHEDULER_ORDERS:
19 18:39:09.883 scheduler 16 9184.2A60 7834.965MB ..{scheduler} sos::scheduler::database::Transaction::execute DELETE from SCHEDULER_ORDERS where "SPOOLER_ID"='12.9_pg_4467' and "ID"='1' and "JOB_CHAIN"='Ticket#2019061410000016/job_chain1' and "OCCUPYING_CLUSTER_MEMBER_ID" IS NULL and `distributed_next_time` is null (sos::scheduler::order::Order::db_update2)
.19 18:39:09.884 scheduler 0 9184.2A60 7834.965MB ..{scheduler} sos::scheduler::database::Transaction::execute COMMIT (sos::scheduler::order::Order::db_update2)
- The Order's information is not deleted from the table SCHEDULER_ORDERS.
How to Reproduce
- Set up a JobScheduler Active-Active Cluster, e.g. with SCHEDULER_CLUSTER_OPTIONS=-distributed-orders
- Start the JobScheduler cluster and run any job chain
- Shut down the JobScheduler Active-Active Cluster and change the cluster type to Passive Cluster:
- Primary Cluster: SCHEDULER_CLUSTER_OPTIONS=-exclusive
- Backup Cluster: SCHEDULER_CLUSTER_OPTIONS=-exclusive -backup
- Restart the JobScheduler cluster. The job chains now start automatically, and the Order's log file contains the message: SCHEDULER-853 Order in database could not be updated or deleted.
Desired Behavior
- When JobScheduler is configured as a Passive Cluster and an Order's information is updated, e.g. its name or runtime, JobScheduler should delete the Order's information from the table SCHEDULER_ORDERS.
Maintainers Note
- JobScheduler is working as expected, since a change of cluster type is a rare occurrence. The following workaround can be applied when the cluster type is changed.
- JobScheduler reads the Order's runtime information from the database table SCHEDULER_ORDERS. For a Passive Cluster the column DISTRIBUTED_NEXT_TIME is always null; for an Active Cluster the column holds a value indicating the next runtime of the Order.
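One possible reading of the note above is that a row written by the Active Cluster keeps its DISTRIBUTED_NEXT_TIME value, so a DELETE that requires the column to be null matches nothing. The following is a minimal sketch of that effect, not the JobScheduler implementation: it uses an in-memory SQLite database as a stand-in for PostgreSQL, a table reduced to the columns named in the logged statement, and hypothetical sample values.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL (assumption for illustration).
# The table holds only the columns that appear in the logged DELETE statement.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE scheduler_orders ('
    '"SPOOLER_ID" TEXT, "JOB_CHAIN" TEXT, "ID" TEXT, '
    '"OCCUPYING_CLUSTER_MEMBER_ID" TEXT, "DISTRIBUTED_NEXT_TIME" TEXT)'
)

# Hypothetical row as an Active Cluster would leave it:
# DISTRIBUTED_NEXT_TIME carries a value.
conn.execute(
    "INSERT INTO scheduler_orders VALUES (?, ?, ?, NULL, ?)",
    ("demo_scheduler", "demo/job_chain1", "1", "2030-01-01 00:00:00"),
)

# DELETE in the shape logged in Passive mode: it requires a null next time.
cur = conn.execute(
    'DELETE FROM scheduler_orders '
    'WHERE "SPOOLER_ID" = ? AND "ID" = ? AND "JOB_CHAIN" = ? '
    'AND "OCCUPYING_CLUSTER_MEMBER_ID" IS NULL '
    'AND "DISTRIBUTED_NEXT_TIME" IS NULL',
    ("demo_scheduler", "1", "demo/job_chain1"),
)

deleted = cur.rowcount  # rows removed by the DELETE
remaining = conn.execute("SELECT COUNT(*) FROM scheduler_orders").fetchone()[0]
print(deleted, remaining)
```

In this sketch the statement succeeds without error but removes no row, which would be consistent with the Order's record staying in the table.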
Workaround
- Change Active Cluster to Passive Cluster:
  - Comment out or remove the attribute distributed="yes" from the job chains.
  - Delete all records written by the Active Cluster from the database table SCHEDULER_ORDERS, e.g.
    DELETE FROM scheduler_orders WHERE "DISTRIBUTED_NEXT_TIME" IS NOT NULL;
  - Switch the cluster from Active to Passive.
- Change Passive Cluster to Active Cluster:
  - Delete all records written by the Passive Cluster from the database table SCHEDULER_ORDERS, e.g.
    DELETE FROM scheduler_orders WHERE "DISTRIBUTED_NEXT_TIME" IS NULL;
  - Switch the cluster from Passive to Active.
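The two cleanup statements of the workaround could be wrapped in a small helper that picks the predicate by target cluster type and reports how many rows were removed. This is only a sketch under assumptions: the function name purge_stale_orders and the reduced table are hypothetical, and an in-memory SQLite database stands in for PostgreSQL. The cluster should be shut down before running a cleanup of this kind.

```python
import sqlite3

def purge_stale_orders(conn, target_cluster_type):
    """Delete rows left by the previous cluster type and return the row count.

    Hypothetical helper: "passive" removes rows that carry a
    DISTRIBUTED_NEXT_TIME value, "active" removes rows where it is null.
    """
    if target_cluster_type == "passive":
        predicate = '"DISTRIBUTED_NEXT_TIME" IS NOT NULL'
    elif target_cluster_type == "active":
        predicate = '"DISTRIBUTED_NEXT_TIME" IS NULL'
    else:
        raise ValueError("target_cluster_type must be 'passive' or 'active'")
    # The connection context manager commits on success, rolls back on error.
    with conn:
        cur = conn.execute(f"DELETE FROM scheduler_orders WHERE {predicate}")
        return cur.rowcount

# Demo with a reduced table and hypothetical rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE scheduler_orders ("ID" TEXT, "DISTRIBUTED_NEXT_TIME" TEXT)'
)
conn.executemany(
    "INSERT INTO scheduler_orders VALUES (?, ?)",
    [("1", "2030-01-01 00:00:00"), ("2", None)],
)

removed = purge_stale_orders(conn, "passive")
left = [row[0] for row in conn.execute('SELECT "ID" FROM scheduler_orders')]
print(removed, left)
```

Returning the row count makes it easy to confirm, before restarting the cluster, that the cleanup actually touched the rows expected.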