Scenario: You need to perform a very heavy scripted database operation a very large number of times.
Doing this all at once would (1) slow down your instance, (2) exceed the maximum allowable transaction time, SQL count, and similar limits, and (3) cause other bad stuff.
Each operation takes between 10 and 60 seconds because of the additional business logic that needs to run.
You can't optimize the operations; they're simply slow, and there are a lot of them.
Example: You need to reclassify 100,000 CIs from one class to another (a very heavy operation on a large number of records).
How do you handle this?
I’ve run into this scenario a lot, and on every team I’ve been on (which is a fair number), the go-to answer is to run a scheduled script that performs the operation on a batch (some fixed number) of records at a set interval.
But imagine that you have the job run every 10 minutes, and it processes 20 records per run.
Now imagine that the instance or the scheduled job is particularly slow due to circumstances beyond your control, such as a burst of integration traffic hitting the instance at that moment, or a discovery run happening at the same time as the scheduled script. In that case, the job could take longer than the interval between runs, and the jobs would begin to "pile up" into a sort of traffic jam: inefficient, and a massive drain on instance performance.
So the way to do this safely is to figure out the longest amount of time you could imagine one "batch" taking: the worst-case time per record, multiplied by the number of records per batch. Then set the scheduled job interval to a little more than that. For example, with a 20-record batch where each record takes between 10 and 60 seconds, a batch could take up to 20 minutes, so you might run one batch every 24 or so minutes.
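To make that concrete, here's a minimal sketch of what the scheduled-script version usually looks like. The table, the u_needs_reclass marker field, and the per-record logic are placeholders I'm using for illustration, not anything from a stock instance:

```javascript
// Scheduled Script Execution, set to repeat every ~24 minutes.
// Hypothetical example: "u_needs_reclass" is an assumed flag field
// marking records that still need the heavy operation.
var BATCH_SIZE = 20;

var ciGR = new GlideRecord('cmdb_ci');
ciGR.addQuery('u_needs_reclass', true);
ciGR.setLimit(BATCH_SIZE);
ciGR.query();

while (ciGR.next()) {
    // ...heavy per-record business logic goes here (10-60s each)...
    ciGR.setValue('u_needs_reclass', false);
    ciGR.update();
}
```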
At that rate, though, updating 100k records works out to one record every 1.2 minutes: 120,000 minutes in total, or about 83 days of around-the-clock processing.
So, for an operation like this, how can we make it as efficient as possible without bogging down our instance?
The solution: Event-driven recursion.
Using a scheduled job with a gap between runs prevents the “pile-up” effect, which is good, and it gives anything else that might be stacking up in the queue (such as a discovery run) some time to get through. But since each job might take a different amount of time, we have to be overly conservative with our intervals.
With event-driven recursion, however, we can set things up so that each batch triggers the next! This prevents the pile-up and sends each new batch to the “back of the line” in the scheduling queue. The result is that batches run slower when the instance is busy, but much quicker when the queue is empty, preventing the pile-up effect and preventing bogging down the instance, both without sacrificing much in the way of throughput for our actual operation (which we’d like to finish before Jules Verne’s round-the-world journey does).
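Here's a rough sketch of how that might look. It assumes a custom event (called batch.process_next here, a name I'm making up for illustration) registered in the Event Registry, with a Script Action subscribed to it containing something like the following:

```javascript
// Script Action subscribed to the (hypothetical) "batch.process_next" event.
var BATCH_SIZE = 20;

var ciGR = new GlideRecord('cmdb_ci');
ciGR.addQuery('u_needs_reclass', true); // assumed marker field, as before
ciGR.setLimit(BATCH_SIZE);
ciGR.query();

while (ciGR.next()) {
    // ...heavy per-record business logic goes here...
    ciGR.setValue('u_needs_reclass', false);
    ciGR.update();
}

// If any unprocessed records remain, queue the same event again.
// The next batch lands at the back of the event queue, so it waits
// its turn behind whatever else the instance is doing.
var more = new GlideRecord('cmdb_ci');
more.addQuery('u_needs_reclass', true);
more.setLimit(1);
more.query();
if (more.hasNext()) {
    gs.eventQueue('batch.process_next', null, '', '');
}
```

You'd kick the whole chain off by queuing the event once, for example from a background script: gs.eventQueue('batch.process_next', null, '', ''). From there, each batch schedules its successor until nothing is left to process.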