
There are situations where guaranteed correctness is essential, which is why we have transactional databases meeting the ACID properties. And then there are situations where we just need an approximate number. An example is the size of the SQL Agent job history log.

I bring up this topic because on more than one occasion, after fixing performance problems at a client site, the remaining top waits are two SQL statements belonging to stored procedures in msdb: one in sp_sqlagent_log_jobhistory,

[screenshot: execution statistics for the insert statement in sp_sqlagent_log_jobhistory]

and one in sp_jobhistory_row_limiter.

[screenshot: execution statistics for the high-wait statement in sp_jobhistory_row_limiter]

The average wait times for the day of Feb 28 were 207ms and 201ms respectively. The average elapsed times over several weeks are 171ms and 170ms respectively. Neither of these statements shows meaningful CPU consumption, at 0.716ms and 0.124ms per call respectively.

A third statement, also in sp_jobhistory_row_limiter, has much higher CPU consumption: 33.5ms CPU and 39.4ms elapsed time per call over the longer period.

[screenshot: execution statistics for the high-CPU statement in sp_jobhistory_row_limiter]

In terms of overall load on the system, these msdb procedures are negligible. A modern 2-socket system has 12-16 physical cores, so there are 288-384 CPU-hours available per day. The CPU-intensive query consumes about 40 min per day (33.5ms/call x 71,509 calls per day), a very small portion of the available CPU.
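The arithmetic behind these figures is easy to check, using the per-call averages and call counts quoted above:

```python
calls_per_day = 71_509        # sp_jobhistory_row_limiter executions per day
cpu_per_call_s = 0.0335       # 33.5 ms CPU for the high-CPU statement

# daily CPU consumed by the high-CPU statement
cpu_minutes_per_day = calls_per_day * cpu_per_call_s / 60
print(f"CPU: {cpu_minutes_per_day:.0f} min/day")

# CPU-hours available per day on a 12-16 core system
for cores in (12, 16):
    print(f"{cores} cores -> {cores * 24} CPU-hours/day")
```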

The wait time of the two longer-running queries may be significant, at about 4 hours per day each. The calls come from SQL Agent, not a live person waiting for a response, so the wait time is not a problem by itself. The high wait statement in sp_sqlagent_log_jobhistory is:

INSERT INTO msdb.dbo.sysjobhistory ( job_id, step_id, step_name, sql_message_id, sql_severity, message, run_status, run_date, run_time, run_duration, operator_id_emailed, operator_id_netsent, operator_id_paged, retries_attempted, server)
VALUES ( @job_id, @step_id, @step_name, @sql_message_id, @sql_severity, @message, @run_status, @run_date, @run_time, @run_duration, @operator_id_emailed, @operator_id_netsent, @operator_id_paged, @retries_attempted, @server)
GO

The high wait statement in sp_jobhistory_row_limiter is:

SELECT @current_rows_per_job = COUNT(*)
FROM msdb.dbo.sysjobhistory with (TABLOCKX)
WHERE (job_id = @job_id)

The maximum elapsed times of sp_sqlagent_log_jobhistory and sp_jobhistory_row_limiter over the longer period are 11 and 41 seconds respectively. Potentially this might cause problems in situations that require precise timing. Of course, it is probably not a good idea to rely on SQL Agent if precise timing is required on a heavily loaded database server processing a high volume of jobs.

One might think that at just under 1 call per second (71,509 calls per day against 86,400 seconds per day), the chance of a collision between two statements of 0.2 sec each is small. However, these are not random calls; they fire on schedule. So if 100 jobs fire off every 2 minutes on the minute, the chance of collision could be much higher.
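A rough way to see why the scheduling matters: if the calls arrived at random, the fraction of time the table is tied up (and hence the chance a new insert finds it busy) is approximately the arrival rate times the hold time. With jobs firing together on a schedule, the jobs within a burst serialize behind one another, so collisions are near-certain. A quick sketch of both cases, using the 0.2 sec average duration from above:

```python
calls_per_day = 71_509
seconds_per_day = 86_400
hold_time_s = 0.2            # approx. time each call holds the history table

# random-arrival model: utilization ~= arrival rate x hold time
rate = calls_per_day / seconds_per_day           # ~0.83 calls/sec
utilization = rate * hold_time_s                 # fraction of time table is busy
print(f"random arrivals: ~{utilization:.0%} chance an insert finds the table busy")

# scheduled burst: 100 jobs completing together need 100 x 0.2 s of serialized
# table time, so all but the first job in the burst will wait
burst_serial_time_s = 100 * hold_time_s
print(f"burst of 100 jobs: {burst_serial_time_s:.0f} s of serialized work")
```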

The sp_sqlagent_log_jobhistory procedure inserts into jobhistory, and then at the end calls sp_jobhistory_row_limiter. There will not be contention between the two procedures for a given job and session, so the issue is whether there will be collisions between different jobs. (I am not sure under what circumstances there would be multiple sessions of one job. Does this occur if the job does not complete within its periodicity, or can one job be controlled by multiple schedulers?)

sp_jobhistory_row_limiter

The simplified sequence of statements in sp_jobhistory_row_limiter excluding conditionals is as follows:

EXECUTE master.dbo.xp_instance_regread -- 'JobHistoryMaxRows'

EXECUTE master.dbo.xp_instance_regread -- 'JobHistoryMaxRowsPerJob'

BEGIN TRANSACTION

/* this is the high wait statement */
SELECT @current_rows_per_job = COUNT(*)
FROM msdb.dbo.sysjobhistory with (TABLOCKX)
WHERE (job_id = @job_id)

WITH RowsToDelete AS (
  SELECT TOP (@rows_to_delete) *
  FROM msdb.dbo.sysjobhistory
  WHERE (job_id = @job_id)
  ORDER BY instance_id
)
DELETE FROM RowsToDelete;

/* this is the high CPU statement */
SELECT @current_rows = COUNT(*)
FROM msdb.dbo.sysjobhistory

/* this is rarely called? */
WITH RowsToDelete AS (
  SELECT TOP (@rows_to_delete) *
  FROM msdb.dbo.sysjobhistory
  ORDER BY instance_id
)
DELETE FROM RowsToDelete;

IF (@@trancount > 0)
COMMIT TRANSACTION

The combination of the SELECT with (TABLOCKX) and the BEGIN/COMMIT TRAN will block inserts to jobhistory for the duration of the transaction. So it is quite possible there are collisions with the sp_sqlagent_log_jobhistory insert statement.

Below is the detail for the xp_instance_regread call. For some reason the execution statistics are not available. If these are mainly called from the sp_jobhistory_row_limiter procedure, at 2 calls per execution, then there would be about 143,000 calls per day with an average wait of 12.6ms per call.

[screenshot: execution statistics for the xp_instance_regread call]
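Under the assumption that the two registry reads per sp_jobhistory_row_limiter execution account for most of the xp_instance_regread activity, the daily totals work out as follows:

```python
row_limiter_calls = 71_509
regreads_per_call = 2         # two xp_instance_regread calls per execution
wait_per_call_s = 0.0126      # 12.6 ms average wait per xp_instance_regread call

regread_calls = row_limiter_calls * regreads_per_call
wait_minutes = regread_calls * wait_per_call_s / 60
print(f"{regread_calls} calls/day, ~{wait_minutes:.0f} min of wait per day")
```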

I am thinking that the reason for the BEGIN/COMMIT TRAN block in the sp_jobhistory_row_limiter procedure is to prevent one session from deleting rows that are in the process of being inserted by another session.

Having limits on both the total number of rows for job history and for individual jobs is a good idea. But this is an example of a situation where we are not concerned about guaranteed accuracy. In this case there are 150 jobs and 260,000 rows in jobhistory, requiring 460MB of storage. If we set the total jobhistory limit at 500,000 rows and the individual job history limit at 1000 rows, perhaps our intention is only to keep the table at roughly that size, not to enforce the limits exactly.

We definitely do not need to check and enforce (both) jobhistory limits 71,500 times per day. One way to reduce the frequency of deletes is to remove more than the bare minimum each time: I suggest deleting at least 10 rows per job every time the threshold is exceeded.

I also think it is silly to do a full count of the number of rows in sysjobhistory on every job execution. The SQL Server SELECT COUNT(*) statement does not cheat: it reports the actual number of rows in the table by touching the row header of each and every row. So perhaps we should use an alternative, good-enough row count, for example from the row_count column of sys.dm_db_partition_stats.

I would also like to do away with the BEGIN/COMMIT TRAN block along with the TABLOCKX. Perhaps we could delete only history rows more than 1 day old?

/* be sure to make this index change before altering the procedure */
CREATE INDEX nc1 ON sysjobhistory (job_id) INCLUDE (run_date)
WITH (DROP_EXISTING = ON)
GO

First Alternative jobhistory_row_limiter

So my proposal for the sequence of statements in jobhistory_row_limiter is as follows:

SELECT @today = CONVERT(int, CONVERT(char(8), GETDATE(),112))

--BEGIN TRAN
SELECT @current_rows_per_job = COUNT(*)
FROM msdb.dbo.sysjobhistory -- with (TABLOCKX)
WHERE (job_id = @job_id) AND run_date < @today

SELECT @rows_to_delete = @current_rows_per_job - @max_rows_per_job

IF (@rows_to_delete > 0) BEGIN
/* if we only delete previous days' rows, then this will only fire once per day */
 WITH RowsToDelete AS (
  SELECT TOP (@rows_to_delete) *
  FROM msdb.dbo.sysjobhistory
  WHERE (job_id = @job_id) AND run_date < @today
  ORDER BY instance_id
 )
 DELETE FROM RowsToDelete;
END

/* do this first: quick, approximate row count */
SELECT @current_rows = row_count
FROM sys.dm_db_partition_stats d WITH(NOLOCK)
WHERE object_id = OBJECT_ID('sysjobhistory') AND index_id < 2

/* full count only if the estimate above exceeds the threshold */
IF (@current_rows > @max_total_rows) BEGIN

SELECT @current_rows = COUNT(*)
FROM msdb.dbo.sysjobhistory with (NOLOCK)
WHERE run_date < @today

SELECT @rows_to_delete = @current_rows - @max_total_rows

IF (@rows_to_delete > 0) BEGIN
 WITH RowsToDelete AS (
  SELECT TOP (@rows_to_delete) *
  FROM msdb.dbo.sysjobhistory
  WHERE run_date < @today
  ORDER BY instance_id
 )
 DELETE FROM RowsToDelete;
-- ALTER INDEX ALL ON msdb.dbo.sysjobhistory REBUILD?
END

END

--COMMIT TRAN

Even though the overall impact of the jobhistory limit procedures is insignificant, it is somewhat annoying to have them show up at the top of the list in a third-party performance monitoring tool. With a few simple precautions, they can be eliminated.

ps
If anyone is willing to alter the jobhistory row limiter procedure,
please let us know the changes made along with the before and after execution statistics.

I should also add that the sysjobhistory table will get heavily fragmented over time, especially since job_id is a uniqueidentifier, so periodic index rebuilds may be in order. Perhaps our strategy should be: when the max total rows limit is exceeded, delete a larger percentage, say 20% for example, and then rebuild the indexes (outside the transaction, if present).

Daily Delete

The latest thought is for sp_sqlagent_log_jobhistory, the procedure that inserts into sysjobhistory, to not call sp_jobhistory_row_limiter (for step = 0) at the end.

Instead we should have a new procedure that runs once per day to delete excess rows from sysjobhistory as follows. The registry reads for max rows per job and max total rows remain the same.

DECLARE @job_id UNIQUEIDENTIFIER, @job_cnt INT, @rows_to_delete INT, @current_rows INT, @today INT
CREATE TABLE #jobc( job_id UNIQUEIDENTIFIER, job_cnt INT, rows_to_delete INT,
  primary key clustered (job_id) )

SELECT @today = CONVERT(int, CONVERT(char(8), GETDATE(),112))

INSERT #jobc
SELECT job_id, job_cnt = COUNT(*), rows_to_delete = COUNT(*) - @max_rows_per_job
FROM msdb.dbo.sysjobhistory
WHERE run_date < @today
GROUP BY job_id
HAVING COUNT(*)> @max_rows_per_job

/* Either implement a cursor or delete all excess rows per job in a single step */
/* cursor method */
DECLARE jobc CURSOR FOR
  SELECT job_id, rows_to_delete
  FROM #jobc

OPEN jobc
FETCH NEXT FROM jobc INTO @job_id, @rows_to_delete

WHILE ( @@FETCH_STATUS = 0) BEGIN
 WITH RowsToDelete AS (
  SELECT TOP (@rows_to_delete) *
  FROM msdb.dbo.sysjobhistory
  WHERE (job_id = @job_id)
  ORDER BY instance_id
 )
 DELETE FROM RowsToDelete;
 FETCH NEXT FROM jobc INTO @job_id, @rows_to_delete
END

CLOSE jobc
DEALLOCATE jobc

-- alternative single statement delete
DELETE msdb.dbo.sysjobhistory
WHERE instance_id IN (
 SELECT x.instance_id
 FROM (
  SELECT h.job_id, h.instance_id, c.rows_to_delete,
   RANK() OVER ( PARTITION BY h.job_id ORDER BY h.instance_id ) rk
   FROM msdb.dbo.sysjobhistory h
   INNER JOIN #jobc c ON c.job_id = h.job_id
  ) x
 WHERE x.rk <= x.rows_to_delete
)

-- Delete the oldest history row(s) if inserting the new row has pushed us over the
-- global MaxJobHistoryTableRows limit.
SELECT @current_rows = COUNT(*)
FROM msdb.dbo.sysjobhistory with (NOLOCK)
WHERE run_date < @today

SELECT @rows_to_delete = @current_rows - @max_total_rows

IF (@rows_to_delete > 0) BEGIN
 WITH RowsToDelete AS (
  SELECT TOP (@rows_to_delete) *
  FROM msdb.dbo.sysjobhistory
  ORDER BY instance_id
 )
 DELETE FROM RowsToDelete;
END

ALTER INDEX ALL ON msdb.dbo.sysjobhistory REBUILD;