SQL Server Statistics and smart way to update it
Anybody who has worked with SQL
Server databases whether be it administrator or developer, knows the
significance of having “statistics”
on a table and how it greatly impacts SQL query performance. Below are some
excerpts about “statistics” from SQL Server documentation.
“Statistics for query optimization are binary
large objects (BLOBs) that contain statistical information about the
distribution of values in one or more columns of a table or indexed view. The
Query Optimizer uses these statistics to estimate the cardinality, or number of
rows, in the query result. These cardinality
estimates enable the Query Optimizer to create a high-quality
query plan.”
“Statistics” is a complex amalgamation of histograms and densities
that helps SQL Server Query optimizer
correctly estimate number of rows it may fetch and process for each operator in
order to generate an optimal or near optimal execution plan. Estimation plays huge role in how much time each
operator will take in real time and wrong estimations means SQL Server has to
do more amount of work to get actual number of rows leading to unexpected long
runs for same query in different executions. Please note that although
statistics is an important factor but it is not
the only factor that can affect query run times.
Problem
The main problem in updating
statistics is that there are only two options available to update stats. One is
by using “UPDATE STATISTICS”
statement and other is by using “sp_updatestats”
system stored procedure. Now “sp_updatestats”
SP simply run “UPDATE STATISTICS” statement
with “ALL” option, on all tables
where even single row is modified. It also doesn’t give much options other than
choosing “RESAMPLE” option for
sampling of rows or values. “UPDATE
STATISTICS” statement is the one, which provides us all the possible options
but like any other statement it is static in nature. We can neither be much
selective in what to choose and what not to nor are options available to
evaluate number of rows or percentage of rows modified to exclude tables with
low modification rate.
Solution
Thanks to programmatic nature of
T-SQL we can make the static “UPDATE
STATISTICS” statement dynamic. Below is my take on making the update
statistics a smarter process, which gives you much more selectivity options as
well freedom to choose dynamic sampling as per size of table. Below script
gives you flexibility to choose statistics to update on the basis of:
- Whether it is related to user table or internal table or both.
- Whether you want to update index based statistics only or internal as well.
- Minimum number of hours the statistics should have been outdated.
- Minimum percentage of rows that would have been modified for a table.
- Dynamic sampling rate as per size of table as well liberty of using fixed sample percent.
set nocount on
declare @tbln varchar(255), @schn varchar(255), @stsn varchar(255), @nrws bigint, @smp_rt tinyint;
declare @cmd nvarchar(max), @idx_st_only bit, @inc_sys_tbl bit, @hrs int, @is_modfyd int, @mod_perc int;
--declare @nrch bigint;
set @idx_st_only = 0; --update stats for index only
stats
set @inc_sys_tbl = 1; --include SQL server internal
table
set @hrs = -12; --no of hours check. all
stats which are stale for more than these many hours will be updated. it will
be in negative.
set @smp_rt = null; --set statistics sample rate
here in case want to go with fixed sample rate for all tables
set @is_modfyd = 1; --whether update stats only
for tables which got modified
if (@idx_st_only = 0 and @inc_sys_tbl = 1)
begin
declare cur_update_stats cursor read_only for
SELECT OBJECT_NAME([s].[object_id]) AS "Tablename", SCHEMA_NAME([o].[schema_id]) AS "SchemaName",
[s].[name] AS "Statname", [ds].[rows] AS "TotalNoOfRows"
,convert(decimal(18,4),((([ds].[modification_counter]*1.0) / ([ds].[rows]*1.0))*100.0)) AS "PercRowsModfyd"
--,[ds].[modification_counter]
FROM [sys].[stats] AS [s]
JOIN [sys].[objects] AS [o] ON [o].[object_id] = [s].[object_id]
OUTER APPLY sys.dm_db_stats_properties ([s].[object_id],[s].[stats_id]) AS [ds]
WHERE [ds].[last_updated] <= DATEADD(hour,@hrs,GETDATE())
and [ds].[modification_counter] > @is_modfyd
ORDER BY [ds].[rows]
print '-----it will update all stats
of all objects not updated in last ['+convert(varchar,(@hrs*-1))+'] hours-----'
end
if (@idx_st_only = 1 and @inc_sys_tbl = 0)
begin
declare cur_update_stats cursor read_only for
SELECT OBJECT_NAME([s].[object_id]) AS "Tablename", SCHEMA_NAME([o].[schema_id]) AS "SchemaName",
[s].[name] AS "Statname", [ds].[rows] AS "TotalNoOfRows"
,convert(decimal(18,4),((([ds].[modification_counter]*1.0) / ([ds].[rows]*1.0))*100.0)) AS "PercRowsModfyd"
--,[ds].[modification_counter]
FROM [sys].[stats] AS [s]
JOIN [sys].[objects] AS [o] ON [o].[object_id] = [s].[object_id]
OUTER APPLY sys.dm_db_stats_properties ([s].[object_id],[s].[stats_id]) AS [ds]
WHERE [ds].[last_updated] <= DATEADD(hour,@hrs,GETDATE())
and [s].[auto_created] = 0
and [o].[type] = 'U'
and [ds].[modification_counter] > @is_modfyd
ORDER BY [ds].[rows]
print '-----it will update index
based stats for all user objects not updated in last ['+convert(varchar,(@hrs*-1))+'] hours-----'
end
if (@idx_st_only = 0 and @inc_sys_tbl = 0)
begin
declare cur_update_stats cursor read_only for
SELECT OBJECT_NAME([s].[object_id]) AS "Tablename", SCHEMA_NAME([o].[schema_id]) AS "SchemaName",
[s].[name] AS "Statname", [ds].[rows] AS "TotalNoOfRows"
,convert(decimal(18,4),((([ds].[modification_counter]*1.0) / ([ds].[rows]*1.0))*100.0)) AS "PercRowsModfyd"
--,[ds].[modification_counter]
FROM [sys].[stats] AS [s]
JOIN [sys].[objects] AS [o] ON [o].[object_id] = [s].[object_id]
OUTER APPLY sys.dm_db_stats_properties ([s].[object_id],[s].[stats_id]) AS [ds]
WHERE [ds].[last_updated] <= DATEADD(hour,@hrs,GETDATE())
and [o].[type] = 'U'
and [ds].[modification_counter] > @is_modfyd
ORDER BY [ds].[rows]
print '-----it will update all stats
of user objects not updated in last ['+convert(varchar,(@hrs*-1))+'] hours-----'
end
if (@idx_st_only = 1 and @inc_sys_tbl = 1)
begin
declare cur_update_stats cursor read_only for
SELECT OBJECT_NAME([s].[object_id]) AS "Tablename", SCHEMA_NAME([o].[schema_id]) AS "SchemaName",
[s].[name] AS "Statname", [ds].[rows] AS "TotalNoOfRows"
,convert(decimal(18,4),((([ds].[modification_counter]*1.0) / ([ds].[rows]*1.0))*100.0)) AS "PercRowsModfyd"
--,[ds].[modification_counter]
FROM [sys].[stats] AS [s]
JOIN [sys].[objects] AS [o] ON [o].[object_id] = [s].[object_id]
OUTER APPLY sys.dm_db_stats_properties ([s].[object_id],[s].[stats_id]) AS [ds]
WHERE [ds].[last_updated] <= DATEADD(hour,@hrs,GETDATE())
and [s].[auto_created] = 0
and [ds].[modification_counter] > @is_modfyd
ORDER BY [ds].[rows]
print '-----it will update all index
based stats of all objects not updated in last ['+convert(varchar,(@hrs*-1))+'] hours-----'
end
open cur_update_stats;
fetch next from cur_update_stats into @tbln, @schn, @stsn, @nrws, @mod_perc;
--fetch next from cur_update_stats into
@tbln, @schn, @stsn, @nrws, @mod_perc, @nrch; -- ##use this fetch statement if
you want to print stats details as well
while @@FETCH_STATUS = 0
begin
--print '--table : ' + @schn
+ '.' + @tbln + ' ; no of rows : ' + convert(varchar(max),@nrws)
-- + ' ; no of rows changed : ' +
convert(varchar(max),@nrch);
If @mod_perc > 9.9 --Minimum
number of rows percentage modified to be updated
begin
If @smp_rt is null
begin
select @smp_rt = case
when @nrws < 1000 then 100
when @nrws between 1000 and 9999 then 90
when @nrws between 10000 and 99999 then 80
when @nrws between 100000 and 499999 then 70
when @nrws between 500000 and 999999 then 60
when @nrws between 1000000 and 4999999 then 50 --million rows and above
logics starts here
when @nrws between 5000000 and 9999999 then 45
when @nrws between 10000000 and 49999999 then 40
when @nrws between 50000000 and 99999999 then 35
else 30 end ; --default sample percentage
for all tables having 100 million+ records
end
set @cmd = N'Update statistics [' + @schn + '].[' + @tbln + '] ([' + @stsn + ']) with sample '
+ convert (nvarchar,@smp_rt) + ' percent';
print @cmd; --print update stats command
before execution
exec sp_executesql @cmd;
end
fetch next from cur_update_stats into @tbln, @schn, @stsn, @nrws, @mod_perc;
--fetch next from
cur_update_stats into @tbln, @schn, @stsn, @nrws, @mod_perc, @nrch;
end
close cur_update_stats;
deallocate cur_update_stats;
--end of script
Hope you like this new and smart way of updating statistics. This can easily be put into a stored procedure with all the options as parameters and can be used to schedule a job as well.
Comments
Post a Comment