SQL Server Statistics and smart way to update it


Anybody who has worked with SQL Server databases whether be it administrator or developer, knows the significance of having “statistics” on a table and how it greatly impacts SQL query performance. Below are some excerpts about “statistics” from SQL Server documentation.

Statistics for query optimization are binary large objects (BLOBs) that contain statistical information about the distribution of values in one or more columns of a table or indexed view. The Query Optimizer uses these statistics to estimate the cardinality, or number of rows, in the query result. These cardinality estimates enable the Query Optimizer to create a high-quality query plan.

Statistics” is a complex amalgamation of histograms and densities that helps SQL Server Query optimizer correctly estimate number of rows it may fetch and process for each operator in order to generate an optimal or near optimal execution plan. Estimation plays huge role in how much time each operator will take in real time and wrong estimations means SQL Server has to do more amount of work to get actual number of rows leading to unexpected long runs for same query in different executions. Please note that although statistics is an important factor but it is not the only factor that can affect query run times.

Problem

The main problem in updating statistics is that there are only two options available to update stats. One is by using “UPDATE STATISTICS” statement and other is by using “sp_updatestats” system stored procedure. Now “sp_updatestats” SP simply run “UPDATE STATISTICS” statement with “ALL” option, on all tables where even single row is modified. It also doesn’t give much options other than choosing “RESAMPLE” option for sampling of rows or values. “UPDATE STATISTICS” statement is the one, which provides us all the possible options but like any other statement it is static in nature. We can neither be much selective in what to choose and what not to nor are options available to evaluate number of rows or percentage of rows modified to exclude tables with low modification rate.

Solution

Thanks to programmatic nature of T-SQL we can make the static “UPDATE STATISTICS” statement dynamic. Below is my take on making the update statistics a smarter process, which gives you much more selectivity options as well freedom to choose dynamic sampling as per size of table. Below script gives you flexibility to choose statistics to update on the basis of: 
  • Whether it is related to user table or internal table or both.
  • Whether you want to update index based statistics only or internal as well.
  • Minimum number of hours the statistics should have been outdated.
  • Minimum percentage of rows that would have been modified for a table.
  • Dynamic sampling rate as per size of table as well liberty of using fixed sample percent.



set nocount on

declare @tbln varchar(255), @schn varchar(255), @stsn varchar(255), @nrws bigint, @smp_rt tinyint;
declare @cmd nvarchar(max), @idx_st_only bit, @inc_sys_tbl bit, @hrs int, @is_modfyd int, @mod_perc int;
--declare @nrch bigint;

set @idx_st_only = 0; --update stats for index only stats
set @inc_sys_tbl = 1; --include SQL server internal table
set @hrs = -12; --no of hours check. all stats which are stale for more than these many hours will be updated. it will be in negative.
set @smp_rt = null; --set statistics sample rate here in case want to go with fixed sample rate for all tables
set @is_modfyd = 1; --whether update stats only for tables which got modified


if (@idx_st_only = 0 and @inc_sys_tbl = 1)
begin
      
       declare cur_update_stats cursor read_only for
       SELECT OBJECT_NAME([s].[object_id]) AS "Tablename", SCHEMA_NAME([o].[schema_id]) AS "SchemaName",
       [s].[name] AS "Statname", [ds].[rows] AS "TotalNoOfRows"
       ,convert(decimal(18,4),((([ds].[modification_counter]*1.0) / ([ds].[rows]*1.0))*100.0)) AS "PercRowsModfyd"
       --,[ds].[modification_counter]
       FROM [sys].[stats] AS [s]
       JOIN [sys].[objects] AS [o] ON [o].[object_id] = [s].[object_id]
       OUTER APPLY sys.dm_db_stats_properties ([s].[object_id],[s].[stats_id]) AS [ds]
       WHERE [ds].[last_updated] <= DATEADD(hour,@hrs,GETDATE())
       and [ds].[modification_counter] > @is_modfyd
       ORDER BY [ds].[rows]

       print '-----it will update all stats of all objects not updated in last ['+convert(varchar,(@hrs*-1))+'] hours-----'

end

if (@idx_st_only = 1 and @inc_sys_tbl = 0)
begin
      
       declare cur_update_stats cursor read_only for
       SELECT OBJECT_NAME([s].[object_id]) AS "Tablename", SCHEMA_NAME([o].[schema_id]) AS "SchemaName",
       [s].[name] AS "Statname", [ds].[rows] AS "TotalNoOfRows"
       ,convert(decimal(18,4),((([ds].[modification_counter]*1.0) / ([ds].[rows]*1.0))*100.0)) AS "PercRowsModfyd"
       --,[ds].[modification_counter]
       FROM [sys].[stats] AS [s]
       JOIN [sys].[objects] AS [o] ON [o].[object_id] = [s].[object_id]
       OUTER APPLY sys.dm_db_stats_properties ([s].[object_id],[s].[stats_id]) AS [ds]
       WHERE [ds].[last_updated] <= DATEADD(hour,@hrs,GETDATE())
       and [s].[auto_created] = 0
       and [o].[type] = 'U'
       and [ds].[modification_counter] > @is_modfyd
       ORDER BY [ds].[rows]

       print '-----it will update index based stats for all user objects not updated in last ['+convert(varchar,(@hrs*-1))+'] hours-----'

end

if (@idx_st_only = 0 and @inc_sys_tbl = 0)
begin
             
       declare cur_update_stats cursor read_only for
       SELECT OBJECT_NAME([s].[object_id]) AS "Tablename", SCHEMA_NAME([o].[schema_id]) AS "SchemaName",
       [s].[name] AS "Statname", [ds].[rows] AS "TotalNoOfRows"
       ,convert(decimal(18,4),((([ds].[modification_counter]*1.0) / ([ds].[rows]*1.0))*100.0)) AS "PercRowsModfyd"
       --,[ds].[modification_counter]
       FROM [sys].[stats] AS [s]
       JOIN [sys].[objects] AS [o] ON [o].[object_id] = [s].[object_id]
       OUTER APPLY sys.dm_db_stats_properties ([s].[object_id],[s].[stats_id]) AS [ds]
       WHERE [ds].[last_updated] <= DATEADD(hour,@hrs,GETDATE())
       and [o].[type] = 'U'
       and [ds].[modification_counter] > @is_modfyd
       ORDER BY [ds].[rows]

       print '-----it will update all stats of user objects not updated in last ['+convert(varchar,(@hrs*-1))+'] hours-----'

end
if (@idx_st_only = 1 and @inc_sys_tbl = 1)
begin
             
       declare cur_update_stats cursor read_only for
       SELECT OBJECT_NAME([s].[object_id]) AS "Tablename", SCHEMA_NAME([o].[schema_id]) AS "SchemaName",
       [s].[name] AS "Statname", [ds].[rows] AS "TotalNoOfRows"
       ,convert(decimal(18,4),((([ds].[modification_counter]*1.0) / ([ds].[rows]*1.0))*100.0)) AS "PercRowsModfyd"
       --,[ds].[modification_counter]
       FROM [sys].[stats] AS [s]
       JOIN [sys].[objects] AS [o] ON [o].[object_id] = [s].[object_id]
       OUTER APPLY sys.dm_db_stats_properties ([s].[object_id],[s].[stats_id]) AS [ds]
       WHERE [ds].[last_updated] <= DATEADD(hour,@hrs,GETDATE())
       and [s].[auto_created] = 0
       and [ds].[modification_counter] > @is_modfyd
       ORDER BY [ds].[rows]

       print '-----it will update all index based stats of all objects not updated in last ['+convert(varchar,(@hrs*-1))+'] hours-----'

end

open cur_update_stats;

fetch next from cur_update_stats into @tbln, @schn, @stsn, @nrws, @mod_perc;
--fetch next from cur_update_stats into @tbln, @schn, @stsn, @nrws, @mod_perc, @nrch; -- ##use this fetch statement if you want to print stats details as well

while @@FETCH_STATUS = 0
begin

       --print '--table : ' + @schn + '.' + @tbln + ' ; no of rows : ' + convert(varchar(max),@nrws)
       --            + ' ; no of rows changed : ' + convert(varchar(max),@nrch);
       If @mod_perc > 9.9         --Minimum number of rows percentage modified to be updated
       begin
             
              If @smp_rt is null
              begin
                     select @smp_rt = case
                                                when @nrws < 1000 then 100
                                                when @nrws between 1000 and 9999 then 90
                                                when @nrws between 10000 and 99999 then 80
                                                when @nrws between 100000 and 499999 then 70
                                                when @nrws between 500000 and 999999 then 60
                                                when @nrws between 1000000 and 4999999 then 50 --million rows and above logics starts here
                                                when @nrws between 5000000 and 9999999 then 45
                                                when @nrws between 10000000 and 49999999 then 40
                                                when @nrws between 50000000 and 99999999 then 35
                                                else 30 end ; --default sample percentage for all tables having 100 million+ records
              end

              set @cmd = N'Update statistics [' + @schn + '].[' + @tbln + '] ([' + @stsn + ']) with sample '
                                  + convert (nvarchar,@smp_rt) + ' percent';

              print @cmd; --print update stats command before execution
      
              exec sp_executesql @cmd;

       end

       fetch next from cur_update_stats into @tbln, @schn, @stsn, @nrws, @mod_perc;
       --fetch next from cur_update_stats into @tbln, @schn, @stsn, @nrws, @mod_perc, @nrch;

end

close cur_update_stats;

deallocate cur_update_stats;

--end of script


Hope you like this new and smart way of updating statistics. This can easily be put into a stored procedure with all the options as parameters and can be used to schedule a job as well.


Comments

Popular posts from this blog

SQL Server: SQL Server services not starting. TDSSNIClient initialization failed with error 0x139f, status code 0x80. Reason: Unable to initialize SSL support.

SQL Server 2014 SP\CU installation getting stuck at “MsiTimingAction”

SQL Server: Cluster Installation failed with error “Wait on the Database Engine recovery handle failed.”