Oracle database statistics: gathering speed and redo log activity

Each new version of the optimizer is increasingly dependent on statistics; statistics whose quality was sufficient for an earlier version may not be for the current one.

As a rule, a table should be analyzed (in cascade, i.e. including its indexes) whenever significant modifications are made to it. For example:

  • Truncates
  • INSERT INTO ... SELECT ...
  • Any bulk insert, update, or delete (bulk meaning it affects more than 20% of all records)
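As a sketch of how to check which tables have accumulated enough of this kind of DML to need re-analysis, the data dictionary can be queried directly (table monitoring is enabled by default from Oracle 10g onwards; the `SCOTT` schema is used here only as an example):

```sql
-- Flush in-memory DML monitoring data down to the dictionary first.
EXEC DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;

-- Tables whose statistics Oracle itself considers stale
-- (roughly, more than 10% of the rows changed since the last analysis).
SELECT owner, table_name, stale_stats
FROM   dba_tab_statistics
WHERE  owner = 'SCOTT'
AND    stale_stats = 'YES';

-- Detail of the accumulated DML against each table.
SELECT table_owner, table_name, inserts, updates, deletes, truncated
FROM   dba_tab_modifications
WHERE  table_owner = 'SCOTT';
```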

Statistics can be gathered in many ways, some faster than others (partial estimates, complete scans, etc.), and with more or less redo log generation.

For example, if we use the DBMS_STATS package (https://docs.oracle.com/cd/E11882_01/appdev.112/e40758/d_stats.htm#ARPLS059), specifically the GATHER_TABLE_STATS subprogram (https://docs.oracle.com/cd/E11882_01/appdev.112/e40758/d_stats.htm#ARPLS68582), the syntax is very similar to that of the subprograms:

  • DBMS_STATS.GATHER_DATABASE_STATS
  • DBMS_STATS.GATHER_SCHEMA_STATS
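To illustrate that similarity, a schema-wide gathering call takes essentially the same parameters, minus the table name (a sketch only; `SCOTT` is an example schema):

```sql
BEGIN
  -- Gather statistics for every object in the SCOTT schema;
  -- cascade => TRUE also analyzes the indexes.
  DBMS_STATS.GATHER_SCHEMA_STATS (
    ownname          => 'SCOTT',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    method_opt       => 'FOR ALL COLUMNS SIZE AUTO',
    cascade          => TRUE);
END;
/
```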

It is possible to gather statistics quickly while generating little redo activity, for example:

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS (
    ownname          => 'SCOTT',
    tabname          => 'EMP',
    estimate_percent => 10,
    method_opt       => 'FOR ALL INDEXED COLUMNS SIZE 1',
    cascade          => TRUE);
END;
/

In the previous call, statistics are computed only on indexed columns, with single-bucket histograms (which, for the optimizer, is the same as having no histograms at all), using a 10% sample estimate.

Any query that filters on an unanalyzed column of such a table can be a disaster in terms of performance: the optimizer has no histogram information for it, and the sample percentage is also too low (10%; a good value is 25%, or better yet, leave it on automatic).
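To see which columns actually ended up with usable histograms after a gathering run, the dictionary can be inspected (a sketch; `EMP` is the example table used above):

```sql
-- HISTOGRAM = 'NONE' (or NUM_BUCKETS = 1) means the optimizer
-- has no distribution information for that column.
SELECT column_name, num_distinct, histogram, num_buckets
FROM   user_tab_col_statistics
WHERE  table_name = 'EMP';
```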

A higher-quality analysis would use a call such as:

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS (
    ownname          => 'SCOTT',
    tabname          => 'EMP',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    method_opt       => 'FOR ALL COLUMNS SIZE AUTO',
    cascade          => TRUE);
END;
/

In this case all columns of the table are analyzed and the server chooses the histogram size (number of buckets). It is slower, because more columns have to be analyzed, and it generates about 4 times more redo activity because of the histograms.
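One way to see that difference in redo for yourself is to compare the session's `redo size` statistic before and after each gathering call (a sketch; requires SELECT privileges on the V$ views):

```sql
-- Redo generated so far by the current session.
SELECT sn.name, ms.value
FROM   v$mystat   ms
JOIN   v$statname sn ON sn.statistic# = ms.statistic#
WHERE  sn.name = 'redo size';

-- Gather the statistics...
EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => 'SCOTT', tabname => 'EMP', method_opt => 'FOR ALL COLUMNS SIZE AUTO', cascade => TRUE);

-- ...then run the same query again; the difference in 'redo size'
-- is the redo generated by the gathering.
```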
