This is a note that was supposed to be a follow-up to an initial example of using the opt_estimate() hint to manipulate the optimizer’s statistical understanding of how much data it would access and (implicitly) how much difference that would make to the resource usage. Instead, two years later, here’s part two – on using opt_estimate() with nested loop joins. As usual I’ll start with a little data set:
rem
rem     Script:         opt_est_nlj.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2017
rem

create table t1
as
select
        trunc((rownum-1)/15)    n1,
        trunc((rownum-1)/15)    n2,
        rpad(rownum,180)        v1
from    dual
connect by
        level <= 3000 --> hint to avoid wordpress format issue
;

create table t2
pctfree 75
as
select
        mod(rownum,200)         n1,
        mod(rownum,200)         n2,
        rpad(rownum,180)        v1
from    dual
connect by
        level <= 3000 --> hint to avoid wordpress format issue
;

create index t1_i1 on t1(n1);
create index t2_i1 on t2(n1);
There are 3,000 rows in each table, with 200 distinct values for each of columns n1 and n2. There is an important difference between the tables, though, as the rows for a given value are well clustered in t1 and widely scattered in t2. I’m going to execute a join query between the two tables, ultimately forcing a very bad access path so that I can show some opt_estimate() hints making a difference to cost and cardinality calculations. Here’s my starting query, with execution plan, unhinted (apart from the query block name hint):
select
        /*+ qb_name(main) */
        t1.v1, t2.v1
from
        t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |   225 | 83700 |    44   (3)| 00:00:01 |
|*  1 |  HASH JOIN                           |       |   225 | 83700 |    44   (3)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS FULL                  | T2    |  3000 |   541K|    42   (3)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."N1"="T1"."N2")
   3 - access("T1"."N1"=15)
You’ll notice the tablescan and hash join with t2 as the probe (2nd) table and a total cost of 44, which is largely due to the tablescan cost of t2 (which I had deliberately defined with pctfree 75 to make the tablescan a little expensive). Let’s hint the query to do a nested loop from t1 to t2 to see why the hash join is preferred over the nested loop:
alter session set "_nlj_batching_enabled"=0;

select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
        */
        t1.v1, t2.v1
from
        t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |   225 | 83700 |   242   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |   225 | 83700 |   242   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |    15 |  2775 |    16   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |    15 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")
I’ve done two slightly odd things here – I’ve set a hidden parameter to disable nlj batching and I’ve used a hint to block nlj prefetching. This doesn’t affect the arithmetic but it does mean the appearance of the nested loop goes back to the original pre-9i form that happens to make it a little easier to see costs and cardinalities adding and multiplying their way through the plan.
As you can see, the total cost is 242 with this plan and most of the cost is due to the indexed access into t2: the optimizer has correctly estimated that each probe of t2 will acquire 15 rows and that those 15 rows will be scattered across 15 blocks, so the join cardinality comes to 15 * 15 = 225 and the cost comes to 15 (t1 rows) * 16 (t2 unit cost) + 2 (t1 cost) = 242.
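The arithmetic in that last sentence can be sketched in a few lines of Python (my own illustration of the numbers above, not anything Oracle exposes):

```python
# Sketch of the classic (pre-9i style) nested loop join arithmetic
# described in the text -- illustrative only, not Oracle's actual code.

def nlj_cost(outer_cost, outer_rows, inner_unit_cost):
    # cost = cost of acquiring the outer (driving) row source
    #        + one probe of the inner row source per outer row
    return outer_cost + outer_rows * inner_unit_cost

def nlj_cardinality(outer_rows, inner_rows_per_probe):
    # join cardinality = t1 rows * t2 rows fetched per probe
    return outer_rows * inner_rows_per_probe

print(nlj_cost(2, 15, 16))        # t1 cost 2, 15 t1 rows, t2 unit cost 16 -> 242
print(nlj_cardinality(15, 15))    # 15 t1 rows * 15 t2 rows per probe -> 225
```

Both results match the Cost and Rows figures at operation 1 of the plan above.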
So let’s tell the optimizer that its estimated cardinality for the index range scan is wrong.
select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
        */
        t1.v1, t2.v1
from
        t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |   225 | 83700 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |   225 | 83700 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |    15 |  2775 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")
I’ve used the hint opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06).
The form is: (@qb_name nlj_index_scan, table_alias (list of possible driving tables), target_index, numeric_adjustment).
The numeric_adjustment could be rows=nnn or, as I have here, scale_rows=nnn; the target index has to be specified by name rather than by a list of columns, and the list of possible driving tables should be a comma-separated list of fully-qualified table aliases. There’s a similar nlj_index_filter option which I can’t demonstrate in this post because it probably needs an index of at least two columns before it can be used.
The things to note in this plan: the index range scan at operation 5 now has a cardinality (Rows) estimate of 1 (that’s 0.06 * the original 15). This hasn’t changed the cost of the range scan (because that cost was already 1 before we applied the opt_estimate() hint) but, because the cost of the table access depends on the index selectivity, the cost of the table access has dropped to 2 (from 16). On the other hand the table cardinality hasn’t dropped, so it is no longer consistent with the number of rowids predicted by the index range scan. The total cost of the query has dropped to 32, though, which is 15 (t1 rows) * 2 (t2 unit cost) + 2 (t1 cost).
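As a quick sanity check on those numbers, the same nested-loop arithmetic as before reproduces the new figures (again just an illustrative sketch, not Oracle internals):

```python
# Effect of scale_rows=0.06 on the t2_i1 index range scan, as
# described above -- illustrative sketch only.

scaled_index_rows = 0.06 * 15       # 0.9, reported (rounded) as 1 in the plan
print(round(scaled_index_rows))

# The cheaper table visit drops the t2 unit cost from 16 to 2, so:
total_cost = 2 + 15 * 2             # t1 cost + t1 rows * new t2 unit cost
print(total_cost)                   # 32, matching the plan
```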
Let’s try to adjust the prediction that the optimizer makes about the number of rows we fetch from the table. Rather than going all the way to being consistent with the index range scan I’ll dictate a scaling factor that will make it easy to see the effect – let’s tell the optimizer that we will get one-fifth of the originally expected rows (i.e. 3).
select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
                opt_estimate(@main table         , t2@main     ,        scale_rows=0.20)
        */
        t1.v1, t2.v1
from
        t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |    47 | 17484 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |    47 | 17484 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     3 |   555 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")
By adding the hint opt_estimate(@main table, t2@main, scale_rows=0.20) we’ve told the optimizer that it should scale its estimated row count for t2 down by a factor of 5 from whatever it calculates. Bear in mind that in a more complex query the optimizer might decide not to follow the path we expected, and that factor of 0.2 will be applied whenever t2 is accessed. Notice in this plan that the join cardinality in operation 1 has also dropped from 225 to 47 – if the optimizer is told that its cardinality (or selectivity) calculation is wrong for the table, the numbers involved in the selectivity will carry on through the plan, producing a different “adjusted NDV” for the join cardinality calculation.
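A short sketch of the effect (my own illustration, not Oracle internals). Note that the per-probe table estimate falls out of a simple multiplication, but the join cardinality does not – naive scaling would give 45, not the 47 reported, which is consistent with the “adjusted NDV” behaviour just mentioned:

```python
# Effect of opt_estimate(... table, t2@main, scale_rows=0.20) --
# illustrative sketch only.

table_rows_per_probe = 0.20 * 15    # plan reports this (rounded) as 3
print(round(table_rows_per_probe))

# Naive scaling of the join cardinality gives 225 * 0.20 = 45, but the
# plan shows 47: the optimizer re-derives the join cardinality from an
# "adjusted NDV", so the figure doesn't track a simple multiply.
print(225 * 0.20)
```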
Notice, though, that the total cost of the query has not changed. The cost was dictated by the optimizer’s estimate of the number of table blocks to be visited after the index range scan. The estimated number of table blocks hasn’t changed, it’s just the number of rows we will find there that we’re now hacking.
Just for completion, let’s make one final change (again, something that might be necessary in a more complex query), let’s fix the join cardinality:
select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
                opt_estimate(@main table         , t2@main     ,        scale_rows=0.20)
                opt_estimate(@main join(t2 t1)   ,                      scale_rows=0.5)
        */
        t1.v1, t2.v1
from
        t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     2 |   370 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")
I’ve used the hint opt_estimate(@main join(t2 t1), scale_rows=0.5) to tell the optimizer to halve its estimate of the join cardinality between t1 and t2 (whatever order they appear in). With the previous hints in place the estimate had dropped to 47 (which must have been 46 and a large bit), with this final hint it has now dropped to 23. Interestingly the cardinality estimate for the table access to t2 has dropped at the same time (almost as if the optimizer has “rationalised” the join cardinality by adjusting the selectivity of the second table in the join – that’s something I may play around with in the future, but it may require reading a 10053 trace, which I tend to avoid doing).
Side note: If you have access to MOS you’ll find that Doc ID 2402821.1, “How To Use Optimizer Hints To Specify Cardinality For Join Operation”, seems to suggest that the cardinality() hint is something to use for single-table cardinalities, and implies that the opt_estimate(join) option is for two-table joins. In fact both hints can be used to set the cardinality of multi-table joins.
Finally, then, let’s eliminate the hints that force the join order and join method and see what happens to our query plan if all we include is the opt_estimate() hints (and the qb_name() and no_nlj_prefetch hints).
select
        /*+
                qb_name(main)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
                opt_estimate(@main table         , t2@main     ,        scale_rows=0.20)
                opt_estimate(@main join(t2 t1)   ,                      scale_rows=0.5)
        */
        t1.v1, t2.v1
from
        t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     2 |   370 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

Note
-----
   - this is an adaptive plan
With a little engineering on the optimizer estimates we’ve managed to con Oracle into using a different path from the default choice. Do notice, though, the closing Note section (which didn’t appear in any of the other examples): I’ve left Oracle with the option of checking the actual stats as the query runs, so if I run the query twice Oracle might spot that the arithmetic is all wrong and throw in some SQL Plan Directives – which are just another load of opt_estimate() hints.
In fact, in this example, the plan we wanted became desirable as soon as we applied the nlj_index_scan fix-up, since that made the estimated cost of the index probe into t2 sufficiently low (even though it left an inconsistent cardinality figure for the table rows) that Oracle would have switched from the default hash join to the nested loop on that basis alone.
Closing Comment
As I pointed out in the previous article, this is just scratching the surface of how the opt_estimate() hint works, and even with very simple queries it can be hard to tell whether any behaviour we’ve seen is actually doing what we think it’s doing. In a third article I’ll be looking at something prompted by the most recent email I’ve had about opt_estimate() – how it might (or might not) behave in the presence of inline views and transformations like merging or pushing predicates. I’ll try not to take 2 years to publish it.