A posting on the OTN database forum a few days ago demonstrated an important problem with hinting, especially (though it didn’t come up in the thread) in the face of upgrades. A simple query needed a couple of hints to produce the correct plan, but a slight change to the query seemed to result in Oracle ignoring the hints. The optimizer doesn’t ignore hints, of course, but there are many reasons why it might appear to, so I created a little demonstration of the problem, starting with the following data set:
```sql
rem
rem     Script:         OTN_DAG.sql
rem     Author:         J.P.Lewis
rem     Dated:          March 2016
rem

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        mod(rownum,200)         n1,
        mod(rownum,200)         n2,
        rpad(rownum,180)        v1
from
        generator       g1,
        generator       g2
where
        rownum <= 24000
;

create table t2
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        trunc((rownum-1)/15)    n1,
        trunc((rownum-1)/15)    n2,
        rpad(rownum,180)        v1
from
        generator
where
        rownum <= 3000
;
```
```sql
begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T2',
                method_opt       => 'for all columns size 1'
        );
end;
/
```
(Ignore the silliness of the way I’ve created the data; it’s a consequence of using my standard template.)
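The shape of the data is easy to check outside the database. The Python below is an illustrative model, not part of the original script: it reproduces only the mod() and trunc() arithmetic that generates n1 and n2 (v1 is left out) and then applies the n1 = 15 predicate used by the queries that follow.

```python
# Illustrative model of the two CREATE TABLE statements: it mimics only the
# mod()/trunc() arithmetic for n1 and n2 (v1 is omitted); it is not Oracle.

def build_t1():
    # rownum runs 1..24000; n1 = n2 = mod(rownum, 200)
    return [(rownum % 200, rownum % 200) for rownum in range(1, 24001)]

def build_t2():
    # rownum runs 1..3000; n1 = n2 = trunc((rownum - 1) / 15)
    return [((rownum - 1) // 15, (rownum - 1) // 15) for rownum in range(1, 3001)]

t1 = build_t1()
t2 = build_t2()

# The single-table filters used by the queries: t1.n1 = 15 and t2.n1 = 15
t1_15 = [row for row in t1 if row[0] == 15]
t2_15 = [row for row in t2 if row[0] == 15]

print(len(t1) // len(t2))                                     # 8 rows in t1 per row in t2
print(len(t1_15), len(t2_15))                                 # 120 15
print(len({n2 for _, n2 in t1}), len({n2 for _, n2 in t2}))   # 200 200 distinct n2 values
```

The 120 and 15 agree with the Rows column for the two TABLE ACCESS FULL lines in the plans that follow, and every surviving row has n2 = 15, so each of the 15 t2 rows matches all 120 t1 rows.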
For every row in t2 there are 8 rows in t1, so when I join t1 to t2 on n2 it would obviously be sensible for the resulting hash join to use the t2 (smaller) data set as the build table and the t1 data set as the probe table, but I’m going to pretend that the optimizer is making an error and needs to be hinted to use t1 as the build table and t2 as the probe. Here’s a query, and execution plan, from 11.2.0.4:
```sql
explain plan for
select
        /*+ leading(t1) use_hash(t2) no_swap_join_inputs(t2) */
        count(t1.n2)
from
        t1, t2
where
        t2.n2 = t1.n2
and     t1.n1 = 15
and     t2.n1 = 15
;

select * from table(dbms_xplan.display(null,null,'outline alias'));
```

```
----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |    16 |    97   (3)| 00:00:01 |
|   1 |  SORT AGGREGATE     |      |     1 |    16 |            |          |
|*  2 |   HASH JOIN         |      |    20 |   320 |    97   (3)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| T1   |   120 |   960 |    85   (3)| 00:00:01 |
|*  4 |    TABLE ACCESS FULL| T2   |    15 |   120 |    12   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T2"."N2"="T1"."N2")
   3 - filter("T1"."N1"=15)
   4 - filter("T2"."N1"=15)
```
As you can see, the optimizer has obeyed my hinting: the join order is t1 -> t2, I’ve used a hash join to join t2, and Oracle hasn’t swapped the join inputs despite the fact that the t1 data set is larger than the t2 data set (960 bytes vs. 120 bytes), which should have persuaded it to swap. (Technically, the leading() hint seems to block the swap of the first two tables anyway; see the “Special Case” section at this URL. But I’ve included the no_swap_join_inputs() hint anyway to make the point explicit.)
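The terms build table and probe table refer to the two phases of a hash join. Here is a toy in-memory version in Python, as an illustration only (Oracle’s real implementation, with its hash partitions and workareas, is far more elaborate): the first input is loaded into a hash table keyed on n2, and each row of the second input looks up its matches. The inputs stand in for the filtered row sources, 120 rows from t1 and 15 from t2.

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, key=lambda row: row[1]):
    """Join two lists of (n1, n2) tuples on n2.

    Returns (joined rows, number of rows held in the in-memory hash table).
    """
    hash_table = defaultdict(list)
    for row in build_rows:                    # build phase: hash every row of the first input
        hash_table[key(row)].append(row)
    joined = []
    for row in probe_rows:                    # probe phase: look up each row of the second input
        for match in hash_table.get(key(row), []):
            joined.append((match, row))
    held = sum(len(bucket) for bucket in hash_table.values())
    return joined, held

# Stand-ins for the filtered row sources: n1 = n2 = 15 for every row
t1_rows = [(15, 15)] * 120
t2_rows = [(15, 15)] * 15

hinted, hinted_held = hash_join(t1_rows, t2_rows)      # t1 builds, t2 probes (as hinted)
swapped, swapped_held = hash_join(t2_rows, t1_rows)    # t2 builds, t1 probes
print(len(hinted), hinted_held)     # 1800 120
print(len(swapped), swapped_held)   # 1800 15
```

Both orders return the same 1,800 rows; the difference is how much has to be held in the hash table, which is why the smaller t2 data set is the natural choice for the build input.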
So now, instead of just counting n2, we’ll modify the query to count the number of distinct values of n2:
```sql
explain plan for
select
        /*+ leading(t1) use_hash(t2) no_swap_join_inputs(t2) */
        count(distinct t1.n2)
from
        t1, t2
where
        t2.n2 = t1.n2
and     t1.n1 = 15
and     t2.n1 = 15
;

select * from table(dbms_xplan.display(null,null,'outline alias'));
```

```
----------------------------------------------------------------------------------
| Id  | Operation             | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |     1 |    13 |    98   (4)| 00:00:01 |
|   1 |  SORT AGGREGATE       |          |     1 |    13 |            |          |
|   2 |   VIEW                | VW_DAG_0 |    20 |   260 |    98   (4)| 00:00:01 |
|   3 |    HASH GROUP BY      |          |    20 |   320 |    98   (4)| 00:00:01 |
|*  4 |     HASH JOIN         |          |    20 |   320 |    97   (3)| 00:00:01 |
|*  5 |      TABLE ACCESS FULL| T2       |    15 |   120 |    12   (0)| 00:00:01 |
|*  6 |      TABLE ACCESS FULL| T1       |   120 |   960 |    85   (3)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T2"."N2"="T1"."N2")
   5 - filter("T2"."N1"=15)
   6 - filter("T1"."N1"=15)
```
Check operations 5 and 6: Oracle has swapped the join inputs, and t2 (the obvious choice) is now the build table. Has Oracle ignored the hints? (Answer: no.)
If you look at operation 2 you can see that Oracle has generated an internal view called VW_DAG_0; this is an example of the “Distinct Aggregate” transformation taking place. It seems to be a pointless exercise in this case, and the 10053 trace file seems to indicate that it’s a heuristic transformation rather than a cost-based transformation (i.e. the optimizer does it because it can, not because it’s cheaper). Oracle has transformed the SQL to the following (to which I have applied a little cosmetic tidying):
```sql
SELECT  /*+ LEADING (T1) */
        COUNT(VW_DAG_0.ITEM_1) "COUNT(DISTINCTT1.N2)"
FROM    (
        SELECT
                T1.N2 ITEM_1
        FROM
                TEST_USER.T2 T2,
                TEST_USER.T1 T1
        WHERE
                T2.N2=T1.N2
        AND     T1.N1=15
        AND     T2.N1=15
        GROUP BY
                T1.N2
        ) VW_DAG_0
```
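The rewrite is legal because counting the distinct values of t1.n2 gives the same answer as grouping by t1.n2 in an inline view and then counting the grouped values, which is exactly the shape of VW_DAG_0. A small Python illustration of that equivalence; the function names are hypothetical, and None stands in for NULL, which count() ignores in both forms.

```python
from collections import Counter

def count_distinct_direct(join_rows):
    """count(distinct t1.n2) evaluated straight over (t1.n2, t2.n2) join rows."""
    return len({t1_n2 for t1_n2, _t2_n2 in join_rows if t1_n2 is not None})

def count_distinct_dag(join_rows):
    """Same answer via the VW_DAG_0 shape: GROUP BY t1.n2 in a view, then count(item_1)."""
    vw_dag_0 = Counter(t1_n2 for t1_n2, _t2_n2 in join_rows)    # one group per t1.n2 value
    return sum(1 for item_1 in vw_dag_0 if item_1 is not None)  # count() skips a NULL group

rows = [(15, 15)] * 1800      # every joined row carries n2 = 15
print(count_distinct_direct(rows), count_distinct_dag(rows))    # 1 1
```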
Notice how the use_hash() and no_swap_join_inputs() hints have disappeared. I am slightly surprised that the leading() hint is still visible, since I would have expected all three to stay or all three to disappear. Regardless of that, though, the single remaining hint references an object that does not exist in the query block where the hint has been placed. The original hint has not been “ignored”; it has become irrelevant. (I’ll be coming back to an odd little detail about this transformed query a little later on, but for the moment I’m going to pursue the problem of making the optimizer do what we want.)
We have three strategies we could pursue at this point. We could tell the optimizer that we don’t want it to do the transformation; we could work out the query block name of the query block that holds t1 and t2 after the transformation and direct the hints into that query block; or we could tell Oracle to pretend it was using an older version of the optimizer because that Distinct Aggregate transformation only appeared in 11.2.0.1.
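The third strategy relies on optimizer_features_enable acting, broadly speaking, as a version gate on optimizer features. Here is a deliberately simplified Python model of such a gate, assuming (as stated above) that the Distinct Aggregate transformation arrived in 11.2.0.1; the function names are hypothetical, and the optimizer’s real internal checks are nowhere near this tidy.

```python
def version_tuple(version):
    """Turn a version string such as '11.1.0.7' into a tuple that compares numerically."""
    return tuple(int(part) for part in version.split("."))

# Assumption taken from the text: the Distinct Aggregate transformation appeared in 11.2.0.1.
DAG_INTRODUCED = version_tuple("11.2.0.1")

def dag_transformation_available(optimizer_features_enable):
    """Hypothetical gate: the transformation is considered only at or above its release."""
    return version_tuple(optimizer_features_enable) >= DAG_INTRODUCED

for ofe in ("10.2.0.5", "11.1.0.7", "11.2.0.4"):
    print(ofe, dag_transformation_available(ofe))
# 10.2.0.5 False
# 11.1.0.7 False
# 11.2.0.4 True
```

Comparing tuples rather than raw strings matters here: as plain strings, '9.2.0.8' would sort after '11.2.0.1'.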
You’ll notice that I used the ‘alias’ format option in my call to dbms_xplan.display(); this is the query block / alias section of the output:
```
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$C33C846D
   2 - SEL$5771D262 / VW_DAG_0@SEL$C33C846D
   3 - SEL$5771D262
   5 - SEL$5771D262 / T1@SEL$1
   6 - SEL$5771D262 / T2@SEL$1
```
Strategy A says try adding the hint:

```sql
/*+ no_transform_distinct_agg(@sel$1) */
```

Strategy B says try using the hints:

```sql
/*+
        leading(@SEL$5771D262 t1@sel$1 t2@sel$1)
        use_hash(@SEL$5771D262 t2@sel$1)
        no_swap_join_inputs(@SEL$5771D262 t2@sel$1)
*/
```

Strategy C says try adding the hint:

```sql
/*+ optimizer_features_enable('11.1.0.7') */
```
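The reason the leftover leading(t1) achieves nothing, while the Strategy B hints work, comes down to which query block holds which objects. Here is a toy Python model of that rule, built from the Query Block Name / Object Alias output; it is an illustration under that assumption, not Oracle’s hint-resolution code, and hint_applies() is a hypothetical name.

```python
# Toy model only: after the transformation each query block owns a set of
# object aliases, copied from the alias section of the plan output.
QUERY_BLOCKS = {
    "SEL$C33C846D": {"VW_DAG_0@SEL$C33C846D"},
    "SEL$5771D262": {"T1@SEL$1", "T2@SEL$1"},
}

def hint_applies(target_block, aliases):
    """True if every alias a hint names lives in the query block the hint targets.

    Unquoted Oracle identifiers are case-insensitive, so names are upper-cased first.
    """
    objects = QUERY_BLOCKS.get(target_block.upper(), set())
    return {alias.upper() for alias in aliases} <= objects

# The leftover leading(t1) sits in the outer block, which no longer contains t1
print(hint_applies("SEL$C33C846D", ["t1@sel$1"]))               # False
# Strategy B: leading(@SEL$5771D262 t1@sel$1 t2@sel$1) targets the block holding both tables
print(hint_applies("SEL$5771D262", ["t1@sel$1", "t2@sel$1"]))   # True
```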
Strategies A and C (stopping the transformation) produce the following plan:
```
----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |    16 |    98   (4)| 00:00:01 |
|   1 |  SORT GROUP BY      |      |     1 |    16 |            |          |
|*  2 |   HASH JOIN         |      |    20 |   320 |    98   (4)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| T1   |   120 |   960 |    85   (3)| 00:00:01 |
|*  4 |    TABLE ACCESS FULL| T2   |    15 |   120 |    12   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T2"."N2"="T1"."N2")
   3 - filter("T1"."N1"=15)
   4 - filter("T2"."N1"=15)
```
Strategy B (allowing the transformation, but addressing the hints to the generated query block) produces this plan:
```
----------------------------------------------------------------------------------
| Id  | Operation             | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |     1 |    13 |    98   (4)| 00:00:01 |
|   1 |  SORT AGGREGATE       |          |     1 |    13 |            |          |
|   2 |   VIEW                | VW_DAG_0 |    20 |   260 |    98   (4)| 00:00:01 |
|   3 |    HASH GROUP BY      |          |    20 |   320 |    98   (4)| 00:00:01 |
|*  4 |     HASH JOIN         |          |    20 |   320 |    97   (3)| 00:00:01 |
|*  5 |      TABLE ACCESS FULL| T1       |   120 |   960 |    85   (3)| 00:00:01 |
|*  6 |      TABLE ACCESS FULL| T2       |    15 |   120 |    12   (0)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T2"."N2"="T1"."N2")
   5 - filter("T1"."N1"=15)
   6 - filter("T2"."N1"=15)
```
All three strategies have produced plans that use t1, the larger data set, as the build table. It’s hard to resist asking whether one of the three is the best strategy. It’s hard to say, but I think I’d favour the no_transform_distinct_agg() hint because it’s precisely targeted: it avoids the brute-force, thuggish nature of reverting to an old optimizer version, and it avoids the (possible) fragility of needing to know a very precise query block name, which (possibly) might change if the query were modified very slightly. The argument, of course, comes from the perspective of a friendly consultant who visits for a couple of days, gets a bit clever with your SQL, then walks away, leaving you to worry about whether you understand why your SQL now works the way it does.
Upgrades
My opening comment was about the difficulty of hinting across upgrades. Imagine you had been running this count(distinct) query in 10.2.0.5, and after some experimentation had found that you got the path you needed by adding the hints: /*+ leading(t1 t2) full(t1) use_hash(t2) no_swap_join_inputs(t2) full(t2) */. This is a careful and thorough piece of hinting (and it does work, of course, in 10.2.0.5).
When the big day for upgrading to 11.2 arrives (just in time for Oracle to end extended support, possibly) you find that this query changes its execution plan. And this is NOT a rare occurrence. I’ve said it before, and I’ll keep saying it: hinting, especially with “micro-management” hints, is undesirable in a production system. You probably haven’t done it right, and even if the hints are (broadly speaking) perfect in the current version they may be pushed out of context by a new feature in the next version. If you’ve hinted your code you have to check every single hinted statement to make sure the hints still have the same effect after the upgrade.
This is why I produced the sound-bite (which Maria Colgan nicked): “if you can hint it, baseline it”. If you had generated a baseline (or outline) from a query with these hints in 10g, Oracle would have included the /*+ optimizer_features_enable('10.2.0.5') */ hint along with the functional hints, and the upgrade wouldn’t have produced a different plan.
Technically, of course, you could have remembered to add the hint to your production code – but in many cases Oracle introduces far more hints in an SQL Baseline than you might want to put into your code; and by using the SQL Baseline approach you’ve given yourself the option to get rid of the “hidden hinting” in a future version of Oracle by dropping the baseline rather than rewriting the code and (perhaps) recompiling the application.
Inevitably there are cases where setting optimizer_features_enable backwards doesn’t rescue you from a new plan. There are probably a few cases where the internal code forgets to check the value and bypass some subroutines; more significantly, there are cases where one version of Oracle will give you an efficient plan because of an optimizer bug, and setting the version backwards won’t re-introduce that bug.
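Checking every hinted statement after an upgrade can be partly mechanised. Here is a rough Python sketch, assuming you saved the dbms_xplan text output of each hinted statement before the upgrade and can capture it again afterwards; plan_shape() is a hypothetical helper that keeps only the operation and object name from each numbered row. The sample rows are copied from two of the plans in this post and serve only as stand-ins for a pre-upgrade plan (the one produced with the transformation blocked) and a post-upgrade plan (the VW_DAG_0 version).

```python
def plan_shape(xplan_text):
    """Return (operation, object name) for each numbered row of a dbms_xplan table."""
    shape = []
    for line in xplan_text.splitlines():
        cells = line.split("|")
        # Body rows start "|   0 |" or "|*  2 |"; dashed rules and the header are skipped
        if len(cells) < 4 or not cells[1].strip(" *").isdigit():
            continue
        shape.append((cells[2].rstrip(), cells[3].strip()))
    return shape

BEFORE = """
|   0 | SELECT STATEMENT    |      |     1 |    16 |    98   (4)| 00:00:01 |
|   1 |  SORT GROUP BY      |      |     1 |    16 |            |          |
|*  2 |   HASH JOIN         |      |    20 |   320 |    98   (4)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| T1   |   120 |   960 |    85   (3)| 00:00:01 |
|*  4 |    TABLE ACCESS FULL| T2   |    15 |   120 |    12   (0)| 00:00:01 |
"""

AFTER = """
|   0 | SELECT STATEMENT      |          |     1 |    13 |    98   (4)| 00:00:01 |
|   1 |  SORT AGGREGATE       |          |     1 |    13 |            |          |
|   2 |   VIEW                | VW_DAG_0 |    20 |   260 |    98   (4)| 00:00:01 |
|   3 |    HASH GROUP BY      |          |    20 |   320 |    98   (4)| 00:00:01 |
|*  4 |     HASH JOIN         |          |    20 |   320 |    97   (3)| 00:00:01 |
|*  5 |      TABLE ACCESS FULL| T2       |    15 |   120 |    12   (0)| 00:00:01 |
|*  6 |      TABLE ACCESS FULL| T1       |   120 |   960 |    85   (3)| 00:00:01 |
"""

if plan_shape(BEFORE) != plan_shape(AFTER):
    # Object names in plan order make the changed join order easy to spot
    print("plan changed:", [name for _, name in plan_shape(AFTER) if name])
# plan changed: ['VW_DAG_0', 'T2', 'T1']
```

Keeping the indentation of the Operation column means a change in nesting (such as the new VIEW and HASH GROUP BY levels) also counts as a change of shape.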
Footnote
I said I’d come back to the “unparsed” query that the optimizer generated from the original count(distinct) statement and the way it left the leading(t1) hint in place but lost the use_hash(t2) and no_swap_join_inputs(t2). I got curious about how Oracle would optimize that query if I supplied it from SQL*Plus – and this is the plan I got:
```sql
explain plan for
SELECT  /*+ LEADING (T1) */
        COUNT(VW_DAG_0.ITEM_1) "COUNT(DISTINCTT1.N2)"
FROM    (
        SELECT
                T1.N2 ITEM_1
        FROM
                TEST_USER.T2 T2,
                TEST_USER.T1 T1
        WHERE
                T2.N2=T1.N2
        AND     T1.N1=15
        AND     T2.N1=15
        GROUP BY
                T1.N2
        ) VW_DAG_0
;
```

```
-----------------------------------------------------------------------------------
| Id  | Operation             | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |           |     1 |    13 |    98   (4)| 00:00:01 |
|   1 |  SORT AGGREGATE       |           |     1 |    13 |            |          |
|   2 |   VIEW                | VM_NWVW_0 |    20 |   260 |    98   (4)| 00:00:01 |
|   3 |    HASH GROUP BY      |           |    20 |   320 |    98   (4)| 00:00:01 |
|*  4 |     HASH JOIN         |           |    20 |   320 |    97   (3)| 00:00:01 |
|*  5 |      TABLE ACCESS FULL| T1        |   120 |   960 |    85   (3)| 00:00:01 |
|*  6 |      TABLE ACCESS FULL| T2        |    15 |   120 |    12   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T2"."N2"="T1"."N2")
   5 - filter("T1"."N1"=15)
   6 - filter("T2"."N1"=15)
```
Oracle has managed to do a transformation to this statement that it didn’t do when it first generated the statement – too much recursion, perhaps – and that floating leading(t1) hint has been squeezed back into action by a view-merging step in the optimization that got the hint back into a query block that actually contained t1 and t2! At this point I feel like quoting cod-philosophy from the Dune trilogy: “Just when you think you understand …”
