Following on from the short note I published about the first_rows optimizer mode yesterday here’s a note that I wrote on the topic more than 2 years ago but somehow forgot to publish.
I can get quite gloomy when I read some of the material that gets published about Oracle; not so much because it’s misleading or wrong, but because it’s clearly been written without any real effort being made to check whether it’s true. For example, a couple of days ago [ed: actually some time around May 2012] I came across an article about optimisation in 11g that seemed to be claiming that first_rows optimisation somehow “defaulted” to first_rows(1) , or first_rows_1, optimisation if you didn’t supply a final integer value.
For at least 10 years the manuals have described first_rows (whether as a hint or as a parameter value) as being available for backwards compatibility; so if it’s really just a synonym for first_rows_1 (or first_rows(1)) you might think that the manuals would actually mention this. Even if the manuals didn’t mention it you might just consider a very simple little test before making such a contrary claim, and if you did make such a test and found that your claim was correct you might actually demonstrate (or describe) the test so that other people could check your results.
It’s rather important, of course, that people realise (should it ever happen) that first_rows has silently changed into first_rows_1 because any code that’s using it for backwards compatibility might suddenly change execution path when you did the critical upgrade where the optimizer changed from “backwards compatibility” mode to “completely different optimisation strategy” mode. So here’s a simple check (run from 11.2.0.4 – to make sure I haven’t missed the switch):
begin dbms_stats.set_system_stats('MBRC',16); dbms_stats.set_system_stats('MREADTIM',10); dbms_stats.set_system_stats('SREADTIM',5); dbms_stats.set_system_stats('CPUSPEED',1000); end; / create table t2 as select mod(rownum,200) n1, mod(rownum,200) n2, rpad(rownum,180) v1 from all_objects where rownum <= 3000 ; begin dbms_stats.gather_table_stats( user, 't2', method_opt => 'for all columns size 1' ); end; / create index t2_i1 on t2(n1); SQL> select /*+ all_rows */ n2 from t2 where n1 = 15; -------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 15 | 120 | 12 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| T2 | 15 | 120 | 12 (0)| 00:00:01 | -------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("N1"=15)
You’ll notice that I’ve created my data in a way that means I’ll have 15 rows with the value 15, scattered evenly through the table. As a result of the scattering the clustering_factor on my index is going to be similar to the number of rows in the table, and the cost of fetching all the rows by index is going to be relatively high. Using all_rows optimization Oracle has chosen a tablescan.
So what happens if I use the first_rows(1) hint, and how does this compare with using the first_rows hint ?
SQL> select /*+ first_rows(1) */ n2 from t2 where n1 = 15; ------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 2 | 16 | 3 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| T2 | 2 | 16 | 3 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | T2_I1 | | | 1 (0)| 00:00:01 | ------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("N1"=15) SQL> select /*+ first_rows */ n2 from t2 where n1 = 15; ------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 15 | 120 | 16 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| T2 | 15 | 120 | 16 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | T2_I1 | 15 | | 1 (0)| 00:00:01 | ------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("N1"=15)
You might not find it too surprising to see that Oracle used the indexed access path in both cases. Does this mean that first_rows really means (or defaults to) first_rows(1) ?
Of course it doesn’t - you need only look at the estimated cost and cardinality to see this. The two mechanisms are clearly implemented through difference code paths. The first_rows method uses some heuristics to restrict the options it examines, but still gives us the estimated cost and cardinality of fetching ALL the rows using the path it has chosen. The first_rows(1) method uses arithmetic to decide on the best path for getting the first row, and adjusts the cost accordingly to show how much work it thinks it will have to do to fetch just that one row.
Of course, no matter how inane a comment may seem to be, there’s always a chance that it might be based on some (unstated) insight. Is there any way in which first_rows(n) and first_rows are related ? If so could you possibly manage to claim that this establishes a “default value” link?
Funnily enough there is a special case: if you try hinting with first_rows(0) – that’s the number zero – Oracle will use the old first_rows optimisation method – you can infer this from the cost and cardinality figures, or you can check the 10053 trace file, or use a call to dbms_xplan() to report the outline. It’s an interesting exercise (left to the reader) to decide whether this is the lexical analyzer deciding to treat the “(0)” as a new – and meaningless – token following the token “first_rows”, or whether it is the optimizer recognising the lexical analyzer allowing “first_rows(0)” as a token which the optimizer is then required to treat as first_rows.
Mind you, if you only want the first zero rows of the result set there’s a much better trick you can use to optimise the query – don’t run the query.
