I received an email a couple of days ago that was a little different from usual – although the obvious answer was “it’s the data”. A connect by query with any one of several hundred input values ran in just a few seconds, but with one specific input it was still running 4,000 seconds later using the same execution plan – was this a bug?
There’s nothing to suggest that it should be, with skewed data anything can happen: even a single table access by exact index could take 1/100th of a second to return a result if there was only one row matching the requirement and 1,000 seconds if there were 100,000 rows in 100,000 different table blocks (and the table was VERY big). The same scaling problem could be true of any type of query – and “connect by” queries can expose you to a massive impact because their run time can increase geometrically as the recursion takes place.
So it was easy to answer the question – no it’s (probably) not a bug, check the data for that one value.
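Checking the data is straightforward in principle. As a sketch (using the t1 table from my model below – in the real system you’d substitute the relevant table and the column used as the “prior” connector), a simple aggregate shows whether one value appears far more often than the rest:

```sql
select
        id_p, count(*) ct
from
        t1
group by
        id_p
order by
        ct desc
;
```

If the problem value comes back with thousands of rows while every other value has one or two, the variable run time is explained by the data, not by a bug.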
Then I decided to build a simple model. The original email had a four table join, but I just created a single table, and used a “no filtering” connect by which I had to hint. Here’s some code I ran on 11.2.0.4:
rem
rem     script:         connect_by_skew.sql
rem     dated:          Feb 2016
rem     Last tested:
rem             12.1.0.2
rem

create table t1 nologging
as
select
        rownum          id_p,
        10 * rownum     id
from
        all_objects
where
        rownum <= 50000
;

execute dbms_stats.gather_table_stats(user,'t1', method_opt=>'for all columns size 1')

alter system flush shared_pool;
set serveroutput off
alter session set statistics_level = all;

select sum(ct)
from    (
        select
                /*+ no_connect_by_filtering */
                count(id) ct
        from
                t1
        connect by
                id = 20 * prior id_p
        start with
                id_p = 1
        group by
                id
        )
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last cost'));

update t1 set id_p = 0 where id_p = 1;
update t1 set id_p = 1 where id_p > 45000;

select sum(ct)
from    (
        select
                /*+ no_connect_by_filtering */
                count(id) ct
        from
                t1
        connect by
                id = 20 * prior id_p
        start with
                id_p = 1
        group by
                id
        )
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last cost'));
The sum() of the inline aggregate view emulates the original code – I don’t know what it was for; possibly it was a way of demonstrating the problem without producing a large output. I just copied it.
As you can see in my script every parent id (id_p) starts out unique, and if I look at the pattern of the raw data identified by the recursion from id_p = 1 (rather than looking at the result of the actual query) this is what I’d get:
      ID_P         ID
---------- ----------
         1         10
         2         20
         4         40
         8         80
        16        160
        32        320
        64        640
       128       1280
       256       2560
       512       5120
      1024      10240
      2048      20480
      4096      40960
      8192      81920
     16384     163840
     32768     327680
When I modify the data so that I have exactly 5,000 rows with id_p = 1, the initial data generation will be 80,000 rows of data. If you want to try setting id_p = 1 for more rows, make sure you do it to rows where id_p is already greater than 32768 or you’ll run into Oracle error ORA-01436: CONNECT BY loop in user data.
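A quick sanity check before running the updates (my addition, not part of the original script) confirms that the rows about to be redirected all sit above the 32768 boundary reached by the recursion, so no loop can appear:

```sql
select
        min(id_p), max(id_p), count(*)
from
        t1
where
        id_p > 45000
;
```

Since every id_p in that range is greater than 32768, pointing those rows back at id_p = 1 simply adds 5,000 extra starting points for the same 16-level descent.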
Here’s the execution plan, with rowsource execution stats I got for the first query (running 11.2.0.4):
-----------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |      |      1 |        |    32 (100)|      1 |00:00:00.44 |     103 |       |       |          |
|   1 |  SORT AGGREGATE                            |      |      1 |      1 |            |      1 |00:00:00.44 |     103 |       |       |          |
|   2 |   VIEW                                     |      |      1 |      2 |    32   (7)|     16 |00:00:00.44 |     103 |       |       |          |
|   3 |    HASH GROUP BY                           |      |      1 |      2 |    32   (7)|     16 |00:00:00.44 |     103 |  1519K|  1519K| 1222K (0)|
|*  4 |     CONNECT BY NO FILTERING WITH START-WITH|      |      1 |        |            |     16 |00:00:00.44 |     103 |       |       |          |
|   5 |      TABLE ACCESS FULL                     | T1   |      1 |  50000 |    31   (4)|  50000 |00:00:00.10 |     103 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------------
As you can see, this took 0.44 seconds, generated the expected 16 rows (still visible up to operation 2) which it then counted. Oracle followed the same execution plan when I set 5,000 rows to the critical value – here’s the new run-time plan:
-----------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |      |      1 |        |    32 (100)|      1 |00:05:39.25 |     103 |       |       |          |
|   1 |  SORT AGGREGATE                            |      |      1 |      1 |            |      1 |00:05:39.25 |     103 |       |       |          |
|   2 |   VIEW                                     |      |      1 |      2 |    32   (7)|   5015 |00:05:39.24 |     103 |       |       |          |
|   3 |    HASH GROUP BY                           |      |      1 |      2 |    32   (7)|   5015 |00:05:39.22 |     103 |  5312K|  2025K| 1347K (0)|
|*  4 |     CONNECT BY NO FILTERING WITH START-WITH|      |      1 |        |            |  80000 |00:05:38.56 |     103 |       |       |          |
|   5 |      TABLE ACCESS FULL                     | T1   |      1 |  50000 |    31   (4)|  50000 |00:00:00.09 |     103 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------------
As expected, 80,000 rows generated (5,000 * 16), aggregated down to 5,015, then aggregated again to the one row result. Time to complete: 5 minutes 39 seconds – and it was all CPU time. It’s not entirely surprising – a single recursive descent (with startup overheads) took 0.44 seconds – presumably a fairly large fraction of that was startup, but even 0.1 seconds adds up if you do it 5,000 times.
Everybody knows that skewed data can produce extremely variable response times. With a deeper tree and more rows with the special value it wouldn’t be hard for the total run time of this query to get to the 4,000 seconds reported in the original email. (I also tried running with 10,000 rows set to 1 and the run time went up to 18 minutes – of which a large fraction was reading from the TEMPORARY tablespace because something had overflowed to disc.)
Was there a solution?
I don’t know – but I did suggest two options:
a) create a histogram on the data to show that there was one particular special value; since the code seemed to include literals perhaps the optimizer would notice the special case and choose a different plan.
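As a sketch of option (a) against my model table (the column name and bucket count are my choices for the demonstration, not something taken from the original system), a frequency histogram on the connecting column would let the optimizer see the one popular value:

```sql
begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          => 't1',
                method_opt       => 'for columns id_p size 254'
        );
end;
/
```

With literals in the SQL and a histogram in place the optimizer would at least have the information it needs to cost the special value differently – whether it would actually switch plans is something you’d have to test.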
b) hint the code to use a different strategy – the hint would be /*+ connect_by_filtering */. Here’s the resulting execution plan:
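Applied to the test query above, the change is just the hint (everything else is unchanged from the model script):

```sql
select sum(ct)
from    (
        select
                /*+ connect_by_filtering */
                count(id) ct
        from
                t1
        connect by
                id = 20 * prior id_p
        start with
                id_p = 1
        group by
                id
        )
;
```
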
---------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |      1 |        |    95 (100)|      1 |00:00:06.50 |    1751 |       |       |          |
|   1 |  SORT AGGREGATE              |      |      1 |      1 |            |      1 |00:00:06.50 |    1751 |       |       |          |
|   2 |   VIEW                       |      |      1 |      2 |    95   (6)|   5015 |00:00:06.49 |    1751 |       |       |          |
|   3 |    HASH GROUP BY             |      |      1 |      2 |    95   (6)|   5015 |00:00:06.47 |    1751 |  5312K|  2025K| 1346K (0)|
|   4 |     CONNECT BY WITH FILTERING|      |      1 |        |            |  80000 |00:00:06.30 |    1751 |   337K|   337K|  299K (0)|
|*  5 |      TABLE ACCESS FULL       | T1   |      1 |      1 |    31   (4)|   5000 |00:00:00.01 |     103 |       |       |          |
|*  6 |      HASH JOIN               |      |     16 |      1 |    63   (5)|     15 |00:00:05.98 |    1648 |  1969K|  1969K|  741K (0)|
|   7 |       CONNECT BY PUMP        |      |     16 |        |            |     16 |00:00:00.01 |       0 |       |       |          |
|   8 |       TABLE ACCESS FULL      | T1   |     16 |  50000 |    31   (4)|   800K|00:00:01.49 |    1648 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------
We get the result in 6.5 seconds! [UPDATE: but there’s a nice explanation for that – most of the time comes from the work done gathering rowsource execution statistics; with statistics_level set back to typical the run time dropped to 0.19 seconds.]
