Big Table and Small Table Join strategy in Oracle
Big Table and Small Table Join strategy in Oracle
The optimizer usesnested loop joins when joining small number of rows, with a good drivingcondition between the two tables. You drive from the outer loop to the innerloop, so the order of tables in the execution plan is important.
The outer loop isthe driving row source. It produces a set of rows for driving the joincondition. The row source can be a table accessed using an index scan or a fulltable scan. Also, the rows can be produced from any other operation. Forexample, the output from a nested loop join can be used as a row source for anothernested loop join.
The inner loop isiterated for every row returned from the outer loop, ideally by an index scan.If the access path for the inner loop is not dependent on the outer loop, thenyou can end up with a Cartesian product; for every iteration of the outer loop,the inner loop produces the same set of rows. Therefore, you should use other joinmethods when two independent row sources are joined together.
The following noworkload system statistics will be used:
SELECT
PNAME,
PVAL1
FROM
SYS.AUX_STATS$
ORDER BY
PNAME;
PNAME PVAL1
--------------- ----------
CPUSPEED
CPUSPEEDNW 2116.57559
DSTART
DSTOP
FLAGS 1
IOSEEKTIM 10
IOTFRSPEED 4096
MAXTHR
MBRC
MREADTIM
SLAVETHR
SREADTIM
STATUS
First the tables. We will start 易做图, with one table(T3) having 100 rows and another table (T4) having 10 rows:
CREATE TABLE T3 (
C1 NUMBER,
C2 NUMBER,
C3 NUMBER,
C4 VARCHAR2(20),
PADDING VARCHAR2(200));
CREATE TABLE T4 (
C1 NUMBER,
C2 NUMBER,
C3 NUMBER,
C4 VARCHAR2(20),
PADDING VARCHAR2(200));
INSERT INTO
T3
SELECT
ROWNUM C1,
1000000-ROWNUM C2,
MOD(ROWNUM-1,1000) C3,
TO_CHAR(SYSDATE+MOD(ROWNUM-1,10000),'DAY') C4,
LPAD(' ',200,'A') PADDING
FROM
DUAL
CONNECT BY
LEVEL<=100;
INSERT INTO
T4
SELECT
ROWNUM C1,
1000000-ROWNUM C2,
MOD(ROWNUM-1,1000) C3,
TO_CHAR(SYSDATE+MOD(ROWNUM-1,10000),'DAY') C4,
LPAD(' ',200,'A') PADDING
FROM
DUAL
CONNECT BY
LEVEL<=10;
COMMIT;
CREATE INDEX IND_T3_C1 ON T3(C1);
CREATE INDEX IND_T3_C2 ON T3(C2);
CREATE INDEX IND_T3_C3 ON T3(C3);
CREATE INDEX IND_T4_C1 ON T4(C1);
CREATE INDEX IND_T4_C2 ON T4(C2);
CREATE INDEX IND_T4_C3 ON T4(C3);
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T3',CASCADE=>TRUE,ESTIMATE_PERCENT=>NULL)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T4',CASCADE=>TRUE,ESTIMATE_PERCENT=>NULL)
We are able to easily produce an example where the “smallest”table is selected as the driving table (note that I had to add a hint tospecify a nested loop join in several of these examples):
SET AUTOTRACE TRACEONLY EXPLAIN
SELECT /*+ USE_NL(T3 T4) */
T3.C1,
T3.C2,
T3.C3,
T3.C4,
T4.C1,
T4.C2,
T4.C3,
T4.C4
FROM
T3,
T4
WHERE
T3.C1=T4.C1;
Execution Plan
----------------------------------------------------------
Plan hash value: 567778651
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 420 | 13 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 10 | 420 | 13 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | T4 | 10 | 210 | 3 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | IND_T3_C1 | 1 | | 0 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| T3 | 1 | 21 | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("T3"."C1"="T4"."C1")
If we stop at that point, we could declare quite simply that theoptimizer selects the smaller table as the driving table. But wait aminute, take a look at this example where the optimizer selected the largesttable as the driving table:
SELECT /*+ USE_NL(T3 T4) */
T3.C1,
T3.C2,
T3.C3,
T3.C4,
T4.C1,
T4.C2,
T4.C3,
T4.C4
FROM
T3,
T4
WHERE
T3.C1=T4.C1
AND T3.C2=1;
Execution Plan
----------------------------------------------------------
Plan hash value: 4214127300
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 |