当前位置:操作系统 > Unix/Linux >>

hash join、nested loop,sort merge join

hash join、nested loop,sort merge join
 
在oracle 执行计划中存在三种表的连接方式,hash join、nested loop,sort merge join
1、hash join
hash join
A join in which the database uses the smaller of two tables or data sources to build a hash table in memory. The database scans the larger table, probing the hash table for the addresses of the matching rows in the smaller table.
也就是说hash join是 在两个表中连接的时候存在的,散列连接是CBO 做大数据集连接时的常用方式,可以把一个小表或是数据源整合到内存中建立一个hash table,然后数据库开始扫描这个大表,使用内存中的hash table的地址来匹配外表中的数据行。
一个小的表开始joined一个大表的时候,oracle的hash join与nested loop join相比显的非常快。oracle对hash join占用的内存是有限制的,这个值是5%pga_aggregate_target。
The Oracle DBA controls the optimizers' propensity to invoke hash joins because the DBA must allocate the RAM resources to Oracle (using the hash_area_size and pga_aggregate_target parameters) for the optimizer to choose a hash join. The CBO will only choose a hash join if you have allocated Oracle enough RAM area in which to perform the hash join.
我们可以使用use_hash强制使用hash join
另外,oracle 的hash join需要很多的内存来建立hash表,为了提高效率,我们必须设置hash_area_size足够大,如果hash表占用的内存超出了hash_area_size的大小,那么就会分页到临时表空间,这会带来一定的消耗影响性能。
eg:
SQL> select count(*) from amy_emp;
COUNT(*)
----------
7340032
SQL> select count(*) from amy_dept;
COUNT(*)
----------
4
SQL> set autotrace on explain
SQL> select /*+use_hash(amy_emp,amy_dept*/ count(*) from amy_emp,amy_dept where amy_emp.deptno=amy_dept.deptno;
COUNT(*)
----------
7340032
Elapsed: 00:00:03.35
Execution Plan
----------------------------------------------------------
Plan hash value: 653649851
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 4 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 6 | | |
|* 2 | HASH JOIN | | 14 | 84 | 4 (0)| 00:00:01 |
| 3 | INDEX FULL SCAN | REVERSE_INDEX | 4 | 12 | 1 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| AMY_EMP | 14 | 42 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("AMY_EMP"."DEPTNO"="AMY_DEPT"."DEPTNO")
SQL> select count(*) from amy_emp,amy_dept where amy_emp.deptno=amy_dept.deptno;
COUNT(*)
----------
7340032
Elapsed: 00:00:25.10
Execution Plan
----------------------------------------------------------
Plan hash value: 1472773524
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 3 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 6 | | |
| 2 | NESTED LOOPS | | 14 | 84 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| AMY_EMP | 14 | 42 | 3 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | REVERSE_INDEX | 1 | 3 | 0 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("AMY_EMP"."DEPTNO"="AMY_DEPT"."DEPTNO")
SQL>
从中,可以看出在这种情况下hash join运行的速度远远大于nested loops;
什么时间optimizer 使用 hash joins呢?
The optimizer uses a hash join to join two tables if they are joined using an equijoin and if either of the following conditions are true:
1)A large amount of data must be joined.
2)A large fraction of a small table must be joined.
2、nested loop joins
这个东西叫嵌套循环链接,使用它的条件是
1)数据库获取的数据的集合不能太大
2)能够有效的访问内部表,也就是说内部表最好有索引。
看一下这句话:
It is important to ensure that the inner table is driven from (dependent on) the outer table. If the inner table's access path is independent of the outer table, then the same rows are retrieved for every iteration of the outer loop, degrading performance considerably. In such cases, hash joins joining the two independent row sources perform better.
它包含的步骤如下:
1)optimizer 决定驱动表或是外部表
2)optimizer决定外部表
3)内表被外表驱动,外表返回的每一行都要在内表中检索找到与它匹配的行
如果我们想强制使用nested loop joins可以使用use_nl(table1,table2)
eg:
SQL> set autotrace trace; 
SQL> r
1* select empno,ename,job,mgr from amy_emp,amy_dept where amy_emp.deptno=amy_dept.deptno and amy_emp.ename in('KING','TURNER')
Elapsed: 00:00:00.05
Execution Plan
----------------------------------------------------------
Plan hash value: 2114037428
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 56 | 3 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 2 | 56 | 3 (0)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| AMY_EMP | 2 | 50 | 3 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | REVERSE_INDEX | 1 | 3 | 0 (0)| 00:00:01 |
-------------------------------------------
驱动表为amy_emp,使用use_nl转换驱动表;
注意:
use_nl(table1,table2)并不能让optimizer选择这两个表那个做为驱动表,如果想明确指定需要加ordered参数,对于ordered的参数指明from后边的表顺序为从左到右,也就是左边为驱动表,右边为被驱动表,另外如果use_nl参数仅仅指定一个表,那么这个表就为被驱动表,但指定的这个表为外部表那么则会忽略这个hint。
eg:
QL> select /*+use_nl(amy_emp,amy_dept)*/ empno,ename,job,mgr from amy_dept,amy_emp where amy_emp.deptno=amy_dept.deptno and amy_emp.ename in('KING','TURNER');
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
Plan hash value: 2114037428
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 56 | 3 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 2 | 56 | 3 (0)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| AMY_EMP | 2 | 50 | 3 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | REVERSE_INDEX | 1 | 3 | 0 (0)| 00:00:01 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------
CopyRight © 2022 站长资源库 编程知识问答 zzzyk.com All Rights Reserved
部分文章来自网络,