CBO学习----03--选择率（Selectivity）

CBO学习---第2章--表扫描（Tablescans）

http://www.zzzyk.com/database/201307/229069.html

第3章单表选择率（Single Table Selectivity）

Selectivity是优化器估算Rows（Cards）的重要依据。

/**************************************************************************************************************************************/
3.1 Getting Started
select count(*)
from audience
where month_no = 12
;

从优化器的角度，分析1200名听众中，估算12月出生的人数

(1)user_tab_col_statistics.num_distinct=12

(2)user_tab_histograms指出low(1)，high(12)，均匀分布

(3)user_tab_col_statistics.density=1/12

(4)month_no=12，单列，均一，所以user_tab_col_statistics.density可用

(5)low(1)<=12<=high(12)

(6)user_tab_col_statistics.num_nulls=0，没有空值

(7)user_tables.num_rows=1200

(8)1200/12=100

本章代码附件中：

birth_month_01.sql

hack_stats.sql

birth_month_01.sql构建表，先进行两次查询系统表，后做count(*)查询两次。

两次之间可在其他session执行hack_stats.sql，修改表的统计信息，看哪些统计项能影响rows的计算

(1)将表名和列名填入hack_stats.sql，并修改表的行数numrows

[sql] 
define m_source_table='AUDIENCE'  
define m_source_column='month_no'  
--m_avgrlen := m_avgrlen + 25;  
m_numrows:=m_numrows+1200;

交叉执行后，rows从100变成200

测试后将numrows复原

(2)修改distcnt和density

打开Column statistics的注释

[sql] 
--m_distcnt:=m_distcnt+12;  
m_density:=m_density/2;

经多次测试，distcnt的修改不起作用，说明仅有density参与计算（Oracle版本10204）。

并非像书中所提的那样，可能9i和10gR1中，没有直方图时是用distcnt来计算rows，Oracle之后的版本又改进了

测试后将density复原

/**************************************************************************************************************************************/
[sql] 
begin  
    dbms_stats.gather_table_stats(  
        user,  
        'audience',  
        cascade => true,  
        estimate_percent => null,  
        method_opt => 'for all columns size 1'  
    );  
end;  
/  

其中method_opt => 'for all columns size 1'，指不收集直方图，8i和9i的默认值
method_opt =>'for all columns size auto'10g默认值，可以通过下面方法读取
[sql] 
SQL> select dbms_stats.get_param('METHOD_OPT') from dual;  
  
  
DBMS_STATS.GET_PARAM('METHOD_OPT')  
---------------------------------------------------------  
FOR ALL COLUMNS SIZE AUTO  
  
  
已选择 1 行。  

/**************************************************************************************************************************************/

3.2 Null Values

将null加入到rows的计算中

假定10%的人不记得自己的生日在几月

本章代码附件中：

birth_month_02.sql

该脚本中，将120行设置为空值，最终rows=90

说明优化器排除了null对估算rows的影响；density依然是1/12，并没有改变，说明density是减去空值数量后的结果。

/**************************************************************************************************************************************/
3.3 Using Lists
select count(*)
from audience
where month_no in (6,7,8)
;

开始研究month_no in (6,7,8)，这种条件时，rows的计算方式

本章代码附件中：

in_list.sql

在没有重复值时，rows计算无误

/**************************************************************************************************************************************/
8i中计算错误的原因：
select count(*) from audience where month_no in (6,7,8);
select /* +use_concat */count(*) from audience where month_no in (6,7,8);


/**************************************************************************************************************************************/

本章代码附件中：

in_list_02.sql

将月份从12扩大到1000，density为0.001

获得列表3-1，用来对比8i到10g的计算值，版本间影响不大

/**************************************************************************************************************************************/
本章代码附件中：
oddities.sql
在存有重复值、空值、过界值时，计算rows的状况，10204明显有所改善，考虑了最大最小值（书中原版本未考虑过界值的问题）


条件 8i 92/10102 10204
month_no = 25  100 100 1
month_no in (4, 4) 100 100 100
month_no in (3, 25)  192 200 100
month_no in (3, 25, 26)  276 300 101
month_no in (3, 25, 25, 26)  276 300 101
month_no in (3, 25, null)  276 300 200
month_no in (:b1, :b2, :b3)  276 300 300
仅仅在有空值时，不能正确计算rows，但也比之前版本误差小。


/**************************************************************************************************************************************/

10104的线性衰变

oracle在10.2之前采用过一个更为复杂的线性衰变算法，即超过最大值或低于最小值时，单值rows的斜率为 "1/最大值-最小值"。

通过上面10204的测试，这个算法在新版本中被取代。

本章代码附件中：

in_list_10g.sql

在10104中的结果：

条件 10102 10104

month_no = 13 100 91

month_no = 15 100 73

month_no in (13,15) 200 164

month_no in (16,18) 200 109

10104比10102略有改进，10204比10104改进就更加明显了，虽然还有缺陷（null上），但说明oracle的opt团队在不断的更新代码。

/**************************************************************************************************************************************/
3.4 Range
本章代码附件中：
ranges.sql
总结各个版本，统计如下：
号 谓词 8i 92/10102 10204 数学表达式 真实值
1 month_no > 8  437 436 436 (8, 400
2 month_no >= 8  537 536 536 [8, 500
3 month_no < 8  764 764 764  ,8) 700
4 month_no <= 8  864 864 864  ,8] 800
5 month_no between 6 and 9  528 527 527 [6,9] 400
6 month_no >= 6 and month_no <= 9  528 527 527 [6,9] 400
7 month_no >= 6 and month_no < 9  428 427 427 [6,9) 300
8 month_no > 6 and month_no <= 9  428 427 427 (6,9] 300
9 month_no > 6 and month_no < 9  328 327 327 (6,9) 200
10 month_no > :b1  60 60 101 (:b1,            
11 month_no >= :b1  60 60 101 [:b1,            
12 month_no < :b1  60 60 101  ,:b1)           
13 month_no <= :b1  60 60 101  ,:b1]           
14 month_no between :b1 and :b2  4 3 9 [:b1,:b2]        
15 month_no >= :b1 and month_no <= :b2  4 3 9 [:b1,:b2]        
16 month_no >= :b1 and month_no < :b2  4 3 9 [:b1,:b2)        
17 month_no > :b1 and month_no < :b2  4 3 9 (:b1,:b2)        
18 month_no > :b1 and month_no <= :b2  4 3 9 (:b1,:b2]        
19 month_no > 12  100 100 100 (12, 0
20 month_no between 25 and 30  100 100 100 [25,30] 0

其中8i的4和9i的3，是没有区别的（10进制和二进制转换时有个小数），只是8i在计算式使用上浮，9i用的舍入

从上表发现的几条规律：

(1)"(" 和 "]"的差距为100

(2)超过边界值为100

(3)有绑定变量的rows，开闭区间并无影响

(4)10-13行，单个绑定变量的range，前期版本定为5%（1200*5%=60），10204之后较为复杂（有待研究）

(5)14-18行，5%*5%=2.5/1000，1200*5%*5%=3行

(6)19-20行，超出最

上一个：oracle AWR深入研究分析，如何使用
下一个：SQL_TRACE/10046事件如何使用，详细分析

更多图片编程知识:

更多Unix/Linux疑问解答：: 路由原理介绍; 子网掩码快速算法; 改变网络接口的速度和协商方式的工具miitool和ethtool; Loopback口的作用汇总; OSPF的童话; 增强的ACL修改功能; 三层交换机和路由器的比较; 用三层交换机组建校园网; 4到7层交换识别内容; SPARC中如何安装Linux系统(2); SPARC中如何安装Linux系统(1); 用Swatch做Linux日志分析; 实战多种Linux操作系统共存; 浅析Linux系统帐户的管理和审计; Linux2.6对新型CPU的支持(2)