oracle数据库hanganalyze


		oracle数据库hanganalyze
 
Oracle 数据库“真的”hang住了，可以理解为数据库内部发生死锁。因为普通的DML死锁，oracle服务器会自动监测他们的依赖关系，并回滚其中一个操作， 终止这种相互等待的局面。而当这种死锁发生在争夺内核级别的资源（比如说是pins或latches）时，Oracle并不能自动的监测并处理这种死锁。
其实很多时候数据库并没有hang住，而只是由于数据库的性能问题，处理的时间比较长而已。
Hanganalyze工具使用内核调用检测会话在等待什么资源，报告出占有者和等待者的相互关系。另外，它还会将一些比较”interesting”的进程状态dump出来，这个取决于我们使用hanganalyze的分析级别。
hanganalyze工具从oracle8i第二版开始提供，到9i增强了诊断RAC环境下的“集群范围”的信息，这意味着它将会报告出整个集群下的所有会话的信息。
目前有三种使用hanganalyze的方法:
一种是会话级别的：
SQL>ALTER SESSION SET EVENTS 'immediate trace name HANGANALYZE level <level>';
一种是实例级别:
SQL>ORADEBUG hanganalyze <level>
一种是集群范围的：
SQL>ORADEBUG setmypid
SQL>ORADEBUG setinst all
SQL>ORADEBUG -g def hanganalyze <level>
各个level的含义如下：
1-2:只有hanganalyze输出，不dump任何进程
3：Level2+Dump出在IN_HANG状态的进程
4：Level3+Dump出在等待链里面的blockers(状态为LEAF/LEAF_NW/IGN_DMP)
5：Level4+Dump出所有在等待链中的进程（状态为NLEAF）
Oracle官方建议不要超过level 3，一般level 3也能够解决问题，超过level 3会给系统带来额外负担。
 
hanganalyze实验
 
1.session1更新行数据
SQL> connect scott/scott
Connected.
SQL> create table tb_hang(id number,remark varchar2(20));
Table created.
SQL> insert into tb_hang values(1,'test');
1 row created.
SQL> commit;
Commit complete.
SQL> select USERENV('sid') from dual;
USERENV('SID')
--------------
           146
SQL> update tb_hang set remark='hang' where id=1;
1 row updated.
这个时候不提交
2.session2同样更新session1更新的行
SQL> select USERENV('sid') from dual;            
USERENV('SID')
--------------
           154
SQL>  update tb_hang set remark='hang' where id=1;
这个时候已经hang住了
3.session3使用hangalyze生成trace文件
SQL> connect / as sysdba
Connected.
SQL> oradebug hanganalyze 3;
Hang Analysis in /u01/app/oracle/admin/oracl/udump/oracl_ora_3941.trc
4.查看trace文件的内容
$ more /u01/app/oracle/admin/oracl/udump/oracl_ora_3941.trc
/u01/app/oracle/admin/oracl/udump/oracl_ora_3941.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /u01/app/oracle/product/10.2.0/db_1
System name:    Linux
Node name:      hxl
Release:        2.6.18-8.el5xen
Version:        #1 SMP Fri Jan 26 14:42:21 EST 2007
Machine:        i686
Instance name: oracl
Redo thread mounted by this instance: 1
Oracle process number: 21
Unix process pid: 3941, image: oracle@hxl (TNS V1-V3)
*** SERVICE NAME:(SYS$USERS) 2012-06-16 01:13:29.241
*** SESSION ID:(144.14) 2012-06-16 01:13:29.241
*** 2012-06-16 01:13:29.241
==============
HANG ANALYSIS:
==============
Open chains found:
Chain 1 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
    <0/146/5/0x7861b254/3858/SQL*Net message from client>
 -- <0/154/5/0x7861c370/3903/enq: TX - row lock contention>
Other chains found:
Chain 2 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
    <0/144/14/0x7861d48c/3941/No Wait>
Chain 3 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
    <0/149/1/0x7861ced8/3806/Streams AQ: waiting for time man>
Chain 4 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
    <0/151/1/0x7861c924/3804/Streams AQ: qmn coordinator idle>
Chain 5 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
    <0/158/5/0x7861da40/3810/Streams AQ: qmn slave idle wait>
Extra information that will be dumped at higher levels:
[level  4] :   1 node dumps -- [REMOTE_WT] [LEAF] [LEAF_NW]
[level  5] :   4 node dumps -- [SINGLE_NODE] [SINGLE_NODE_NW] [IGN_DMP]
[level  6] :   1 node dumps -- [NLEAF]
[level 10] :  13 node dumps -- [IGN]
 
State of nodes
([nodenum]/cnode/sid/sess_srno/session/ospid/state/start/finish/[adjlist]/predec
essor):
[143]/0/144/14/0x786fa3dc/3941/SINGLE_NODE_NW/1/2//none
[145]/0/146/5/0x786fc944/3858/LEAF/3/4//153
[148]/0/149/1/0x78700160/3806/SINGLE_NODE/5/6//none
[150]/0/151/1/0x787026c8/3804/SINGLE_NODE/7/8//none
[153]/0/154/5/0x78705ee4/3903/NLEAF/9/10/[145]/none
[154]/0/155/1/0x78707198/3797/IGN/11/12//none
[155]/0/156/1/0x7870844c/3799/IGN/13/14//none
[157]/0/158/5/0x7870a9b4/3810/SINGLE_NODE/15/16//none
[159]/0/160/1/0x7870cf1c/3782/IGN/17/18//none
[160]/0/161/1/0x7870e1d0/3784/IGN/19/20//none
[161]/0/162/1/0x7870f484/3788/IGN/21/22//none
[162]/0/163/1/0x78710738/3786/IGN/23/24//none
[163]/0/164/1/0x787119ec/3774/IGN/25/26//none
[164]/0/165/1/0x78712ca0/3780/IGN/27/28//none
[165]/0/166/1/0x78713f54/3778/IGN/29/30//none
[166]/0/167/1/0x78715208/3776/IGN/31/32//none
[167]/0/168/1/0x787164bc/3770/IGN/33/34//none
[168]/0/169/1/0x78717770/3772/IGN/35/36//none
[169]/0/170/1/0x78718a24/3768/IGN/37/38//none
====================
END OF HANG ANALYSIS
====================
 
Trace文件内容的解释如下：
 
CYCLES: This section reports the process dependencies between sessions that are in a deadlock condition. Cycles are considered   “true” hangs.
 
Cycle 1 : <sid/sess_srno/proc_ptr/ospid/wait_event> :
 
    <980/3887/0xe4214964/24065/latch free>
 
 -- <2518/352/0xe4216560/24574/latch free>
 
 -- <55/10/0xe41236a8/13751/latch free>
 
BLOCKER OF MANY SESSIONS: This section is found when a process is blocking a lot of other sessions. Usually when a process is blocking more that 10 sessions this section will appear in the trace file.
 
Found 21 objects waiting for <sid/sess_srno/proc_ptr/ospid/wait_event>
 
    <55/10/0xe41236a8/13751/latch free>
 
Found 12 objects waiting for <sid/sess_srno/proc_ptr/ospid/wait_event>
 
    <2098/2280/0xe42870d0/3022/db file scattered read>
 
Found 12 objects waiting for <sid/sess_srno/proc_ptr/ospid/wait_event>