GoldenGate Data Pump fails to start and its report shows WARNING OGG-01223 Cannot find executable file './server': cause and fix
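Today I ran into an odd problem. A colleague from the NaFu (纳服) team called again to say that our data pump process was not running, so data was not being synchronized to the target side.
It was odd because, in the past, the reason a data pump process failed to start was always simple: it reported WARNING OGG-01223 TCP/IP error 146 (Connection refused).
That error means either that the network is unreachable or that the manager process on the target side is not running.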
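Both possibilities can be narrowed down by testing whether the manager port on the target accepts a TCP connection. The following is only a minimal sketch: the function name `port_open` and the host name in the example are invented for illustration, and it assumes `bash` (for its `/dev/tcp` redirection) and the coreutils `timeout` command are available.

```shell
# Hypothetical helper (not part of GoldenGate): succeed if a TCP connection
# to host:port can be opened within 3 seconds.
# Assumes bash with /dev/tcp support and coreutils `timeout`.
port_open() {
    # $1 = host, $2 = port; inside `bash -c` they arrive as $0 and $1
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example (target-host is a placeholder; 7809 is the manager port seen in this post):
#   port_open target-host 7809 && echo "manager port reachable"
```

A refused or timed-out connection makes the function return a non-zero status, which matches the two causes listed above but does not tell them apart.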
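After checking together with the NaFu colleague, we found that the target-side process was running normally and that the network was fine. I did not quite believe it, so I logged on to the target machine myself and found it was exactly as he had said.
When I tried restarting the local data pump process, its lag only kept growing, and both the process report and the ggserr log kept printing this error:
WARNING OGG-01223 Cannot find executable file './server'.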
GGSCI (bjyschxzg1) 4> view report PZJ_NF1
2013-06-19 00:01:30 WARNING OGG-01223 Cannot find executable file './server'.
2013-06-19 00:01:30 WARNING OGG-01223 Cannot find executable file './server'.
2013-06-19 00:01:30 WARNING OGG-01223 Cannot find executable file './server'.
2013-06-19 00:01:30 WARNING OGG-01223 Cannot find executable file './server'.
2013-06-19 00:01:30 WARNING OGG-01223 Cannot find executable file './server'.
2013-06-19 00:01:30 WARNING OGG-01223 Cannot find executable file './server'.
bjyschxzg1:/home/oracle/ggs$tail -f ggserr.log
2013-06-19 00:15:47 WARNING OGG-01223 Oracle GoldenGate Capture for Oracle, pzj_nf6.prm: Cannot find executable file './server'.
2013-06-19 00:15:47 WARNING OGG-01223 Oracle GoldenGate Capture for Oracle, pzj_nf6.prm: Cannot find executable file './server'.
2013-06-19 00:15:47 WARNING OGG-01223 Oracle GoldenGate Capture for Oracle, pzj_nf6.prm: Cannot find executable file './server'.
2013-06-19 00:15:47 WARNING OGG-01223 Oracle GoldenGate Capture for Oracle, pzj_nf6.prm: Cannot find executable file './server'.
2013-06-19 00:15:47 WARNING OGG-01223 Oracle GoldenGate Capture for Oracle, pzj_nf6.prm: Cannot find executable file './server'.
2013-06-19 00:15:48 WARNING OGG-01223 Oracle GoldenGate Capture for Oracle, pzj_nf6.prm: Cannot find executable file './server'.
2013-06-19 00:15:48 WARNING OGG-01223 Oracle GoldenGate Capture for Oracle, pzj_nf6.prm: Cannot find executable file './server'.
2013-06-19 00:15:48 WARNING OGG-01223 Oracle GoldenGate Capture for Oracle, pzj_nf6.prm: Cannot find executable file './server'.
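To gauge how fast the warning is repeating, the matching lines in ggserr.log can simply be counted. This is a minimal sketch; the function name `count_ogg_warnings` is made up for this illustration and is not a GoldenGate utility.

```shell
# Hypothetical helper (not a GoldenGate tool): count the lines in a log file
# that contain "WARNING <code>", for example WARNING OGG-01223.
count_ogg_warnings() {
    # $1 = path to ggserr.log, $2 = message code such as OGG-01223
    # -c prints the number of matching lines; -F treats the pattern literally.
    grep -cF "WARNING $2" "$1"
}

# Example (path taken from the session above):
#   count_ogg_warnings /home/oracle/ggs/ggserr.log OGG-01223
```

Running it twice a few seconds apart shows whether the process is still retrying.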
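In GoldenGate, the Data Pump process is only responsible for transmitting the captured change data to the target side. It does not write that data to the trail files on the target's local disk.
That work is done by the collector process, which the mgr process on the target side starts automatically.
The official documentation describes the collector process as follows: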
The Collector process operates on the target system to receive incoming data and write it to the trail.
Dynamic Collector
Typically, Oracle GoldenGate users do not interact with the Collector process. It is started
dynamically by the Manager process. This is known as a dynamic collector.
Static Collector
You can run a static Collector manually by running the SERVER program at the command
line with the following syntax and input parameters as shown:
server <parameter> [<parameter>] [...]
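Our environment uses dynamic collectors throughout, so under normal conditions mgr starts the collector process by invoking the server binary in the home directory of the ggs instance.
The server binary under the ggs home: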
bjyscsjqz:/home/oracle/ggs$ls -lt server
-rwxr-x---. 1 oracle oinstall 13757119 Aug 24 2012 server
A collector process started by mgr from the server binary under the ggs home:
localhost.localdomain:/home/oracle$ps -ef | grep goldengate | grep -v grep | grep server
oracle 11035 10883 0 11:00 ? 00:00:02 ./server -w 300 -p 7815-8000 -m 7809 -k -l /goldengate/ggs/ggserr.log
Logging on to the NaFu database host on the target side, I found that the server binary was missing from the target's ggs home:
bjyscnfdbnfzc01:/home/oracle/ggs$ls -lt server
ls: cannot access server: No such file or directory
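Meanwhile, although the mgr process was running, it had not actually started any collector process. This is why the data pump process on the source side reported the error and hung.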
bjyscnfdbnfzc01:/home/oracle/ggs$ps -ef | grep goldengate
oracle 29790 28264 34 15:34 ? 00:09:27 ./mgr PARAMFILE /oracle/oradata4/goldengate/dirprm/mgr.prm REPORTFILE /oracle/oradata4/goldengate/dirrpt/MGR.rpt PROCESSID MGR PORT 7809
oracle 29794 29790 0 15:34 ? 00:00:01 /oracle/oradata4/goldengate/extract PARAMFILE /oracle/oradata4/goldengate/dirprm/extzj_mh.prm REPORTFILE /oracle/oradata4/goldengate/dirrpt/EXTZJ_MH.rpt PROCESSID EXTZJ_MH USESUBDIRS
oracle 29803 29790 0 15:34 ? 00:00:01 /oracle/oradata4/goldengate/extract PARAMFILE /oracle/oradata4/goldengate/dirprm/pmpzj_mh.prm REPORTFILE /oracle/oradata4/goldengate/dirrpt/PMPZJ_MH.rpt PROCESSID PMPZJ_MH USESUBDIRS
oracle 29807 29790 0 15:34 ? 00:00:08 /oracle/oradata4/goldengate/replicat PARAMFILE /oracle/oradata4/goldengate/dirprm/rzj_nf1.prm REPORTFILE /oracle/oradata4/goldengate/dirrpt/RZJ_NF1.rpt PROCESSID RZJ_NF1 USESUBDIRS
oracle 29812 29790 0 15:34 ? 00:00:03 /oracle/oradata4/goldengate/replicat PARAMFILE /oracle/oradata4/goldengate/dirprm/rzj_nf6.prm REPORTFILE /oracle/oradata4/goldengate/dirrpt/RZJ_NF6.rpt PROCESSID RZJ_NF6 USESUBDIRS
oracle 31564 31146 0 16:01 pts/2 00:00:00 grep goldengate
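Spotting the absence of a collector in a long `ps -ef` listing is easier with a small filter. The sketch below is an illustration only: the function name `list_collectors` is hypothetical, and it assumes the `ps -ef` column layout shown in this post, where the command is the 8th field.

```shell
# Hypothetical helper: read `ps -ef` output on stdin and print only the lines
# whose command (8th column) is the GoldenGate `server` binary, i.e. a collector.
list_collectors() {
    awk '{
        n = split($8, parts, "/")      # drop any leading path such as "./"
        if (parts[n] == "server") print
    }'
}

# Example:
#   ps -ef | list_collectors
# No output means no collector process is running on this host.
```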
For this problem, MOS note [ID 1550203.1] describes the cause as follows:
Cause
message could be caused by
- "server" binary in TARGET $GG_HOME is missing or has the incorrect permissions
- GoldenGate manager in TARGET environment is unable to launch new server collector processes (hung process)
Solution
1. Check "server" binary is located in TARGET $GG_HOME and with correct permissions like:
[ogg@gglnx1 gg]$ ls -lrt server
-rwxr-x---. 1 ogg ogg 13619841 Apr 3 23:32 server
2. Stop/start GoldenGate manager on TARGET environment
3. Start remote Data Pump Extract(s).
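The first two steps of the solution can be sketched as a pair of shell functions to run on the target host. This is a rough illustration rather than Oracle's procedure: the function names are hypothetical, the example path is an assumption, and the manager restart assumes that piping commands into `ggsci` from the GoldenGate home works in your installation.

```shell
# Hypothetical helper: succeed only if <gg_home>/server exists as a regular
# file and is executable by the current user (step 1).
check_server_binary() {
    # $1 = GoldenGate home directory on the target
    [ -f "$1/server" ] && [ -x "$1/server" ]
}

# Hypothetical helper: bounce the manager by feeding commands to ggsci (step 2).
# "stop mgr !" skips the confirmation prompt.
restart_manager() {
    # $1 = GoldenGate home directory on the target
    ( cd "$1" && printf '%s\n' 'stop mgr !' 'start mgr' 'info all' | ./ggsci )
}

# Example on the target host (the path is an assumption):
#   check_server_binary /home/oracle/ggs && restart_manager /home/oracle/ggs
# Step 3 is then done on the source host, in GGSCI:
#   start extract <pump_group>
```

If `check_server_binary` fails, the binary has to be restored or its permissions corrected before the manager restart will help.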