A hidden pitfall in the global module of Erlang R14 and earlier
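How the problem came up:
A colleague's project recently ran into a strange problem. The code looks like this:
Node G runs code similar to the following: [Snippet 1]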
[begin
     global:send(role_manager, {role_online, RoleID}),
     gen_server:call({global, account_server}, {create_role, A})
 end || _ <- lists:seq(1, 1000)].
role_manager and account_server both run on node W.
When role_manager receives the {role_online, RoleID} message, it does the following: [Snippet 2]
start_child(role_sup, {role_server, {role_server, start_link, []}, transient, 30000, worker, [role_server]})
start_link in role_server: [Snippet 3]
gen_server:start_link({global, role_XXXX}, role_server, [], []).
Snippet 1 is a very simple piece of code, yet strangely it took N seconds to run. Seconds, mind you!
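To put a number on a slowdown like this, one option is to wrap the suspect code in timer:tc. The sketch below is only illustrative: timing_sketch, time_ms/1 and demo/0 are hypothetical names, and the 50 ms sleep is a stand-in for the real global calls, which would need nodes G and W to run.

```erlang
-module(timing_sketch).
-export([time_ms/1, demo/0]).

%% Run Fun once and return {ElapsedMilliseconds, Result}.
time_ms(Fun) when is_function(Fun, 0) ->
    {Micros, Result} = timer:tc(Fun),
    {Micros div 1000, Result}.

%% Stand-in workload (assumption): sleeps 50 ms instead of doing
%% the global:send / gen_server:call pair from Snippet 1.
demo() ->
    time_ms(fun() -> timer:sleep(50), done end).
```

Replacing the fun passed to time_ms/1 with the body of Snippet 1 would give the total time for the loop on a real cluster.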
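I went through the source and found nothing wrong (annoyingly, I was reading the R15B01 source). Later, while chatting with my colleague, I suddenly remembered that the R15 release notes mention an optimization around safe_whereis_name, so I went straight to the R14B02 source:
Two files are involved: gen.erl (where gen_server:call eventually ends up) and global.erl.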
[gen.erl]:
%% Global by name
call({global, _Name}=Process, Label, Request, Timeout)
  when Timeout =:= infinity;
       is_integer(Timeout), Timeout >= 0 ->
    case where(Process) of
        Pid when is_pid(Pid) ->
            Node = node(Pid),
            try do_call(Pid, Label, Request, Timeout)
            catch
                exit:{nodedown, Node} ->
                    %% A nodedown not yet detected by global,
                    %% pretend that it was.
                    exit(noproc)
            end;
        undefined ->
            exit(noproc)
    end;
For a global name, the PID is first looked up with where, which turns out to use global:safe_whereis_name(Name):
where({global, Name}) -> global:safe_whereis_name(Name);
where({local, Name}) -> whereis(Name).
Jumping to [global.erl]:
-spec safe_whereis_name(term()) -> pid() | 'undefined'.
safe_whereis_name(Name) ->
gen_server:call(global_name_server, {whereis, Name}, infinity).
First of all it is a call, but that is certainly not the key to the poor performance. Reading on:
handle_call({whereis, Name}, From, S) ->
    do_whereis(Name, From),
    {noreply, S};

do_whereis(Name, From) ->
    case is_global_lock_set() of
        false ->
            gen_server:reply(From, where(Name)); %% note this line
        true ->
            send_again({whereis, Name, From})
    end.
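The handler above returns {noreply, S} and leaves the answer to a later gen_server:reply/2. A minimal sketch of that deferred-reply pattern follows; deferred_reply_sketch, ask/1 and the answered_later atom are hypothetical names chosen for illustration, not anything from global.erl.

```erlang
-module(deferred_reply_sketch).
-behaviour(gen_server).
-export([start/0, ask/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start() -> gen_server:start(?MODULE, [], []).

%% Blocks until some process calls gen_server:reply/2 for this request.
ask(Pid) -> gen_server:call(Pid, ask).

init([]) -> {ok, nostate}.

handle_call(ask, From, State) ->
    %% Hand the request to a helper process and return without replying;
    %% the caller stays blocked until the helper replies.
    spawn(fun() -> gen_server:reply(From, answered_later) end),
    {noreply, State}.

handle_cast(_Msg, State) -> {noreply, State}.

handle_info(_Msg, State) -> {noreply, State}.
```

The caller cannot tell the difference from an ordinary reply, except that the latency is whatever the helper takes to answer.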
So there is a lock, and a sleep as well:
send_again(Msg) ->
    Me = self(),
    spawn(fun() -> timer(Me, Msg) end).

timer(Pid, Msg) ->
    random_sleep(5),
    Pid ! Msg.
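The cost of this retry loop can be reproduced in isolation. The sketch below is a simplified stand-in, not the real global_name_server: the module name, the demo_locks table, the 100 ms lock hold time and the 50 ms maximum sleep are all assumptions made for the demonstration, and rand:uniform/1 replaces the internal random_sleep/1.

```erlang
-module(retry_sketch).
-export([demo/0]).

%% Returns {ElapsedMs, Reply} for one lookup issued while a lock is held.
demo() ->
    Tab = ets:new(demo_locks, [set, public]),
    ets:insert(Tab, {lock, held}),
    Server = spawn(fun() -> server_loop(Tab) end),
    %% Release the lock after 100 ms, as a lock holder eventually would.
    spawn(fun() -> timer:sleep(100), ets:delete(Tab, lock) end),
    {Micros, Reply} = timer:tc(fun() -> lookup(Server) end),
    Server ! stop,
    ets:delete(Tab),
    {Micros div 1000, Reply}.

lookup(Server) ->
    Ref = make_ref(),
    Server ! {whereis, self(), Ref},
    receive {Ref, Reply} -> Reply end.

server_loop(Tab) ->
    receive
        {whereis, From, Ref} = Msg ->
            case ets:member(Tab, lock) of
                false -> From ! {Ref, found};   % no lock: answer at once
                true  -> send_again(Msg)        % lock set: retry later
            end,
            server_loop(Tab);
        stop ->
            ok
    end.

%% Re-queue the message after a random sleep, as in the snippet above.
send_again(Msg) ->
    Me = self(),
    spawn(fun() -> timer:sleep(rand:uniform(50)), Me ! Msg end).
```

A lookup that would otherwise be a single ETS read cannot complete until the lock is gone and the last random sleep has expired, so its latency is bounded below by the lock hold time.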
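At this point the problem is clear: while the global lock is set, every global name lookup is re-queued behind a random sleep. Anyone still running a release older than R15 should upgrade soon!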
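If upgrading is not immediately possible, one possible workaround is to resolve the name with global:whereis_name/1, which reads the local name table directly, and then call the pid. This is only a sketch: global_call_sketch and call/2,3 are hypothetical names, the 5000 ms default timeout is an assumption, and the lookup may briefly miss a name whose registration is still in progress.

```erlang
-module(global_call_sketch).
-export([call/2, call/3]).

%% Call a globally registered gen_server without going through
%% global:safe_whereis_name/1.
call(Name, Request) ->
    call(Name, Request, 5000).

call(Name, Request, Timeout) ->
    case global:whereis_name(Name) of
        Pid when is_pid(Pid) ->
            gen_server:call(Pid, Request, Timeout);
        undefined ->
            %% Same exit reason that gen.erl uses for an unknown name.
            exit(noproc)
    end.
```

With this helper, the call in Snippet 1 would read global_call_sketch:call(account_server, {create_role, A}).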
While we are here, let's also walk through the global lock logic:
is_global_lock_set() ->
    is_lock_set(?GLOBAL_RID).

is_lock_set(ResourceId) ->
    ets:member(global_locks, ResourceId).
This reveals a key table, global_locks. Now look at what register_name does:
-spec register_name(term(), pid()) -> 'yes' | 'no'.
register_name(Name, Pid) when is_pid(Pid) ->
register_name(Name, Pid, fun random_exit_name/3).
-type method() :: fun((term(), pid(), pid()) -> pid() | 'none').
-spec register_name(term(), pid(), method()) -> 'yes' | 'no'.
register_name(Name, Pid, Method) when is_pid(Pid) ->
    Fun = fun(Nodes) ->
        case (where(Name) =:= undefined) andalso check_dupname(Name, Pid) of
            true ->
                gen_server:multi_call(Nodes,
                                      global_name_server,
                                      {register, Name, Pid, Method}),
                yes;
            _ ->
                no
        end
    end,
    ?trace({register_name, self(), Name, Pid, Method}),
    gen_server:call(global_name_server, {registrar, Fun}, infinity).

handle_call({registrar, Fun}, From, S) ->
    S#state.the_registrar ! {trans_all_known, Fun, From},
    {noreply, S};
What is S#state.the_registrar? See the comment:
%% The registrar is a helper process that registers and unregisters
%% names. Since it never dies it assures that names are registered and
%% unregistered on all known nodes. It is started by and linked to
%% global_name_server.
start_the_registrar() ->
spawn_link(fun() -> loop_the_registrar() end).
loop_the_registrar() ->
receive
{t