LAM/MPI Cluster System with FreeBSD 5.3

[Preface]

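  MPI (Message Passing Interface)

  MPI is not formally a protocol, but in practice it has the standing of a de facto standard. It is mainly used for communication between the processes of a parallel program on distributed-memory systems. MPI is provided as a function library that can be called from Fortran and C programs; its advantages are that it is fairly fast and fairly portable.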

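  Cluster

  Two cluster architectures are common today. The first is the Web/Internet Cluster System, which places data on different hosts so that several machines share responsibility for one service. The second is the parallel-computing cluster (Parallel Algorithms Cluster System), which hands a single computation to all the CPUs in the cluster so that they work on it at the same time. Because the computing power of many CPUs is used, the computation finishes faster.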

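  The LAM/MPI Cluster System installed in this document belongs to the second kind. Because of the limits of my test environment and of my own knowledge, some explanations may be incomplete. If you have questions, please write to me and I will do my best to improve this document. Thank you!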

  [Software and Platform]

  Server \ FreeBSD 5.3 Stable

  IP:172.18.5.247

  Hostname: center.the9.com

  Client \ FreeBSD 5.3 Release

  IP:172.18.5.80

  Hostname: node1.the9.com

  apache_1.3.29 \ all software installed from Ports

  php4-4.3.10

  php4-gd-4.3.10

  php4-extensions-1.0

  lam-6.5.9

  ganglia-monitor-core-2.5.6

  ganglia-webfrontend-2.5.5

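  [Goal]

  Build a LAM/MPI Cluster System on FreeBSD 5.3.

  [Installation and Configuration]

  1. Basic /etc/hosts configuration on every node \ If the internal network has a DNS server, configuring /etc/resolv.conf on each system is enough.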

  center.the9.com

  #more /etc/hosts

  172.18.5.247 center.the9.com

  172.18.5.80 node1.the9.com

  node1.the9.com

  #more /etc/hosts

  172.18.5.247 center.the9.com

  172.18.5.80 node1.the9.com
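Every node has to resolve every other node by name, so it can help to check the hosts file before going further. The following is a minimal sketch, not part of the original setup; `check_hosts` is a hypothetical helper name, and it assumes the usual hosts-file layout of an address followed by one or more names.

```shell
#!/bin/sh
# check_hosts FILE NAME...: report every NAME that has no entry in FILE.
# Hypothetical helper; returns 0 when all names are present, 1 otherwise.
check_hosts() {
    file=$1
    shift
    missing=0
    for name in "$@"; do
        # Drop comments, then look for the name in any field after the address.
        if ! awk -v n="$name" '
            { sub(/#.*/, "") }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit(found ? 0 : 1) }
        ' "$file"; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}
```

On each node it could be called as `check_hosts /etc/hosts center.the9.com node1.the9.com`; no output and a zero exit status mean both names are present.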

  2. Setting up the Apache + PHP server

  center.the9.com

  #cd /usr/ports/www/apache13-modssl

  #make install clean \ install Apache

  #cd /usr/ports/lang/php4-extensions

  #make install clean \ install PHP; be sure to select the GD library here

  #vi /usr/local/etc/apache/httpd.conf \ add the following parameters

  AddType application/x-httpd-php .php

  AddType application/x-httpd-php-source .phps

  3. Setting up the NFS server and client

  

  NFS Server(center.the9.com)

  #vi /etc/rc.conf \ add the following parameters

  nfs_server_enable="YES"

  nfs_server_flags="-u -t -n 4 -h 172.18.5.247"

  mountd_enable="YES"

  mountd_flags="-r -l"

  rpcbind_enable="YES"

  rpcbind_flags="-l -h 172.18.5.247"

  #vi /etc/exports \ configure the NFS shared directory

  /cluster -maproot=0:0 -network 172.18.5.0 -mask 255.255.255.0

  #/etc/rc.d/rpcbind start

  #/etc/rc.d/mountd start

  #/etc/rc.d/nfsd start \ start the NFS server

  NFS Client(node1.the9.com)

  #vi /etc/rc.conf \ add the following parameters

  nfs_client_enable="YES"

  #vi /etc/fstab \ add the following parameters

  172.18.5.247:/cluster /cluster nfs rw 0 0

  #mount /cluster \ mount the /cluster directory
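After mounting, it is worth confirming that /cluster really comes from the NFS server rather than being a local directory. The sketch below is an illustration only: `nfs_mounted` is a hypothetical helper, and it assumes that `mount` prints lines beginning with `SOURCE on MOUNTPOINT`.

```shell
#!/bin/sh
# nfs_mounted SOURCE MOUNTPOINT: read `mount` output on stdin and succeed
# only if SOURCE is mounted on MOUNTPOINT. Hypothetical helper.
nfs_mounted() {
    awk -v src="$1" -v mp="$2" '
        $1 == src && $2 == "on" && $3 == mp { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}
```

On node1.the9.com this could be used as `mount | nfs_mounted 172.18.5.247:/cluster /cluster && echo ok`.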

  4. Setting up the LAM/MPI Cluster System

  Step 1: Basic installation

  center.the9.com

  #cd /usr/ports/net/lam

  #make install clean \ install LAM

  #cd /usr/ports/sysutils/ganglia-monitor-core

  #make install clean \ install the Monitor Core needed by the Cluster System

  #cd /usr/ports/sysutils/ganglia-webfrontend

  #make install clean \ install the web GUI for the Monitor Core above

  node1.the9.com

  #cd /usr/ports/net/lam

  #make install clean \ install LAM

  #cd /usr/ports/sysutils/ganglia-monitor-core

  #make install clean \ install the Monitor Core needed by the Cluster System

  Step 2: Configuration

  center.the9.com

  #cd /usr/local/etc/

  #cp gmond.conf.sample gmond.conf

  #cp gmetad.conf.sample gmetad.conf

  #vi gmond.conf \ change the name and mcast_if parameters (mcast_if should name the node's own network interface)

  # The name of the cluster this node is a part of

  # default: "unspecified"

  name "BSDCluster"

  # The multicast interface for gmond to send/receive data on

  # default: the kernel decides based on routing configuration

  mcast_if lnc0

  #vi gmetad.conf \ change the data_source parameter

  # data_source "my cluster" 10 localhost my.machine.edu:8649 1.2.3.5:8655

  

  # data_source "my grid" 50 1.3.4.7:8655 grid.org:8651 grid-backup.org:8651

  # data_source "another source" 1.3.4.7:8655 1.3.4.8

  data_source "BSDCluster" 10 center.the9.com:8649 node1.the9.com:8649

  #vi /usr/local/etc/lam-bhost.def \ add the hostname of every node

  center.the9.com

  node1.the9.com
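The gmetad data_source line and lam-bhost.def both list the same hosts, so generating them from one host list is one way to keep them in sync as nodes are added. This is a sketch only: `make_bhost` and `make_data_source` are hypothetical helpers, not tools shipped with LAM or Ganglia, and port 8649 is taken from the data_source line used in this setup.

```shell
#!/bin/sh
# Hypothetical helpers (not part of LAM or Ganglia).

# make_bhost HOST...: print boot-schema lines, one hostname per line.
make_bhost() {
    for host in "$@"; do
        printf '%s\n' "$host"
    done
}

# make_data_source CLUSTER INTERVAL HOST...: print a gmetad data_source
# line that lists every host with port 8649 appended.
make_data_source() {
    cluster=$1
    interval=$2
    shift 2
    line="data_source \"$cluster\" $interval"
    for host in "$@"; do
        line="$line $host:8649"
    done
    printf '%s\n' "$line"
}
```

For the two machines in this document, `make_bhost center.the9.com node1.the9.com` prints the two boot-schema lines, and `make_data_source BSDCluster 10 center.the9.com node1.the9.com` prints the data_source line used in gmetad.conf.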

  node1.the9.com \ In general, every added node is configured the same way as center.the9.com above.

  node2.the9.com

  nodeX.the9.com ........

  5. Configuring the monitor web GUI

  center.the9.com

  #vi /usr/local/etc/apache/httpd.conf \ add the following parameters to set the path of the Cluster Monitor web pages

  Alias /ganglia/ "/usr/local/www/ganglia/"

  <Directory "/usr/local/www/ganglia">

  Options Indexes FollowSymlinks MultiViews

  AllowOverride None

  Order allow,deny

  Allow from all

  </Directory>

  #vi /etc/rc.conf \ add the following parameters

  apache_enable="YES"

  apache_flags="-DSSL"

  apache_pidfile="/var/run/httpd.pid"

  #/usr/local/etc/rc.d/apache.sh start \ start Apache

  6. Starting, debugging, and testing the Cluster System

  center.the9.com node1.the9.com nodeX.the9.com etc....

  #/usr/local/etc/rc.d/gmetad.sh start

  #/usr/local/etc/rc.d/gmond.sh start \ start the Monitor Core on every node of the cluster

  center.the9.com

  $lamboot -dv \ start the LAM daemon on every node (run as a regular user, not root)

  LAM 6.5.9/MPI 2 C++/ROMIO - Indiana University

  lamboot: boot schema file: /usr/local/etc/lam-bhost.def

  lamboot: opening hostfile /usr/local/etc/lam-bhost.def

  lamboot: found the following hosts:

  lamboot: n0 center.the9.com

  lamboot: n1 node1.the9.com

  lamboot: resolved hosts:

  lamboot: n0 center.the9.com --> 172.18.5.247

  lamboot: n1 node1.the9.com --> 172.18.5.80

  lamboot: found 2 host node(s)

  lamboot: origin node is 0 (center.the9.com)

  Executing hboot on n0 (center.the9.com - 1 CPU)...

  lamboot: attempting to execute "hboot -t -c lam-conf.lam -d -v -I " -H 172.18.5.247 -P 53433 -n 0 -o 0 ""

  

  hboot: process schema = "/usr/local/etc/lam-conf.lam"

  hboot: found /usr/local/bin/lamd

  hboot: performing tkill

  hboot: tkill

  hboot: booting...

  hboot: fork /usr/local/bin/lamd

  [1] 28338 lamd -H 172.18.5.247 -P 53433 -n 0 -o 0 -d

  hboot: attempting to execute

  Executing hboot on n1 (node1.the9.com - 1 CPU)...

  lamboot: attempting to execute "/usr/bin/ssh node1.the9.com -n echo $SHELL"

  lamboot: got remote shell /bin/sh

  lamboot: attempting to execu
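The "resolved hosts" lines in the `lamboot -dv` output show which address each node name was resolved to, which is a quick way to spot a wrong /etc/hosts entry. The following sketch pulls those pairs out of the log; `resolved_hosts` is a hypothetical helper, and it assumes the line layout `lamboot: nN host --> address` seen in the output above.

```shell
#!/bin/sh
# resolved_hosts: read `lamboot -dv` output on stdin and print one
# "hostname address" pair for every "lamboot: nN host --> address" line.
# Hypothetical helper; other log lines are ignored.
resolved_hosts() {
    awk '$1 == "lamboot:" && $4 == "-->" { print $3, $5 }'
}
```

It could be run as `lamboot -dv 2>&1 | resolved_hosts`, and the pairs compared by eye with the entries in /etc/hosts.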