
ID #1314

How do I configure PBSPro HA with DAS shared storage?

How do I configure PBSPro High Availability with DAS shared storage?

 

Bright Cluster Manager does not support High Availability (HA) for PBSPro when the shared storage is DAS (direct-attached storage). Note that DRBD is no longer supported.

To get HA for PBSPro in Bright Cluster Manager 8.x, configure PBSPro manually as described in this article.

 

Note: As part of this procedure, a copy of the pbs_comm executable is made. Keep in mind that this copy is not updated when Bright Cluster Manager is updated, which can cause incompatibilities, so refresh it by hand after updates.

 

For this example, the hostnames of the head nodes and their IP addresses on the internalnet are:

 

head1 10.141.255.254

head2 10.141.255.253

 

This procedure assumes that HA and shared storage have already been configured.

 

1) Configure PBSPro

 

On the primary head node, run:

 

wlm-setup -s -w pbspro

 

2) Set the pbspro service to run only on the active head node

 

In cmsh, run these commands:

 

% device

% foreach -l master (services; use pbspro; set runif active; commit)

 

3) Freeze the PBSPro configuration file on both head nodes

 

On both head nodes, edit the file /cm/local/apps/cmd/etc/cmd.conf and add this line:

 

FrozenFile = { "/etc/pbs.conf" }

 

Then restart CMDaemon on both head nodes:

 

service cmd restart
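If you prefer to script the cmd.conf edit, the POSIX shell function below is a minimal sketch. The name add_frozen_file_line is hypothetical (it is not a Bright tool), and it assumes cmd.conf does not already contain a different FrozenFile directive; if it does, the function stops with status 2 so that you can merge the paths into the existing directive by hand.

```shell
# Sketch: idempotently add a FrozenFile line to a CMDaemon cmd.conf.
# add_frozen_file_line is a hypothetical helper, not part of Bright.
add_frozen_file_line() {
  cmd_conf=$1   # path to cmd.conf
  line=$2       # exact FrozenFile line to add

  # Whole-line (-x), fixed-string (-F) match: already present, nothing to do.
  if grep -qxF "$line" "$cmd_conf"; then
    return 0
  fi

  # A different FrozenFile directive exists: stop so it can be merged by hand.
  if grep -q '^[[:space:]]*FrozenFile' "$cmd_conf"; then
    echo "warning: $cmd_conf already has a FrozenFile line; merge by hand" >&2
    return 2
  fi

  # Make sure the file ends with a newline before appending.
  if [ -s "$cmd_conf" ] && [ -n "$(tail -c 1 "$cmd_conf")" ]; then
    printf '\n' >> "$cmd_conf"
  fi
  printf '%s\n' "$line" >> "$cmd_conf"
}
```

For example, run add_frozen_file_line /cm/local/apps/cmd/etc/cmd.conf 'FrozenFile = { "/etc/pbs.conf" }' on each head node, then restart CMDaemon as shown above.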

 

4) Modify the PBSPro configuration on both head nodes

 

On the primary head node, edit /etc/pbs.conf, set these properties, and comment out PBS_PRIMARY and PBS_SECONDARY:

 

PBS_START_COMM=0

PBS_START_SCHED=1

PBS_SERVER=master

PBS_SERVER_HOST_NAME=head1.cm.cluster

#PBS_PRIMARY=head1.cm.cluster

#PBS_SECONDARY=head2.cm.cluster



On the secondary head node, edit /etc/pbs.conf, set these properties, and comment out PBS_PRIMARY and PBS_SECONDARY:

 

PBS_START_COMM=0

PBS_START_SCHED=1

PBS_SERVER=master

PBS_SERVER_HOST_NAME=head2.cm.cluster

#PBS_PRIMARY=head1.cm.cluster

#PBS_SECONDARY=head2.cm.cluster
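The two head node edits differ only in PBS_SERVER_HOST_NAME, so they can be applied with one helper. This is a sketch with hypothetical function names; it assumes pbs.conf uses plain KEY=value lines, replaces an existing KEY= line (or appends it if missing), and comments out PBS_PRIMARY and PBS_SECONDARY. Apply it only after /etc/pbs.conf has been frozen in step 3, otherwise CMDaemon may rewrite the file.

```shell
# Sketch: apply the step 4 pbs.conf settings on one head node.
# set_pbs_var, comment_pbs_var and configure_head_pbs_conf are hypothetical helpers.

# Replace all KEY=... lines with a single KEY=VALUE, or append it if missing.
set_pbs_var() {
  file=$1 key=$2 value=$3
  tmp=$(mktemp) || return 1
  awk -v k="$key" -v v="$value" '
    index($0, k "=") == 1 { if (!done) print k "=" v; done = 1; next }
    { print }
    END { if (!done) print k "=" v }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps owner and mode
  rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Comment out every KEY=... line; lines already starting with # are left alone.
comment_pbs_var() {
  file=$1 key=$2
  tmp=$(mktemp) || return 1
  awk -v k="$key" 'index($0, k "=") == 1 { $0 = "#" $0 } { print }' \
    "$file" > "$tmp" && cat "$tmp" > "$file"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Usage: configure_head_pbs_conf /etc/pbs.conf head1.cm.cluster
configure_head_pbs_conf() {
  conf=$1 host=$2
  set_pbs_var "$conf" PBS_START_COMM 0 &&
  set_pbs_var "$conf" PBS_START_SCHED 1 &&
  set_pbs_var "$conf" PBS_SERVER master &&
  set_pbs_var "$conf" PBS_SERVER_HOST_NAME "$host" &&
  comment_pbs_var "$conf" PBS_PRIMARY &&
  comment_pbs_var "$conf" PBS_SECONDARY
}
```

On head1 run configure_head_pbs_conf /etc/pbs.conf head1.cm.cluster, and on head2 run it with head2.cm.cluster. Running it twice leaves the file unchanged.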

 

From within cmsh, restart the pbspro service on the primary head node:

 

% device; use head1; services; use pbspro; restart



5) Create the pbs_comm service configuration

 

On the primary head node, create the file /etc/pbs_comm.conf with this content:

 

PBS_EXEC=/cm/local/apps/pbspro/current

PBS_HOME=/cm/local/apps/pbspro/var/spool

PBS_START_SERVER=0

PBS_START_COMM=1

PBS_START_SCHED=0

PBS_START_MOM=0

PBS_SERVER=master

PBS_SERVER_HOST_NAME=head1.cm.cluster

 

On the secondary head node, create the file /etc/pbs_comm.conf with this content:

PBS_EXEC=/cm/local/apps/pbspro/current

PBS_HOME=/cm/local/apps/pbspro/var/spool

PBS_START_SERVER=0

PBS_START_COMM=1

PBS_START_SCHED=0

PBS_START_MOM=0

PBS_SERVER=master

PBS_SERVER_HOST_NAME=head2.cm.cluster
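Since the two files differ only in PBS_SERVER_HOST_NAME, a small generator avoids copy-and-paste mistakes. This is a sketch; the function name write_pbs_comm_conf is hypothetical, and the values are copied from the listings above.

```shell
# Sketch: generate /etc/pbs_comm.conf for one head node.
# write_pbs_comm_conf is a hypothetical helper; values are those listed above.
# Usage: write_pbs_comm_conf /etc/pbs_comm.conf head1.cm.cluster
write_pbs_comm_conf() {
  out=$1 host=$2
  # Unquoted EOF: $host is expanded; no other line contains a $.
  cat > "$out" <<EOF
PBS_EXEC=/cm/local/apps/pbspro/current
PBS_HOME=/cm/local/apps/pbspro/var/spool
PBS_START_SERVER=0
PBS_START_COMM=1
PBS_START_SCHED=0
PBS_START_MOM=0
PBS_SERVER=master
PBS_SERVER_HOST_NAME=$host
EOF
}
```

Run it with head1.cm.cluster on the primary head node and with head2.cm.cluster on the secondary.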

 

6) Create the pbs_comm service

 

On both head nodes, create the directories /cm/local/apps/pbspro/current/sbin and /cm/local/apps/pbspro/current/etc:

mkdir -p /cm/local/apps/pbspro/current/{sbin,etc}

 

On both head nodes, copy /etc/init.d/pbs to /etc/init.d/pbs_comm and make the following modifications:

 

: main code

export PBS_CONF_FILE=/etc/pbs_comm.conf

conf=${PBS_CONF_FILE:-/etc/pbs_comm.conf}

[...]

case "$1" in

start_msg) echo "Starting PBS_COMM" ;;

stop_msg) echo "Stopping PBS_COMM" ;;

status) status_pbs ;;

start) pre_start_pbs ;;

stop) stop_pbs ;;

restart) echo "Restarting PBS_COMM" ; stop_pbs ; pre_start_pbs ;;

*) echo "Usage: `basename $0` --version" ;

echo "Usage: `basename $0` {start|stop|restart|status}" ; exit 1 ;;

esac

 

 

On both head nodes, copy /cm/shared/apps/pbspro/current/etc/pbs_habitat to /cm/local/apps/pbspro/current/etc/pbs_habitat and make the following modifications:

 

# Start of the pbs_habitat script

#

export PBS_CONF_FILE=/etc/pbs_comm.conf

conf=${PBS_CONF_FILE:-/etc/pbs_comm.conf}
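Both the init script and pbs_habitat need the same change: read /etc/pbs_comm.conf instead of /etc/pbs.conf. The sketch below makes only that change, with awk. It assumes the stock scripts contain the line conf=${PBS_CONF_FILE:-/etc/pbs.conf} and fails with a non-zero status if that line is not found. It does not touch the start/stop messages in the init script's case statement, so edit those by hand. The function name retarget_pbs_conf is hypothetical.

```shell
# Sketch: copy a PBS script (init script or pbs_habitat) so that it reads
# /etc/pbs_comm.conf instead of /etc/pbs.conf.
# retarget_pbs_conf is a hypothetical helper. Assumption: the stock script
# contains the line  conf=${PBS_CONF_FILE:-/etc/pbs.conf}
retarget_pbs_conf() {
  src=$1 dst=$2
  # index() does a plain substring match, so $, { and } need no escaping.
  awk '
    index($0, "conf=${PBS_CONF_FILE:-/etc/pbs.conf}") > 0 {
      print "export PBS_CONF_FILE=/etc/pbs_comm.conf"
      print "conf=${PBS_CONF_FILE:-/etc/pbs_comm.conf}"
      found = 1
      next
    }
    { print }
    END { if (!found) exit 1 }   # non-zero status if the line was not found
  ' "$src" > "$dst" && chmod 755 "$dst"
}
```

For example: retarget_pbs_conf /etc/init.d/pbs /etc/init.d/pbs_comm, and retarget_pbs_conf /cm/shared/apps/pbspro/current/etc/pbs_habitat /cm/local/apps/pbspro/current/etc/pbs_habitat. Compare each result with the listings above before using it.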

 

On both head nodes, copy the pbs_comm executable to /cm/local/apps/pbspro/current/sbin:

 

cp /cm/shared/apps/pbspro/current/sbin/pbs_comm /cm/local/apps/pbspro/current/sbin/pbs_comm
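As the note at the top of this article points out, this copy is not updated along with Bright Cluster Manager. A possible refresh helper (hypothetical name refresh_pbs_comm_copy) compares the two binaries with cmp and recopies only when they differ. It copies to a temporary name and then renames it, because writing directly into an executable that is currently running can fail with "Text file busy".

```shell
# Sketch: re-copy pbs_comm to local storage when the shared binary has changed.
# refresh_pbs_comm_copy is a hypothetical helper; paths default to those above.
refresh_pbs_comm_copy() {
  shared=${1:-/cm/shared/apps/pbspro/current/sbin/pbs_comm}
  copy=${2:-/cm/local/apps/pbspro/current/sbin/pbs_comm}

  # cmp -s is silent; exit status 0 means the files are identical.
  if cmp -s "$shared" "$copy"; then
    echo "pbs_comm copy is up to date"
    return 0
  fi

  # Copy to a temporary name, then rename over the old copy. Writing directly
  # into a running executable can fail with "Text file busy".
  cp -p "$shared" "$copy.new" && mv -f "$copy.new" "$copy" || return 1
  echo "pbs_comm copy refreshed; restart the pbs_comm service to use it"
}
```

After a PBSPro update, run it on both head nodes and then restart the pbs_comm service.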

 

7) Terminate the running pbs_comm processes

 

On both head nodes, terminate any running pbs_comm processes:

 

killall -KILL pbs_comm

 

8) Configure monitoring of the pbs_comm service in Bright Cluster Manager

 

In cmsh, add the pbs_comm service on both head nodes and enable monitoring and autostart:

 

% device

% foreach -l master (services; add pbs_comm; set monitored on; set autostart on; commit)



9) Freeze the PBSPro configuration in the software images

 

For each software image that will be used for the compute nodes, edit the image's /cm/local/apps/cmd/etc/cmd.conf and add this line:

 

FrozenFile = {"/etc/pbs.conf","/cm/local/apps/pbspro/var/spool/mom_priv/config"}
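With several software images, the same edit can be applied in a loop. This sketch assumes the images live under /cm/images/<image name>/ (Bright's default location; pass another root as the first argument) and refuses to touch a cmd.conf that already contains a different FrozenFile line, so that the paths can be merged by hand. The function name freeze_pbs_in_images is hypothetical.

```shell
# Sketch: add the compute-node FrozenFile line to cmd.conf in every software image.
# freeze_pbs_in_images is a hypothetical helper. Assumption: images live under
# /cm/images/<image name>/; pass a different root as the first argument.
freeze_pbs_in_images() {
  images_root=${1:-/cm/images}
  line='FrozenFile = {"/etc/pbs.conf","/cm/local/apps/pbspro/var/spool/mom_priv/config"}'
  status=0
  for img in "$images_root"/*/; do
    conf="${img}cm/local/apps/cmd/etc/cmd.conf"
    [ -f "$conf" ] || continue              # no CMDaemon config in this directory
    if grep -qxF "$line" "$conf"; then
      continue                              # already frozen
    elif grep -q '^[[:space:]]*FrozenFile' "$conf"; then
      echo "warning: $conf has another FrozenFile line; merge by hand" >&2
      status=2
    else
      # Add a newline first if the file does not end with one.
      if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
        printf '\n' >> "$conf"
      fi
      printf '%s\n' "$line" >> "$conf"
      echo "updated $conf"
    fi
  done
  return "$status"
}
```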



10) Configure the clients to point to a single server and to use two leaf routers

 

For each software image that will be used for the compute nodes, edit the image's /etc/pbs.conf, set these properties, and comment out PBS_PRIMARY and PBS_SECONDARY:

 

PBS_SERVER=master

PBS_LEAF_ROUTERS=head1.cm.cluster,head2.cm.cluster

#PBS_PRIMARY=head1.cm.cluster

#PBS_SECONDARY=head2.cm.cluster
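These edits can also be scripted across images. This sketch assumes the images live under /cm/images/<image name>/ (pass another root as the first argument, and optionally a different router list as the second). It replaces existing PBS_SERVER and PBS_LEAF_ROUTERS lines or appends them, comments out PBS_PRIMARY and PBS_SECONDARY, and copies every other line unchanged. The function name configure_image_pbs_conf is hypothetical.

```shell
# Sketch: point the compute-node pbs.conf in every image at one server and two
# leaf routers. configure_image_pbs_conf is a hypothetical helper.
# Assumption: images live under /cm/images/<image name>/.
configure_image_pbs_conf() {
  images_root=${1:-/cm/images}
  routers=${2:-head1.cm.cluster,head2.cm.cluster}
  for img in "$images_root"/*/; do
    conf="${img}etc/pbs.conf"
    [ -f "$conf" ] || continue
    tmp=$(mktemp) || return 1
    # Replace (or append) PBS_SERVER and PBS_LEAF_ROUTERS; comment out
    # PBS_PRIMARY and PBS_SECONDARY. Other lines are copied unchanged.
    if awk -v routers="$routers" '
      index($0, "PBS_SERVER=") == 1       { if (!s) print "PBS_SERVER=master"; s = 1; next }
      index($0, "PBS_LEAF_ROUTERS=") == 1 { if (!r) print "PBS_LEAF_ROUTERS=" routers; r = 1; next }
      index($0, "PBS_PRIMARY=") == 1 || index($0, "PBS_SECONDARY=") == 1 { print "#" $0; next }
      { print }
      END {
        if (!s) print "PBS_SERVER=master"
        if (!r) print "PBS_LEAF_ROUTERS=" routers
      }
    ' "$conf" > "$tmp" && cat "$tmp" > "$conf"; then   # cat keeps owner and mode
      echo "updated $conf"
    else
      echo "error: could not update $conf" >&2
    fi
    rm -f "$tmp"
  done
}
```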



11) Modify the MoM configuration in the software images

 

For each software image that will be used for the compute nodes, edit the file /cm/local/apps/pbspro/var/spool/mom_priv/config so it has the following content:

 

$clienthost head1

$clienthost head2

$restrict_user_maxsysid 499
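A sketch that writes this file into every image under /cm/images (an assumed location; pass another root as the first argument) that already has a mom_priv directory. The here-document delimiter is quoted ('EOF') so that the shell does not treat $clienthost and $restrict_user_maxsysid as variables; with an unquoted delimiter they would be replaced by empty strings. The function name write_mom_config_in_images is hypothetical.

```shell
# Sketch: write the MoM config file shown above into every software image.
# write_mom_config_in_images is a hypothetical helper.
# Assumption: images live under /cm/images/<image name>/.
write_mom_config_in_images() {
  images_root=${1:-/cm/images}
  for img in "$images_root"/*/; do
    dir="${img}cm/local/apps/pbspro/var/spool/mom_priv"
    [ -d "$dir" ] || continue   # skip images without a PBSPro MoM directory
    # Quoted 'EOF': $clienthost and $restrict_user_maxsysid are written literally.
    cat > "$dir/config" <<'EOF'
$clienthost head1
$clienthost head2
$restrict_user_maxsysid 499
EOF
    echo "wrote $dir/config"
  done
}
```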

 

12) Reboot the compute nodes


