This article is being updated. Please be aware that the content herein, including but not limited to version numbers and slight syntax changes, may not match the output from the most recent versions of Bright. This notice will be removed when the content has been updated.
HTCondor can be installed on top of a Bright Cluster as follows:
On the head node
1. Install the Condor RPM package and its dependencies:
# yum install setools-console-3.3.7-4.el6.x86_64 policycoreutils-python-2.0.83-19.39.el6.x86_64 perl-Date-Manip.noarch
# rpm -ivh condor-8.2.1-256063.rhel6.5.x86_64.rpm
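To confirm that the package registered correctly, the RPM database can be queried (an optional sanity check); the output should resemble the package name of the installed RPM:
# rpm -q condor
condor-8.2.1-256063.rhel6.5.x86_64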
2. Configure the head node to be the manager and submit host:
# condor_configure --type=manager,submit --verbose
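The resulting daemon configuration can be checked with condor_config_val; on a manager/submit host the daemon list would typically include MASTER, COLLECTOR, NEGOTIATOR and SCHEDD (an optional verification step):
# condor_config_val DAEMON_LIST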
3. Copy the Condor environment variable scripts to the system profile directory:
# cp /usr/condor.* /etc/profile.d/
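To make the Condor environment available in the current shell without logging out and back in, the bash profile script can be sourced directly (this assumes the copied file is named condor.sh):
# . /etc/profile.d/condor.sh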
4. Modify the configuration file to expand the Condor pool beyond a single host by setting ALLOW_WRITE to match all of the hosts:
# cat /etc/condor/condor_config | grep ALLOW_WRITE
ALLOW_WRITE = *
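If the line still contains its default value, one way to change it non-interactively is with sed (a sketch only; it assumes an uncommented ALLOW_WRITE line is already present in the file):
# sed -i 's/^ALLOW_WRITE = .*/ALLOW_WRITE = */' /etc/condor/condor_config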
5. Start the Condor service:
# service condor start
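Optionally, the service can also be enabled so that it comes back up automatically after a head node reboot:
# chkconfig condor on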
In the software image
This assumes that default-image is the image currently used by the compute nodes.
1. Install the Condor RPM package and its dependencies:
# yum --installroot=/cm/images/default-image/ install setools-console-3.3.7-4.el6.x86_64 policycoreutils-python-2.0.83-19.39.el6.x86_64 perl-Date-Manip.noarch libvirt-client-0.10.2-29.el6_5.2.x86_64
# rpm --root=/cm/images/default-image/ -ivh condor-8.2.1-256063.rhel6.5.x86_64.rpm
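As on the head node, the installation can be verified against the RPM database inside the image:
# rpm --root=/cm/images/default-image/ -q condor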
2. Configure the nodes to be execute hosts:
# cat /cm/images/default-image/etc/condor/condor_config.local
[...]
CONDOR_HOST = master.cm.cluster
[...]
DAEMON_LIST = MASTER, STARTD
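If these two lines are not yet present, they can be appended to the local configuration file in the image, for example with a here-document (adjust CONDOR_HOST if the head node has a different hostname):
# cat >> /cm/images/default-image/etc/condor/condor_config.local <<EOF
CONDOR_HOST = master.cm.cluster
DAEMON_LIST = MASTER, STARTD
EOF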
3. Modify the configuration file in the image to expand the Condor pool beyond a single host:
# cat /cm/images/default-image/etc/condor/condor_config
[...]
ALLOW_WRITE = *
4. Reboot the compute nodes so that they are provisioned with the modified software image (one way to do this is shown below).
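The nodes can be rebooted from the head node through cmsh. The following is only a sketch; it assumes the compute nodes are in the default node category, and the exact cmsh syntax can vary between Bright versions:
# cmsh -c "device; reboot -c default"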
Check the status from the head node after the nodes are up:
# condor_status
Name OpSys Arch State Activity LoadAv Mem ActvtyTime
node001.cm.cluster LINUX X86_64 Unclaimed Idle 0.530 490 0+00:00:02
node002.cm.cluster LINUX X86_64 Unclaimed Idle 0.210 490 0+00:00:04
Total Owner Claimed Unclaimed Matched Preempting Backfill
X86_64/LINUX 2 0 0 2 0 0 0
Total 2 0 0 2 0 0 0
Submitting a job
As with most other workload managers, Condor does not allow jobs to be submitted as root, so switch to a non-root user before submitting:
# su - cmsupport
[cmsupport@adel70-c6 ~]$ cat hostname.sh
#!/bin/bash
hostname -f
sleep 20
date
echo "exit"
[cmsupport@adel70-c6 ~]$ cat hostname.condor
############
#
# Example job file
#
############
Universe = vanilla
Executable = hostname.sh
input = /dev/null
output = hostname.out
error = hostname.error
Queue
[cmsupport@adel70-c6 ~]$ condor_submit hostname.condor
[cmsupport@adel70-c6 ~]$ condor_q
-- Submitter: adel70-c6 : <10.150.8.241:40768> : adel70-c6
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
6.0 cmsupport 8/1 05:08 0+00:00:24 R 0 17.1 hostname.sh
1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
[cmsupport@adel70-c6 ~]$ cat hostname.out
node002.cm.cluster
Fri Aug 1 05:09:22 PDT 2014
exit
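After the job finishes it disappears from the condor_q listing, but its record can still be looked up with condor_history (an optional follow-up, using the job ID from the condor_q output above); a job that is still queued or running could instead be cancelled with condor_rm:
[cmsupport@adel70-c6 ~]$ condor_history 6.0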