HTCondor can be installed on top of a Bright Cluster as follows:
Note: The following instructions have been tested on Bright 7.3 with CentOS 7 as the base OS.
On the head node
1. Install dependencies:
# yum install setools-console policycoreutils-python perl-Date-Manip.noarch
2. Untar the sources:
# tar -xzvf condor-8.4.9-x86_64_RedHat7-stripped.tar.gz
3. Add condor user:
# cmsh
% user add condor
% commit
4. Install Condor using the condor_install script (note that the installation must be owned by a user other than root, specified with the --owner option):
# cd condor-8.4.9-x86_64_RedHat7-stripped/
# ./condor_install --prefix=/cm/shared/apps/condor/8.4.9 --owner=condor --install-dir=/cm/shared/apps/condor/8.4.9
Installing Condor from /root/condor-8.4.9-x86_64_RedHat7-stripped to /cm/shared/apps/condor/8.4.9
Condor has been installed into:
/cm/shared/apps/condor/8.4.9
Configured condor using these configuration files:
global: /cm/shared/apps/condor/8.4.9/etc/condor_config
local: /cm/shared/apps/condor/8.4.9/local.ma-c-12-30-b73-c7u2/condor_config.local
In order for Condor to work properly you must set your CONDOR_CONFIG
environment variable to point to your Condor configuration file:
/cm/shared/apps/condor/8.4.9/etc/condor_config
before running Condor commands/daemons.
Created scripts which can be sourced by users to setup their
Condor environment variables. These are:
sh: /cm/shared/apps/condor/8.4.9/condor.sh
csh: /cm/shared/apps/condor/8.4.9/condor.csh
5. Copy the Condor environment setup scripts to /etc/profile.d:
# cp --preserve /cm/shared/apps/condor/8.4.9/condor.{sh,csh} /etc/profile.d/
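The installer output above states that CONDOR_CONFIG must point to the Condor configuration file before any Condor command or daemon runs. A minimal sketch of a sanity check for a new login shell follows. The function name check_condor_env is hypothetical and is not part of Condor.

```shell
#!/bin/bash
# check_condor_env is a hypothetical helper, not part of HTCondor.
# It succeeds only if CONDOR_CONFIG is set and names a readable file.
check_condor_env() {
    if [ -z "${CONDOR_CONFIG:-}" ]; then
        echo "CONDOR_CONFIG is not set" >&2
        return 1
    fi
    if [ ! -r "$CONDOR_CONFIG" ]; then
        echo "CONDOR_CONFIG points to an unreadable file: $CONDOR_CONFIG" >&2
        return 1
    fi
    echo "CONDOR_CONFIG=$CONDOR_CONFIG"
}
```

In a shell that has sourced /etc/profile.d/condor.sh, calling check_condor_env should print the path of the global condor_config file. A non-zero exit status suggests that the profile script was not picked up.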
6. Configure Condor with the condor_configure script:
# ./condor_configure --type=manager,submit --verbose --install-dir=/cm/shared/apps/condor/8.4.9
Condor will be run as user: condor
Install directory: /cm/shared/apps/condor/8.4.9
Main config file: /cm/shared/apps/condor/8.4.9/etc/condor_config
Local directory: /cm/shared/apps/condor/8.4.9/local.ma-c-12-30-b73-c7u2
Local config file: /cm/shared/apps/condor/8.4.9/local.ma-c-12-30-b73-c7u2/condor_config.local
Writing settings to file:/cm/shared/apps/condor/8.4.9/etc/condor_config
CONDOR_HOST=ma-c-12-30-b73-c7u2.cm.cluster
COLLECTOR_NAME=
DAEMON_LIST=COLLECTOR MASTER NEGOTIATOR SCHEDD
Configured condor using these configuration files:
global: /cm/shared/apps/condor/8.4.9/etc/condor_config
local: /cm/shared/apps/condor/8.4.9/local.ma-c-12-30-b73-c7u2/condor_config.local
In order for Condor to work properly you must set your CONDOR_CONFIG
environment variable to point to your Condor configuration file:
/cm/shared/apps/condor/8.4.9/etc/condor_config
before running Condor commands/daemons.
Created scripts which can be sourced by users to setup their
Condor environment variables. These are:
sh: /cm/shared/apps/condor/8.4.9/condor.sh
csh: /cm/shared/apps/condor/8.4.9/condor.csh
7. Modify the condor_config file so that the configuration parameters point to the correct paths, and expand the Condor pool beyond a single host (set ALLOW_WRITE to match all of the hosts):
# cat /cm/shared/apps/condor/8.4.9/etc/condor_config | grep -vE "^#|^$"
RELEASE_DIR = /cm/shared/apps/condor/8.4.9
LOCAL_DIR = /cm/shared/apps/condor/8.4.9/local.$(HOSTNAME)
LOCAL_CONFIG_FILE = /cm/shared/apps/condor/8.4.9/local.$(HOSTNAME)/condor_config.local
LOCAL_CONFIG_DIR = $(LOCAL_DIR)/config
use SECURITY : HOST_BASED
ALLOW_WRITE = *.cm.cluster
use ROLE : Personal
CONDOR_HOST = master.cm.cluster
UID_DOMAIN = cm.cluster
FILESYSTEM_DOMAIN = cm.cluster
LOCK = /tmp/condor-lock.0.0129490057743205
CONDOR_IDS = 1001.1001
CONDOR_ADMIN = root@master.cm.cluster
MAIL = /usr/bin/mail
JAVA = /usr/bin/java
JAVA_MAXHEAP_ARGUMENT = -Xmx1024m
DAEMON_LIST = MASTER COLLECTOR SCHEDD NEGOTIATOR
STARTD_DEBUG = D_FULLDEBUG
COLLECTOR_DEBUG = D_FULLDEBUG
COLLECTOR_HOST = $(CONDOR_HOST):9618
8. Create a startup/boot script for starting the Condor services:
# cp --preserve /cm/shared/apps/condor/8.4.9/etc/examples/condor.service /lib/systemd/system/
# cat /lib/systemd/system/condor.service
[Unit]
Description=Condor Distributed High-Throughput-Computing
After=syslog.target network-online.target nslcd.service ypbind.service
Wants=network-online.target
[Service]
Environment=CONDOR_CONFIG=/cm/shared/apps/condor/8.4.9/etc/condor_config
ExecStart=/cm/shared/apps/condor/8.4.9/sbin/condor_master -f
ExecStop=/cm/shared/apps/condor/8.4.9/sbin/condor_off -master
ExecReload=/bin/kill -HUP $MAINPID
Restart=always
RestartSec=1minute
StandardOutput=syslog
LimitNOFILE=16384
[Install]
WantedBy=multi-user.target
# systemctl enable condor.service
Created symlink from /etc/systemd/system/multi-user.target.wants/condor.service to /usr/lib/systemd/system/condor.service.
9. Start the Condor service:
# systemctl restart condor.service
# systemctl status condor.service
● condor.service - Condor Distributed High-Throughput-Computing
Loaded: loaded (/usr/lib/systemd/system/condor.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2016-12-30 11:24:23 CET; 4s ago
Main PID: 15093 (condor_master)
CGroup: /system.slice/condor.service
├─15093 /cm/shared/apps/condor/8.4.9/sbin/condor_master -f
├─15118 condor_procd -A /tmp/condor-lock.0.0129490057743205/procd_pipe -L /cm/shared/apps/condor/8.4.9/local.ma-c-12-30-b73-c7u2/log/ProcLog -R 1000000 -S 60 -C 1001
├─15119 condor_collector -f
├─15132 condor_negotiator -f
└─15133 condor_schedd -f
Dec 30 11:24:23 ma-c-12-30-b73-c7u2 systemd[1]: Started Condor Distributed High-Throughput-Computing.
Dec 30 11:24:23 ma-c-12-30-b73-c7u2 systemd[1]: Starting Condor Distributed High-Throughput-Computing...
# ps aux | grep condor
condor 15093 0.0 0.1 42884 5516 ? Ss 11:24 0:00 /cm/shared/apps/condor/8.4.9/sbin/condor_master -f
root 15118 0.0 0.1 23004 4580 ? S 11:24 0:00 condor_procd -A /tmp/condor-lock.0.0129490057743205/procd_pipe -L /cm/shared/apps/condor/8.4.9/local.ma-c-12-30-b73-c7u2/log/ProcLog -R 1000000 -S 60 -C 1001
condor 15119 0.0 0.1 64064 6352 ? Ss 11:24 0:00 condor_collector -f
condor 15132 0.0 0.1 42884 5480 ? Ss 11:24 0:00 condor_negotiator -f
condor 15133 0.0 0.1 63268 7144 ? Ss 11:24 0:00 condor_schedd -f
root 15211 0.0 0.0 112648 956 pts/0 S+ 11:25 0:00 grep --color=auto condor
In the software image (assuming default-image is the image currently used by the compute nodes)
1. Install dependencies:
# yum install setools-console policycoreutils-python perl-Date-Manip.noarch --installroot=/cm/images/default-image
2. Install Condor inside the software image
Create a local configuration directory for each compute node (substitute node001/node002 with the correct node name and repeat/loop for the required number of nodes):
# cp -r --preserve /cm/shared/apps/condor/8.4.9/local.ma-c-12-30-b73-c7u2/ /cm/shared/apps/condor/8.4.9/local.node001/
# cp -r --preserve /cm/shared/apps/condor/8.4.9/local.ma-c-12-30-b73-c7u2/ /cm/shared/apps/condor/8.4.9/local.node002/
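For more than a handful of nodes, the two cp commands above can be wrapped in a loop. The following is a sketch only. The function name create_node_dirs and its argument order are hypothetical, and it assumes that every node's directory starts out as a copy of the head node's local directory, as in the commands above.

```shell
#!/bin/bash
# create_node_dirs is a hypothetical helper, not part of HTCondor.
# Usage: create_node_dirs <condor prefix> <head node local dir name> <node>...
# For each node it copies <prefix>/<head node local dir> to <prefix>/local.<node>,
# and leaves directories that already exist untouched.
create_node_dirs() {
    local prefix="$1" head_local="$2"
    shift 2
    local node target
    for node in "$@"; do
        target="$prefix/local.$node"
        if [ -e "$target" ]; then
            echo "skipping $node: $target already exists"
            continue
        fi
        cp -r --preserve "$prefix/$head_local" "$target" || return 1
        echo "created $target"
    done
}

# Example call with the paths used in this article:
# create_node_dirs /cm/shared/apps/condor/8.4.9 local.ma-c-12-30-b73-c7u2 node001 node002
```

Skipping existing directories keeps the function safe to re-run when nodes are added later, because cp -r into an existing directory would otherwise nest the copy inside it.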
Create a startup/boot script for starting Condor services in the software image:
# cat /cm/images/default-image/lib/systemd/system/condor.service
[Unit]
Description=Condor Distributed High-Throughput-Computing
After=syslog.target network-online.target nslcd.service ypbind.service network.target
Wants=network-online.target network.target
[Service]
Environment=CONDOR_CONFIG=/cm/shared/apps/condor/8.4.9/local.%H/condor_config.local
ExecStart=/cm/shared/apps/condor/8.4.9/sbin/condor_master -f
ExecStop=/cm/shared/apps/condor/8.4.9/sbin/condor_off -master
ExecReload=/bin/kill -HUP $MAINPID
Restart=always
RestartSec=1minute
StandardOutput=syslog
LimitNOFILE=16384
[Install]
WantedBy=multi-user.target
Copy the condor_config file to condor_config.local under each local.<node> directory after changing DAEMON_LIST to the daemons that a compute node needs:
# cat /cm/shared/apps/condor/8.4.9/local.node001/condor_config.local | grep -vE "^#|^$"
RELEASE_DIR = /cm/shared/apps/condor/8.4.9
LOCAL_DIR = /cm/shared/apps/condor/8.4.9/local.$(HOSTNAME)
LOCAL_CONFIG_FILE = /cm/shared/apps/condor/8.4.9/local.$(HOSTNAME)/condor_config.local
LOCAL_CONFIG_DIR = $(LOCAL_DIR)/config
use SECURITY : HOST_BASED
use ROLE : Personal
CONDOR_HOST = master
ALLOW_WRITE = *
UID_DOMAIN = cm.cluster
FILESYSTEM_DOMAIN = cm.cluster
LOCK = /tmp/condor-lock.0.0129490057743205
CONDOR_IDS = 1001.1001
CONDOR_ADMIN = root@master.cm.cluster
MAIL = /usr/bin/mail
JAVA = /usr/bin/java
JAVA_MAXHEAP_ARGUMENT = -Xmx1024m
COLLECTOR_HOST = $(CONDOR_HOST):9618
DAEMON_LIST = MASTER STARTD
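The copy-and-edit step above can be scripted. The sketch below rewrites only the DAEMON_LIST line while copying. The function name make_node_config is hypothetical. The other differences visible in the listing above (for example CONDOR_HOST and ALLOW_WRITE) are not handled and would still need to be edited by hand.

```shell
#!/bin/bash
# make_node_config is a hypothetical helper, not part of HTCondor.
# It copies a Condor configuration file and replaces the DAEMON_LIST line so
# that a compute node runs only the MASTER and STARTD daemons.
# Usage: make_node_config <source config> <destination config>
make_node_config() {
    local src="$1" dest="$2"
    sed -E 's/^[[:space:]]*DAEMON_LIST[[:space:]]*=.*/DAEMON_LIST = MASTER STARTD/' \
        "$src" > "$dest"
}

# Example call with the paths used in this article:
# make_node_config /cm/shared/apps/condor/8.4.9/etc/condor_config \
#     /cm/shared/apps/condor/8.4.9/local.node001/condor_config.local
```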
3. Reboot the compute nodes so that they are provisioned with the modified software image, then check the status from the head node once the nodes are up:
# condor_status
Name OpSys Arch State Activity LoadAv Mem ActvtyTime
node001.cm.cluster LINUX X86_64 Unclaimed Idle 0.000 993 0+00:30:04
node002.cm.cluster LINUX X86_64 Unclaimed Idle 0.150 993 0+00:00:04
Total Owner Claimed Unclaimed Matched Preempting Backfill
X86_64/LINUX 2 0 0 2 0 0 0
Total 2 0 0 2 0 0 0
Submitting a job
Submitting jobs as root is not allowed, so switch to another user before submitting jobs.
# su - cmsupport
$ cat hostname.sh
#!/bin/bash
hostname -f
date
sleep 20
date
echo "exit"
$ cat hostname.condor
############
# Example job file
############
Universe=vanilla
Executable=/home/cmsupport/hostname.sh
input=/dev/null
output=hostname.out
error=hostname.error
Queue
$ condor_submit hostname.condor
$ condor_q
-- Schedd: ma-c-12-30-b73-c7u2.cm.cluster : <10.141.255.254:50275?...
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
9.0 cmsupport 12/30 17:33 0+00:00:06 R 0 0.0 hostname.sh
1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
$ cat hostname.out
node002.cm.cluster
Fri Dec 30 17:33:58 CET 2016
Fri Dec 30 17:34:18 CET 2016
exit
$ condor_history
ID OWNER SUBMITTED RUN_TIME ST COMPLETED CMD
9.0 cmsupport 12/30 17:33 0+00:00:20 C 12/30 17:34 /home/cmsupport/hostname.sh
6.0 cmsupport 12/30 17:22 0+00:00:00 X ??? /home/cmsupport/hostname.sh
5.0 cmsupport 12/30 16:58 0+00:00:00 X ??? /home/cmsupport/hostname.sh
8.0 cmsupport 12/30 17:33 0+00:00:00 X ??? /home/cmsupport/hostname.sh
7.0 cmsupport 12/30 17:32 0+00:00:00 X ??? /home/cmsupport/hostname.sh
1.0 cmsupport 12/30 16:38 0+00:00:00 X ??? /home/cmsupport/hostname.sh
2.0 cmsupport 12/30 16:40 0+00:00:00 X ??? /home/cmsupport/hostname.sh
4.0 condor 12/30 16:43 0+00:00:00 X ??? /home/condor/hostname.sh
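The example job file above queues a single job. As a sketch of how the same job could be queued several times, the helper below writes a submit description that uses the $(Process) macro, so that each instance gets its own output and error file. The function name write_submit_file, the file names, and the added log entry are assumptions made for illustration and were not part of the tested setup.

```shell
#!/bin/bash
# write_submit_file is a hypothetical helper, not part of HTCondor.
# Usage: write_submit_file <submit file> <executable> <number of jobs>
# \$(Process) is escaped so that the shell leaves it for condor_submit, which
# expands it to the job's index within the cluster (0, 1, 2, ...).
write_submit_file() {
    local submit_file="$1" executable="$2" count="$3"
    cat > "$submit_file" <<EOF
Universe=vanilla
Executable=$executable
input=/dev/null
output=hostname.\$(Process).out
error=hostname.\$(Process).error
log=hostname.log
Queue $count
EOF
}

# Example call with the paths used in this article:
# write_submit_file hostname.condor /home/cmsupport/hostname.sh 3
# condor_submit hostname.condor
```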