Submitting jobs to the HEX Computer Farm

Introduction

The HEX Computer Farm is a collection of compute nodes with the effective computing power of around 1400 central processing units. Jobs are submitted to these machines using the HTCondor batch system.

In order to make good use of the power of the HEX farm, it must be possible to break down your overall computing task into sub-tasks, each of which can be executed independently of the others.

Prerequisites

Set up HTCondor by entering

source /condor/HTCondor/alma8/condor.sh
You must do this once after you log in, before any of the HTCondor commands will work. If you wish, follow the instructions on this page to have this set up automatically upon each login.
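
For example, assuming your login shell is bash (the page mentioned above describes the recommended way to set this up), one simple approach is to append the setup line to your ~/.bashrc:

echo 'source /condor/HTCondor/alma8/condor.sh' >> ~/.bashrc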

Computing resources available on the batch nodes

To view available resources, you can use the following command that is specific to our instance of HTCondor. It is similar to condor_status -compact but provides some additional formatting.

ru_condor_status
This command also lists how many CPUs are currently unused, how many are in use by local user jobs, and how many are currently running jobs submitted by remote users (e.g. CMS CRAB jobs submitted at other sites that run using our resources). When our queues fill up, additional local user submissions cause this last category of jobs to be preempted, so that we always have priority for using our resources.

Note that typical user jobs are allocated a single CPU thread and, on average, 4 GB of memory per job. If you require more resources than that per job, you can specify RequestMemory = X or RequestCpus = Y in your .jdl file. Our worker nodes have up to 64 CPU threads on a single machine, so Y must be smaller than that, and the larger the resource request, the fewer jobs can run at once. Note also that such multithreaded jobs are only beneficial in specific use cases; it is usually better to run many single-threaded jobs than a smaller number of multithreaded ones.
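
For example, adding the following two lines to a .jdl file would request 4 CPU threads and 8 GB of memory for each job (the values are purely illustrative; a bare number for RequestMemory is interpreted as MB):

RequestCpus = 4
RequestMemory = 8192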

To view all running jobs and the resources they are using, you can use the following command that is specific to our instance of HTCondor (and is similar to condor_q -run -all).

ru_condor_q_run

To view the entire condor queue, you can use condor_q -all or ru_condor_q (which additionally breaks things down into CPU vs. GPU jobs). You can optionally add the -nobatch flag to see the command that each job will run.

An example of job submission

To use the farm, you must prepare two files:

  1. An executable file, which the HTCondor system will schedule for execution on the worker nodes. This can be either a script, or an executable binary created by a compiler/linker.
  2. A job description file, which specifies various details of the job to the HTCondor system.

As an example, suppose we wish to test each of the numbers from 0 to (n-1) to see whether it is prime. Here is a bash script which will (not very efficiently) determine whether its argument is a prime number:

#!/bin/bash
# Determine if argument is a prime number
num=$1
maxd=$((num / 2))
div=2
while [ "$div" -le "$maxd" ] ; do
    dvd=$((num / div))
    rmd=$((num - dvd * div))
    if [ "$rmd" -eq 0 ] ; then
        echo "$num is not prime. Divisor is $div"
        exit
    fi
    div=$((div + 1))
done
echo "$num is prime"
exit

To use this script with the HEX farm, first create a directory /home/username/condortest and copy this script to the file "prime.bash" in that directory. This is the executable file. Be sure that "prime.bash" has the "execute" file permission set! Use chmod u+x prime.bash to do so.
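
For example (with "username" standing in for your own account name):

mkdir -p /home/username/condortest
cd /home/username/condortest
# create prime.bash here with the contents shown above, then:
chmod u+x prime.bash
./prime.bash 7     # quick local test; should print "7 is prime"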

Here is the job description file:

universe = vanilla
error = /home/username/condortest/prime$(Process).error
log = /home/username/condortest/prime$(Process).log
output = /home/username/condortest/prime$(Process).out
should_transfer_files = YES
when_to_transfer_output = ON_EXIT_OR_EVICT
executable = prime.bash
arguments = $(Process)
queue 30

Copy this into a file named "prime.jdl" in the "condortest" sub-directory.

Here is an explanation of the lines in "prime.jdl":

universe = vanilla
HTCondor has several job submission paradigms. We use only "vanilla".
error, log, and output
Files for error, log, and output information. Note the "$(Process)" construction used in the specification of these files. HTCondor replaces it at submission time with the actual number of the sub-task, so a separate file is created for each sub-task.
should_transfer_files and when_to_transfer_output
These make the job run with a scratch area on the worker node (with a size of at least 14 GB per job on average) as its initial working directory; the output and error files are copied back to your local working area at the end of the job.
executable
The name of the file to execute.
arguments
Arguments to pass to the program. In this case, just the sub-task number is passed as an argument, which will be the number tested for primeness by the prime.bash script.
queue 30
30 is the number of sub-tasks to be submitted for this test. $(Process) will range from 0 to 29; for example, the sub-task with $(Process) = 5 runs prime.bash 5 and writes its results to prime5.out, prime5.error, and prime5.log.

Next, submit the job by entering

condor_submit prime.jdl

To view the HTCondor job queue, enter

condor_q -all
To view only your own jobs, enter
condor_q username

Initially, all of your jobs will have status "I" (idle). Once the HTCondor system schedules them for execution, this will change to "R" (running). Even if there are slots available to run your jobs, it may take a few minutes for the HTCondor system to complete the scheduling.

Notice that "condor_q -all" will return an HTCondor job number of the form "NNNNN.MM", where "NNNNN" will be the same for all of the jobs just submitted, and "MM" will range from 0 to 29 (for this test example).

To watch your jobs' output and error files as they are being written on the worker node, you can make use of

condor_tail NNNNN.MM (standard output), condor_tail -stderr NNNNN.MM (standard error), and condor_ssh_to_job NNNNN.MM (an interactive shell in the job's working directory on the worker node)

To remove jobs, enter

condor_rm NNNNN
to remove all the jobs just submitted, or
condor_rm NNNNN.MM
to remove a single job from the group of 30 which were just submitted.

Once all 30 sub-tasks have finished executing, you will have 30 output files in /home/username/condortest named "prime0.out" through "prime29.out".
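
As a quick check of the results, you can, for example, run the following from the "condortest" directory to list which of the 30 numbers were reported as prime:

grep "is prime" prime*.out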