The HEX Computer Farm is a collection of compute nodes with the effective computing power of over 600 central processing units. Jobs are submitted to these machines using the HTCondor batch system.
In order to make good use of the power of the HEX farm, it must be possible to break down your overall computing task into sub-tasks, each of which can be executed independently of the others.
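To use the farm, you must prepare two files: an executable file, which is the program or script that each sub-task runs, and a job control file, which tells HTCondor how to run it.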
As an example, suppose we wish to test each of the numbers from 0 to (n-1) to see whether it is prime. Here is a bash script which will (not very efficiently) determine whether its argument is a prime number:
#!/bin/bash
# Determine if argument is a prime number
num=$1
maxd=$((num/2))
div=2
while [ $((div<=maxd)) = 1 ] ; do
  dvd=$((num/div))
  rmd=$((num-dvd*div))
  if [ $rmd = 0 ] ; then
    echo $num "is not prime. Divisor is "$div
    exit
  fi
  div=$((div+1))
done
echo $num "is prime"
exit
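As an aside, the script above never enters its loop for the arguments 0 and 1, so it reports both as prime. The following is a minimal sketch of a variant that guards against this and stops dividing at the square root of the number. The function name check_prime is hypothetical, and wrapping the test in a function is only a convenience for trying it out locally; it is not part of the original example.

```bash
#!/bin/bash
# Hypothetical variant of the primality test, written as a function so it
# can be tried locally before anything is submitted to the farm.
check_prime() {
    local num=$1 div=2
    # 0 and 1 are not prime; the loop below would never run for them.
    if [ "$num" -lt 2 ]; then
        echo "$num is not prime"
        return
    fi
    # Trial division only needs to go up to the square root of num.
    while [ $((div * div)) -le "$num" ]; do
        if [ $((num % div)) -eq 0 ]; then
            echo "$num is not prime. Divisor is $div"
            return
        fi
        div=$((div + 1))
    done
    echo "$num is prime"
}

# Mimic the first few sub-tasks: HTCondor passes 0, 1, 2, ... as $(Process).
for n in 0 1 2 3 4; do
    check_prime "$n"
done
```

The rest of this walkthrough continues to use the original script.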
To use this script with the HEX farm, first create a sub-directory /home/username/condortest and copy this script to the file "prime.bash" in that directory. This is the executable file. Make certain that "prime.bash" has the "execute" file permission set!
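The directory, copy, and permission steps can be sketched as follows. This assumes that your home directory is /home/username (so that $HOME/condortest is the directory named above) and that the script has been saved as prime.bash in the current directory. The helper name install_script is hypothetical.

```bash
#!/bin/bash
# install_script is a hypothetical helper: it copies a script into the
# given directory as prime.bash and sets the execute permission.
install_script() {
    local src=$1 destdir=$2
    mkdir -p "$destdir"                 # create the directory if needed
    cp "$src" "$destdir/prime.bash"     # copy the script into place
    chmod u+x "$destdir/prime.bash"     # set the "execute" permission
}

# Only act if prime.bash is present in the current directory.
if [ -f prime.bash ]; then
    install_script prime.bash "$HOME/condortest"
    ls -l "$HOME/condortest/prime.bash"  # the mode should include an x
fi
```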
Here is the job control file:
universe = vanilla
initialdir = /home/username/condortest
error = /home/username/condortest/prime$(Process).error
log = /home/username/condortest/prime$(Process).log
output = /home/username/condortest/prime$(Process).out
executable = prime.bash
arguments = $(Process)
Notification = never
queue 30
Copy this file to "prime.jcl" in the "condortest" sub-directory.
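If you expect to vary the number of sub-tasks or the working directory, the job control file can also be generated from the shell instead of edited by hand. The following is a sketch only: write_jcl is a hypothetical helper, not an HTCondor command, and $HOME/condortest is assumed to be the same directory as /home/username/condortest.

```bash
#!/bin/bash
# write_jcl is a hypothetical helper that writes a job control file
# with the same lines as the example, for a chosen number of sub-tasks.
write_jcl() {
    local njobs=$1 workdir=$2 outfile=$3
    # \$(Process) is escaped so that the shell leaves it untouched;
    # HTCondor expands it later to the sub-task number.
    cat > "$outfile" <<EOF
universe = vanilla
initialdir = $workdir
error = $workdir/prime\$(Process).error
log = $workdir/prime\$(Process).log
output = $workdir/prime\$(Process).out
executable = prime.bash
arguments = \$(Process)
Notification = never
queue $njobs
EOF
}

# Reproduce the example file: 30 sub-tasks in the condortest directory.
write_jcl 30 "$HOME/condortest" prime.jcl
```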
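Here is an explanation of the lines in "prime.jcl":
universe = vanilla selects the vanilla universe, the usual choice for ordinary programs and shell scripts.
initialdir sets the directory used for the job's input and output files.
error, log and output name the files that receive the job's standard error, HTCondor's record of the job's progress, and the job's standard output. $(Process) is replaced by the sub-task number (0 to 29 here), so each sub-task gets its own files.
executable names the program to run.
arguments gives the command-line arguments passed to the executable; here each sub-task receives its own number.
Notification = never turns off e-mail notification.
queue 30 submits 30 sub-tasks.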
Now set up HTCondor by entering
source /condor/HTCondor/current/condor.(c)sh
You must do this once after you log in before any of the HTCondor commands will work. If you wish, this command can be placed in your .login or .(c)shrc file.
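A quick way to confirm that the setup step took effect is to check that the HTCondor commands used in this walkthrough can be found on your PATH. A minimal sketch, in which missing_commands is a hypothetical helper name:

```bash
#!/bin/bash
# missing_commands is a hypothetical helper: it prints each named
# command that cannot be found on the current PATH.
missing_commands() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

missing=$(missing_commands condor_submit condor_q condor_rm)
if [ -z "$missing" ]; then
    echo "HTCondor commands are available"
else
    # $missing is left unquoted so the names print on one line.
    echo "Not found on PATH:" $missing
fi
```

If any command is reported as not found, run the setup command again in the current shell.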
Next, submit the job by entering
condor_submit prime.jcl
To view the HTCondor job queue, enter
condor_q -all
To view only your own jobs, enter
condor_q username
Initially, all of your jobs will have status "I" (idle). Once the HTCondor system schedules them for execution, this will change to "R" (running). Even if there are available nodes to run your jobs, it may take a few minutes for the HTCondor system to complete the scheduling.
Notice that "condor_q" will display an HTCondor job number of the form "NNNNN.MM", where "NNNNN" will be the same for all of the jobs just submitted, and "MM" will range from 0 to 29 (for this test example).
To remove jobs, enter
condor_rm NNNNN
to remove all the jobs just submitted, or
condor_rm NNNNN.MM
to remove a single job from the group of 30 which were just submitted.
Once all 30 sub-tasks have finished executing, you will have 30 output files in /home/username/condortest named "prime0.out" through "prime29.out".
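With the results spread over 30 files, a short script can gather them into one list. The sketch below prints the lines that report a prime, in numerical order. The helper name summarize_primes is hypothetical, and $HOME/condortest is assumed to be the same directory as /home/username/condortest.

```bash
#!/bin/bash
# summarize_primes is a hypothetical helper: it collects the "is prime"
# lines from the prime*.out files in the given directory.
summarize_primes() {
    local dir=$1
    # grep -h suppresses the file names; sort -n orders the lines by
    # the number at the start of each line.
    grep -h "is prime" "$dir"/prime*.out 2>/dev/null | sort -n
}

summarize_primes "$HOME/condortest"
```

Because the example script never enters its loop for 0 and 1, those two numbers will appear in this list as well.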