The HEX Computer Farm contains over 50 worker nodes with over 100 central processing units. Jobs are submitted to these machines using the Condor batch system.
To make good use of the HEX farm, you must be able to break your overall computing task into a large number of sub-tasks, each of which can be executed independently of the others. Ideally, each sub-task should require at least 15 minutes of CPU time. If the sub-tasks take less time than that, the system may spend more time on the overhead of setting up and scheduling them than on executing them. On the other hand, the sub-tasks should ideally not require more than a few hours of CPU time; otherwise the scheduling algorithms may not be able to give all users a fair share of access.
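As a rough illustration of the sizing guideline above, the following sketch estimates how many work items to bundle into one sub-task so that it runs for about a target amount of CPU time. The function names `items_per_subtask` and `subtasks_needed` and all of the numbers are hypothetical; measure the per-item time of your own program before relying on them.

```shell
#!/bin/bash
# Estimate how many work items to bundle into one sub-task.
# Usage: items_per_subtask SECONDS_PER_ITEM [TARGET_MINUTES]
items_per_subtask() {
    local secs_per_item=$1
    local target_min=${2:-15}
    local target_secs=$((target_min * 60))
    # Round up so that the sub-task reaches at least the target time
    echo $(( (target_secs + secs_per_item - 1) / secs_per_item ))
}

# Number of sub-tasks needed to cover all items (rounded up).
# Usage: subtasks_needed TOTAL_ITEMS ITEMS_PER_SUBTASK
subtasks_needed() {
    local total_items=$1
    local per_subtask=$2
    echo $(( (total_items + per_subtask - 1) / per_subtask ))
}

# Example: each item takes about 2 s of CPU time, 100000 items in total
per=$(items_per_subtask 2)    # 450 items give about 15 minutes
echo "items per sub-task: $per"
echo "sub-tasks needed:   $(subtasks_needed 100000 "$per")"
```

The same functions can be called with a larger target, for example `items_per_subtask 2 60`, to check that a sub-task still stays well under a few hours.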
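To use the farm, you must prepare two files: an executable file, which performs one sub-task, and a job control file, which tells Condor how to run it.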
As an example, suppose we wish to test each of the numbers from 0 to (n-1) to see whether it is prime. Here is a bash script which will (not very efficiently) determine whether its argument is a prime number:
#!/bin/bash
# Determine if argument is a prime number
num=$1
# 0 and 1 are not prime and have no divisor to report
if [ $((num<2)) = 1 ] ; then
  echo $num "is not prime."
  exit
fi
maxd=$((num/2))
div=2
while [ $((div<=maxd)) = 1 ] ; do
  dvd=$((num/div))
  rmd=$((num-dvd*div))
  if [ $rmd = 0 ] ; then
    echo $num "is not prime. Divisor is "$div
    exit
  fi
  div=$((div+1))
done
echo $num "is prime"
exit
Create a sub-directory /home/username/condortest and copy this script to the file "prime.bash" in that directory. This is the executable file. Make certain that "prime.bash" has the "execute" file permission set!
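The steps just described can be sketched as a small helper. The function name `install_script` is hypothetical, and the paths in the usage comments are placeholders for your own account.

```shell
#!/bin/bash
# Copy a script into a work directory as "prime.bash" and make it executable.
# Usage: install_script SOURCE_FILE WORK_DIR
install_script() {
    local src=$1
    local workdir=$2
    mkdir -p "$workdir" || return 1
    cp "$src" "$workdir/prime.bash" || return 1
    # Set the "execute" permission for the owner of the file
    chmod u+x "$workdir/prime.bash"
}

# Example (placeholder paths):
#   install_script ./prime.bash /home/username/condortest
#   ls -l /home/username/condortest/prime.bash    # owner permissions should include "x"
#   /home/username/condortest/prime.bash 7        # quick local check before submitting
```

Running the script once by hand, as in the last comment, is a cheap way to catch a missing permission or a typing error before 30 jobs are queued.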
Here is the job control file:
universe = vanilla
initialdir = /home/username/condortest
error = /home/username/condortest/prime$(Process).error
log = /home/username/condortest/prime$(Process).log
output = /home/username/condortest/prime$(Process).out
executable = prime.bash
arguments = $(Process)
queue 30
Copy this file to "prime.jcl" in the "condortest" sub-directory.
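If you expect to change the directory or the number of sub-tasks often, the job control file can also be generated rather than typed. The following is a minimal sketch; the function name `write_jcl` is hypothetical, and the file it writes has the same lines as the example "prime.jcl".

```shell
#!/bin/bash
# Write a job control file "prime.jcl" into WORK_DIR for NUM_JOBS sub-tasks.
# Usage: write_jcl WORK_DIR NUM_JOBS
write_jcl() {
    local workdir=$1
    local njobs=$2
    # \$(Process) is escaped so that the shell leaves it for Condor to expand;
    # $workdir and $njobs are expanded by the shell as the file is written.
    cat > "$workdir/prime.jcl" <<EOF
universe = vanilla
initialdir = $workdir
error = $workdir/prime\$(Process).error
log = $workdir/prime\$(Process).log
output = $workdir/prime\$(Process).out
executable = prime.bash
arguments = \$(Process)
queue $njobs
EOF
}

# Example (placeholder path):
#   write_jcl /home/username/condortest 30
```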
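Here is an explanation of the lines in "prime.jcl":
universe: "vanilla" selects the Condor universe for ordinary programs, such as shell scripts, that have not been relinked with the Condor libraries.
initialdir: the directory used for the job's input and output files when a relative path is given.
error, output: the files that receive the standard error and standard output of each sub-task.
log: the file in which Condor records events for the job, such as submission, start of execution, and termination.
executable: the program to run, here the script "prime.bash".
arguments: the command-line arguments passed to the executable.
queue 30: places 30 copies of the job in the queue.
In each of these lines, Condor replaces $(Process) with the number of the sub-task, which runs from 0 to 29 here, so sub-task k tests the number k and writes to its own files.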
Now set up Condor by entering
source /condor/current/setup/condor-setup.(c)sh
You must do this once after you log in before any of the Condor commands will work. If you wish, this command can be placed in your .login or .(c)shrc file.
Next, submit the job by entering
condor_submit prime.jcl
To view the Condor job queue, enter
condor_q
To view only your own jobs, enter
condor_q username
Initially, all of your jobs will have status "I". Once the Condor system schedules them for execution this will change to "R". Even if there are available nodes to run your jobs, it may take 5 to 10 minutes for the Condor system to complete the scheduling.
Notice that "condor_q" will return a Condor job number of the form "NNNNN.MM", where "NNNNN" will be the same for all of the jobs just submitted, and "MM" will range from 0 to 29 (for this test example).
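When scripting around these job numbers, it can help to split "NNNNN.MM" into its two parts. The sketch below uses only shell parameter expansion; the function names and the job number 12345.7 are made up for illustration.

```shell
#!/bin/bash
# Split a Condor job number of the form "NNNNN.MM" into its two parts.
job_group()   { echo "${1%%.*}"; }    # text before the first dot (NNNNN)
job_subtask() { echo "${1#*.}"; }     # text after the first dot (MM)

id="12345.7"    # made-up job number
echo "group:       $(job_group "$id")"
echo "sub-task:    $(job_subtask "$id")"
# With the example job control file, sub-task MM writes to primeMM.out
echo "output file: prime$(job_subtask "$id").out"
```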
To remove jobs, enter
condor_rm NNNNN
to remove all the jobs just submitted, or
condor_rm NNNNN.MM
to remove a single job from the group of 30 which were just submitted.
Once all 30 sub-tasks have finished executing, you will have 30 output files in /home/username/condortest named "prime0.out" through "prime29.out".
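A short script can confirm that every output file is present and gather the results. This is a sketch only: the function name `collect_results` is hypothetical, and it assumes the output lines have the form printed by the example script ("N is prime" or "N is not prime. ...").

```shell
#!/bin/bash
# Print the numbers reported as prime and warn about missing output files.
# Usage: collect_results WORK_DIR NUM_JOBS
# Returns non-zero if any output file is missing.
collect_results() {
    local workdir=$1
    local njobs=$2
    local missing=0
    local i=0
    local f
    while [ "$i" -lt "$njobs" ] ; do
        f="$workdir/prime$i.out"
        if [ ! -f "$f" ] ; then
            echo "missing: $f" >&2
            missing=$((missing+1))
        else
            # "is not prime" does not contain "is prime", so only primes match;
            # the first word of a matching line is the number that was tested.
            awk '/is prime/ { print $1 }' "$f"
        fi
        i=$((i+1))
    done
    [ "$missing" -eq 0 ]
}

# Example (placeholder path):
#   collect_results /home/username/condortest 30
```

A missing file usually means that the corresponding sub-task has not finished yet or has failed; the matching ".error" and ".log" files are the first place to look.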