IRMACS Centre's Infrastructure - Computational Cluster Support

The IRMACS computational cluster is a ten-blade server. Each blade has two quad-core Intel Xeon 2.33 GHz CPUs and 16 GB of memory, for a total of 80 CPUs (cores) and 160 GB of memory. User accounts reside on a 2 TB RAID 0+1 disk array. The cluster runs 64-bit Red Hat Enterprise Linux 5.

Using the Cluster

Currently, the file system on the cluster is not tied into the IRMACS file system, so users have two home directories and need to transfer files between them. To access the cluster, log in to head.irmacs.sfu.ca using your IRMACS account credentials. Do not run computational jobs on the head node (head.irmacs.sfu.ca); submit them to the cluster nodes using the batch queuing system described below. Do not log in to the cluster nodes directly. Instead, use the PBS scheduling commands (qsub, qstat, qdel) to submit, monitor, and control your jobs. These commands are described in more detail below.
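As a rough sketch of working with the two home directories, the following assumes that the head node accepts ssh and scp connections (the text above only says to log in) and uses a placeholder user name and directory names. It prints the commands rather than running them.

```shell
#!/bin/bash
# Assumptions: ssh/scp access is available; "jdoe", "project" and
# "project/results" are placeholders, not real account or directory names.
user="jdoe"
head_node="head.irmacs.sfu.ca"

# Log in to the head node.
login_cmd="ssh ${user}@${head_node}"
# Copy a local directory into the cluster home directory.
upload_cmd="scp -r project ${user}@${head_node}:"
# Copy results from the cluster back to the current local directory.
download_cmd="scp -r ${user}@${head_node}:project/results ."

printf '%s\n' "$login_cmd" "$upload_cmd" "$download_cmd"
```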

Software

The cluster supports C/C++, Fortran, Python, and Java compilers. The following software is also installed (in /usr/local on the compute nodes):

  • Maple 11
  • Matlab R2008a
  • MRBayes 3.1.2
  • R 2.6.1
  • SAS 9.1.2
  • COMSOL 3.4

Queues

There are two job queues on the IRMACS cluster:

  • Batch - for jobs that will take more than a few hours to complete. This queue has access to 64 CPUs.
  • Live - for jobs lasting a few hours or less. This queue has access to 16 CPUs.
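The choice between the two queues can be sketched as a small helper. This is only an illustration: it assumes the queue names are written in lowercase as batch and live, and it treats "a few hours" as 3 hours, which is a guess rather than an official limit.

```shell
#!/bin/bash
# Assumptions: queue names are "batch" and "live" (lowercase), and
# "a few hours" is taken to mean 3 hours. Adjust both to match the cluster.
pick_queue() {
    local hours=$1
    local threshold=3   # assumed cutoff, not an official limit
    if [ "$hours" -gt "$threshold" ]; then
        echo batch
    else
        echo live
    fi
}

pick_queue 12   # prints: batch
pick_queue 2    # prints: live
```

The result could be passed on the command line, for example qsub -q "$(pick_queue 12)" script_file, since qsub options given on the command line take precedence over #PBS directives in the script.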

Submitting Jobs

To use the multiprocessor queueing system, create a PBS script and then submit it to the cluster. To submit a job, use the following command:

qsub "script_file"

To monitor a job, use the following command:

qstat -nl

To monitor all the queued and running jobs:

qstat -a

To delete a job in the queue:

qdel jobid (where jobid is the ID given to you when the job is submitted with qsub)
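The commands above can be combined into one short session. The sketch below assumes that qsub prints a job ID of the usual PBS form number.server, and it only contacts the scheduler when qsub and a file named script_file are actually present. job_number is a hypothetical helper, not a PBS command.

```shell
#!/bin/bash
# Strip the server suffix from a PBS job ID such as "1234.head".
job_number() {
    printf '%s\n' "${1%%.*}"
}

# Only talk to the scheduler when the PBS commands and the script exist.
if command -v qsub >/dev/null 2>&1 && [ -f script_file ]; then
    jobid=$(qsub script_file)      # qsub prints the new job's ID
    echo "Submitted job $(job_number "$jobid")"
    qstat "$jobid"                 # status of this job only
    # qdel "$jobid"                # uncomment to delete the job
else
    echo "qsub or script_file not found; nothing submitted"
fi
```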

PBS Script Elements

Note: Not all #PBS directives are needed for each script type.

#!/bin/bash "Shell used to run the script"
#PBS -N job_name "Name of the submitted job"
#PBS -q queue_name "Name of the queue to run the job on"
#PBS -M user_email "Email address to send notifications to"
#PBS -m bae "Send email when the job (b)egins, (a)borts, or (e)nds"
#PBS -l nodes=2 "Number of nodes needed for the job (one processor each unless :ppn is given)"
#PBS -j oe "Join stdout and stderr into one file"
#PBS -o path_to_job_log "Send job log (stdout) to file"
#PBS -e path_to_error_log "Send error log (stderr) to file"
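As an illustration of how several of these directives fit together, here is a minimal sketch. The job name and log file name are placeholders. Because -j oe merges stderr into stdout, no -e directive is given. PBS_O_WORKDIR and PBS_JOBID are standard PBS environment variables, and the fallbacks let the script also run outside the queueing system.

```shell
#!/bin/bash
#PBS -N example_job
#PBS -q batch
#PBS -l nodes=1
#PBS -j oe
#PBS -o example_job.log

# PBS_O_WORKDIR is set by PBS to the directory qsub was run from.
# The fallback to the current directory lets the script run outside PBS too.
workdir="${PBS_O_WORKDIR:-$PWD}"
cd "$workdir" || exit 1

echo "Job ${PBS_JOBID:-not-in-pbs} running in $workdir on $(uname -n)"
```

Changing to the submission directory first means that relative paths in the rest of the script refer to the directory the job was submitted from, rather than to the home directory where PBS jobs normally start.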

Single Job Example

#! /bin/bash
#PBS -N Myprogram
#PBS -q batch
#PBS -M user@domain.com
#PBS -m bae
./myprogram param1 param2

Matlab Job Example

#! /bin/bash
# Note - this uses the explicit path to Matlab. If Matlab is in your path,
# the command in the PBS script below can be shortened to just matlab.
#PBS -N Myprogram
#PBS -q batch
#PBS -M user@domain.com
#PBS -m bae
/usr/local/bin/matlab -nosplash -nodesktop < my_matlab_program.m > outputfile.txt

Threaded Parallel Job Example

#! /bin/bash
#PBS -N Myprogram
#PBS -l nodes=1:ppn=6
#PBS -q batch
#PBS -M user@domain.com
#PBS -m bae
./myprogram -np 6 < inputfile.txt > outputfile.txt

Note: The above example assumes that myprogram is a multi-threaded program that uses the number of processors given to it by the command line argument -np (in this case 6 processors). Because this is a multi-threaded application, it MUST run on a single node, since the inter-processor communication is through thread locks and shared memory. Thus the -l nodes=1 part of the PBS directive is essential. The ppn=6 part of the PBS directive tells the scheduler that the job requires 6 processors. If it is omitted, the scheduler assigns the job only one processor by default. Five of the six threads could then end up on processors that are running other jobs, slowing down both your job and those of other cluster users.
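One way to keep the -np value and the ppn request from drifting apart is to derive the thread count from the allocation itself. The sketch below assumes the usual PBS behaviour that the file named by PBS_NODEFILE contains one line per allocated processor. count_procs is a hypothetical helper, and the program is run only if it and its input file exist.

```shell
#!/bin/bash
#PBS -N Myprogram
#PBS -l nodes=1:ppn=6
#PBS -q batch

# Count allocated processors: PBS_NODEFILE normally has one line per processor.
# Falls back to 1 when the script is run outside PBS.
count_procs() {
    if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
        wc -l < "$PBS_NODEFILE" | tr -d '[:space:]'
    else
        echo 1
    fi
}

nprocs=$(count_procs)
echo "Using $nprocs processor(s)"

# Run the program only if it and its input exist (names from the example above).
if [ -x ./myprogram ] && [ -f inputfile.txt ]; then
    ./myprogram -np "$nprocs" < inputfile.txt > outputfile.txt
fi
```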

MPI Parallel Job Example

#! /bin/bash
#PBS -N Myprogram
#PBS -l nodes=1:ppn=8
#PBS -q batch
#PBS -M user@domain.com
#PBS -m bae
mpirun -np 8 -machinefile $MPI_NODES ~/myprogram

Note: In the above example, -np is the number of processes for MPI to start, and it should match the number of processors requested by the #PBS directive. The PBS directive -l nodes=1:ppn=8 requests a single node with 8 CPUs on that node. This is the optimal configuration for MPI jobs, as the communication between processors occurs through shared memory rather than across the relatively slow network (the IRMACS cluster consists of 10 nodes, each with eight processors). Also, $MPI_NODES is a predefined system environment variable.
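The relationship between the resource request and -np is simple arithmetic: nodes times ppn. The sketch below computes it and rejects requests larger than the cluster described above (10 nodes, eight processors each). total_procs is a hypothetical helper, and the script only prints the mpirun command rather than running it.

```shell
#!/bin/bash
# Limits taken from the text above: 10 nodes, each with eight processors.
MAX_NODES=10
MAX_PPN=8

# Print nodes*ppn, or fail if the request exceeds the cluster's size.
total_procs() {
    local nodes=$1 ppn=$2
    if [ "$nodes" -lt 1 ] || [ "$nodes" -gt "$MAX_NODES" ] ||
       [ "$ppn" -lt 1 ] || [ "$ppn" -gt "$MAX_PPN" ]; then
        echo "invalid request: nodes=$nodes ppn=$ppn" >&2
        return 1
    fi
    echo $(( nodes * ppn ))
}

# Matches "#PBS -l nodes=1:ppn=8" from the example above.
np=$(total_procs 1 8)
echo "mpirun -np $np -machinefile \$MPI_NODES ~/myprogram"
```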

Technical Support

The IRMACS Centre employs a professional technical team to support researchers' use of the cluster.

If you have any other questions about the computational cluster, contact the IRMACS Centre.