ANUSF
Fujitsu VPP300 Userguide

Using NQS on the VPP300

Although small, relatively short jobs may be run interactively on PE0, most production jobs should run on the secondary PEs (PE1 to PE12) under NQS. This page provides an introduction to using NQS and specific local information about differences between the NQS setup at ANU and that described in the standard system documentation.

For further details, consult the man pages indicated below or the UXP/V Network Queuing System Handbook, one of the UXP/V Basic Software Manuals in the on-line Fujitsu manuals.

  • Using the basic commands
  • NQS queue limits
  • Self-submitting jobs
  • CPU ratio
  • Queue modes

Basic commands

The interface to NQS is not the standard one - locally developed "wrappers" have been placed around most standard NQS commands for accounting purposes. The main commands needed to use the ANU VPP300 are:
  • nqsub for submitting jobs
  • nqdel for deleting jobs
  • nqstat for finding the state of the queues
Look at the man pages for nqsub, qsub, qdel, nqstat, qstat and anu_limits for complete details.

nqsub
The simplest use of the nqsub command is typified by the following example:

% nqsub -P a99 -q normal -lM 50MB
Enter your commands followed by ^D
./a.out
^D

The options are:
  • the project (default value is that of the environment variable PROJECT) specified by -P
  • the queue specified by -q and
  • the job memory limit is specified by -lM

More commonly, you put the commands you want your job to execute in a shell script and give the name of that script as the final argument to nqsub (the script cannot be given command-line arguments). nqsub accepts many other options; see the nqsub and qsub man pages (nqsub is a wrapper for qsub).

Options of note:

-lM (job memory limit)
Since there is no swapping on the S-PEs (secondary PEs), strict memory allocation is necessary for NQS jobs, so you must use the -lM option in every nqsub command. The specified memory also allows the system to determine which PE should run your job. It is in your interest to specify a realistic memory limit, because your jobs will then run sooner. A little trial and error may be needed to find out how much memory your job uses; nqstat lists both your allocation and your actual usage.

-ls and -lV (defaults differ from qsub)
Instead of defaulting to the maximum allowed on the queue for the stack size (-ls) and the temporary file space (-lV), nqsub uses 4MB and 0MB respectively. This avoids wasting main memory, which must be set aside for the requested stack and temporary file space (mrfs). If your program needs more than these defaults, you must request it explicitly with the -ls and -lV options.

-lV and -cc (mrfs options)
These two options create and access an mrfs, a memory-resident temporary file system. The -cc option sets the current working directory of your job to the mrfs directory; it is usually not necessary.

-nr option
Unless your job can restart cleanly after an interruption (a system crash, for example), you should use the -nr option to make it non-restartable. Without this option, a job that was executing when the system crashed is automatically restarted when the system comes back up, which may cause files to be overwritten.
Look at the nqsub and qsub man pages for complete details of all options.
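
Because -lM is mandatory and -nr is usually advisable, it can be convenient to wrap the common options in a small shell function. The sketch below assumes a Bourne-style login shell (sh or bash; csh/tcsh users would need an alias or a separate script instead). The name submit_job, its argument order and its default queue normal are hypothetical choices, not part of the ANU setup; it passes only nqsub options described on this page.

```sh
#!/bin/sh
# Hypothetical helper, not an ANU-supplied command: wraps nqsub with the
# options recommended on this page.
#   usage: submit_job SCRIPT MEMLIMIT [QUEUE [extra nqsub options...]]
submit_job() {
    if [ $# -lt 2 ]; then
        echo "usage: submit_job SCRIPT MEMLIMIT [QUEUE [options...]]" >&2
        return 1
    fi
    if [ -z "$PROJECT" ]; then
        echo "submit_job: set PROJECT to your project code (e.g. a99)" >&2
        return 1
    fi
    sj_script=$1
    sj_mem=$2
    sj_queue=${3:-normal}          # default queue is an assumption
    if [ $# -ge 3 ]; then shift 3; else shift 2; fi
    # -lM is mandatory because there is no swapping on the S-PEs;
    # -nr stops NQS from rerunning the job after a crash.
    # Any remaining arguments (e.g. -ls, -lV) are passed through.
    nqsub -P "$PROJECT" -q "$sj_queue" -lM "$sj_mem" -nr "$@" "$sj_script"
}
```

For example, with PROJECT=a99, `submit_job myjob.sh 200MB normal -ls 8MB -lV 20MB` runs `nqsub -P a99 -q normal -lM 200MB -nr -ls 8MB -lV 20MB myjob.sh`.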

nqstat
The current state of the VPP queues is displayed by the nqstat command - for full details, see the nqstat man page. It is probably worth trying out various combinations of the -c and -d options to get an output appearance you like.

The queue header gives the limit on CPU time, memory and CPU ratio for you and your project. The fields in the job lines are fairly straightforward. Note:

  • the first field, the request-id, which you need in order to qdel your job, and
  • the memory fields, in the form "usage:request"; from these you can make a sensible memory request for your next job.

Example nqstat output:

vpp00:~> nqstat -c = normal
USER  jhj900   PROJECT  z00            CPU TIME ELAPTIME     MEMORY   RCPU  PE
normal ==== enabld ==================== 24:00:00 === Pri=25 = 1920MB === 8 ====
41814  spg224  w05          dhf    RUN  23:46:32 24:37:06   616: 904MB 100%  11
41837  hmb565  g89      m30.nqs    RUN  17:21:59 22:00:10    80: 104MB 100%   2
41838  rpb104  k58     job1.run    RUN  02:23:50 08:43:47    24:  24MB  11%   4
41965  shc651  r06       normal   QUED  01:00:00                 280MB
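
The usage:request fields can be turned into a suggested -lM value for the next run. The awk-based function below is a sketch that assumes the job-line layout of the example above (usage as a bare number followed by a colon, then the request with an MB suffix) and that usage is also in MB. The name mem_report and the 20% headroom are arbitrary assumptions, not ANU recommendations.

```sh
#!/bin/sh
# Hypothetical helper: reads nqstat job lines on stdin and, for each job
# that reports memory usage, prints its request-id, usage, request and a
# suggested -lM value (usage plus 20% headroom, rounded up to whole MB).
# Assumes the "NNN: NNNMB" layout shown in the example nqstat output.
mem_report() {
    awk '{
        for (i = 1; i < NF; i++) {
            # usage field looks like "616:", request field like "904MB"
            if ($i ~ /^[0-9]+:$/ && $(i + 1) ~ /^[0-9]+MB$/) {
                used = $i + 0              # "616:" converts to 616
                req = $(i + 1) + 0         # "904MB" converts to 904
                sugg = int((used * 12 + 9) / 10)   # ceil(used * 1.2)
                printf "%s used=%dMB requested=%dMB suggest=-lM %dMB\n", $1, used, req, sugg
                break
            }
        }
    }'
}
```

Used as `nqstat | mem_report`, the line for job 41814 above would give `41814 used=616MB requested=904MB suggest=-lM 740MB`. Queued jobs, which have no usage figure yet, and the header lines are skipped.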
     

qdel
qdel is the standard NQS supplied command for removing jobs from the queue. If the job is not yet running, it can be removed with
qdel jobid
If the job is already running, it is necessary to specify the -k option:
qdel -k jobid

NQS queue limits

NQS queue limits are subject to change, so obtain the current values from the system with the anu_limits command. The limits and charging rates of each queue should indicate its purpose; for example, higher-priority queues have a higher charging weight.


Self-submitting jobs

nqsub can only be run on PE0 (vpp00), but your batch jobs run on PE1 to PE12. Self-submitting jobs must therefore rsh to vpp00 to run nqsub; the self-submission line looks something like:

rsh vpp00 "cd $QSUB_WORKDIR; \
             nqsub -P a99 -q $QSUB_QUEUENAME -lM $MEMLIM $QSUB_REQNAME"
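
Placed in a complete job script, the self-submission line might be used as in the sketch below. It assumes a Bourne-style shell for the job script; the progress file step.count, the MAXSTEP and MEMLIM defaults and the do_work placeholder are hypothetical, and a99 must be replaced by your own project. The rsh/nqsub line follows the one above, with -nr added as recommended in the nqsub options.

```sh
#!/bin/sh
# Hypothetical self-submitting NQS job script (a sketch, not an ANU template).
# Each run does one chunk of work, records progress in step.count and
# resubmits itself until MAXSTEP chunks are done. step.count, MAXSTEP,
# MEMLIM and do_work are illustrative assumptions.

MAXSTEP=${MAXSTEP:-5}     # total number of runs wanted (assumption)
MEMLIM=${MEMLIM:-50MB}    # -lM value for each resubmission (assumption)

do_work() {
    # Placeholder: replace with the real computation.
    ./a.out
}

resubmit() {
    # nqsub only runs on PE0 (vpp00) while this job runs on PE1-PE12,
    # so resubmit via rsh. Replace a99 with your own project.
    rsh vpp00 "cd $QSUB_WORKDIR; \
        nqsub -P a99 -q $QSUB_QUEUENAME -lM $MEMLIM -nr $QSUB_REQNAME"
}

main() {
    cd "$QSUB_WORKDIR" || exit 1
    step=$(cat step.count 2>/dev/null || echo 0)
    do_work || exit 1         # a failed run does not resubmit itself
    step=$((step + 1))
    echo "$step" > step.count
    if [ "$step" -lt "$MAXSTEP" ]; then
        resubmit
    fi
}

# Run only when started by NQS (which sets QSUB_REQNAME), so the
# functions can also be loaded and tried out interactively.
if [ -n "$QSUB_REQNAME" ]; then
    main
fi
```

Saving the counter only after do_work succeeds means a failing program stops the chain instead of resubmitting itself indefinitely.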

CPU ratio

Each queue has an associated cpuratio, given in the RCPU column of nqstat output. A request's cpuratio, divided by the sum of the cpuratios of all requests running on the same processing element (PE), gives the proportion of CPU time available to that request. For running jobs, nqstat also shows this fraction as a percentage; the percentage may change during execution depending on which jobs share the PE, whereas the request's cpuratio is fixed.
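
In other words, a request with cpuratio r_i on a PE whose running requests have cpuratios r_1, ..., r_n receives the fraction r_i / (r_1 + ... + r_n) of that PE's CPU time. The hypothetical shell function below (cpu_share is not an ANU command) does this arithmetic; it assumes integer cpuratios, like the 8 shown for the normal queue in the example output, and truncates the result to a whole percentage.

```sh
#!/bin/sh
# Hypothetical helper: percentage of a PE's CPU time a request receives,
# given its own cpuratio followed by the cpuratios of ALL requests on
# that PE (including itself).
#   usage: cpu_share MY_RATIO RATIO1 RATIO2 ...
cpu_share() {
    mine=$1
    shift
    total=0
    for r in "$@"; do
        total=$((total + r))
    done
    if [ "$total" -le 0 ]; then
        echo "cpu_share: ratios must sum to a positive number" >&2
        return 1
    fi
    # Integer shell arithmetic truncates the percentage.
    echo $((100 * mine / total))
}
```

For instance, if three requests with cpuratios 8, 8 and 4 share a PE, `cpu_share 8 8 8 4` prints 40 and `cpu_share 4 8 8 4` prints 20; a request alone on its PE gets 100.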


Queue modes

As well as the usual attributes of NQS queues, queues on the VPP300 have a mode attribute which can be either:

  • share: the jobs may share the PEs they use with other jobs or
  • simplex: jobs have dedicated use of the number of PEs they require.
By default, all serial queues and the parexpress queue on the ANU VPP300 run in share mode. The parsmall and parlarge queues run in simplex (dedicated) mode; these queues are started on request.