Changes to VPP queues
The changes foreshadowed in the April 97 edition of superFAQs have now been implemented. The aims of the changes are

The most obvious and important changes are:


Excess jobs

The concept of excess jobs has been introduced to allow a large number of jobs (or long jobs) to be queued while preventing any one user from "hogging" the system.

Users who are not queuing large amounts of work (lots of jobs or long jobs) are unaffected.

Depending on the total time requested by a user's jobs, some may be classed as "excess jobs" (those below the "excess jobs" line in nqstat output). These are automatically queued at a lower priority than normal jobs. In practice this means that a user submitting their first job is queued above any excess jobs. Excess jobs are still part of the normal queue: they can run whenever a PE is idle, and when they do run they have the same priority as any other normal job. They simply sit at the bottom of the queue while they wait.
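
A minimal sketch of the queue ordering described above, assuming each waiting job already carries an excess flag and that the queue's existing order is otherwise preserved. The Job type and its fields are hypothetical and are not part of NQS or nqstat.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical queue entry holding only the fields this sketch needs."""
    job_id: int
    user: str
    excess: bool


def display_order(queue: list[Job]) -> list[Job]:
    """Move excess jobs below all non-excess jobs.

    sorted() is stable and False sorts before True, so non-excess jobs come
    first and each group keeps its existing relative order.
    """
    return sorted(queue, key=lambda job: job.excess)
```

Because only the position changes, a later-submitted first job from another user ends up above earlier excess jobs, while nothing here affects the priority a job runs at.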

The excess attribute is not static: excess jobs are automatically promoted to non-excess (that is, to "normal" priority) once your total queued cputime drops far enough. The unused cputime of a running job counts as queued time, although it decreases continually as the job runs.
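
The accounting rule can be sketched as follows, assuming times in hours and that a running job contributes its requested time minus the cputime it has used so far. The function name and the (requested, used) tuple layout are illustrative only.

```python
def queued_cputime(jobs: list[tuple[float, float]]) -> float:
    """Total queued cputime, in hours, for one user's jobs.

    Each job is (requested_hours, used_hours). A waiting job has used 0 hours
    and so counts in full; a running job counts only its unused time, which
    shrinks as the job runs.
    """
    return sum(max(requested - used, 0.0) for requested, used in jobs)
```

For example, two running 8-hour jobs that have each used 2 cpu-hrs plus two waiting 8-hour jobs give 6 + 6 + 8 + 8 = 28 hours of queued cputime.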


Normal queueing limits

The current queueing limits for the normal queue are as follows. Suggestions for changes in the light of experience are welcome.

Number of      Limit on total cputime (hours)
jobs queued    normal, excluding excess    normal, including excess
1              24                          24
2              24                          48
3              20                          44
4              16                          40
5              -                           36
6              -                           32
7              -                           28
8              -                           24

Notes:

  1. Any jobs beyond the fourth in normal are considered "excess jobs".
  2. The same limits apply to the bonus queue.
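
The table can be read as a simple rule, sketched below under assumptions the text implies but does not spell out: a user's jobs are considered in queue order (running jobs first), each job counts its unused cputime, and a job becomes excess once the running total for it and all the user's jobs above it exceeds the "excluding excess" limit for its position. The limits are copied from the table above; the function and constant names are hypothetical.

```python
# Limits from the table above, in hours, keyed by number of jobs queued.
# Only positions 1-4 can be non-excess; positions 5-8 are always excess.
NON_EXCESS_LIMIT = {1: 24, 2: 24, 3: 20, 4: 16}
TOTAL_LIMIT = {1: 24, 2: 48, 3: 44, 4: 40, 5: 36, 6: 32, 7: 28, 8: 24}


def count_non_excess(unused_hours: list[float]) -> int:
    """Return how many of a user's jobs sit above the "excess jobs" line.

    unused_hours lists each job's unused cputime in queue order
    (the full request for a job that is still waiting).
    """
    total = 0.0
    for position, hours in enumerate(unused_hours, start=1):
        total += hours
        limit = NON_EXCESS_LIMIT.get(position)
        # Beyond the 4th job, or over the limit: this job and every
        # later one are excess.
        if limit is None or total > limit:
            return position - 1
    return len(unused_hours)


def within_total_limit(unused_hours: list[float]) -> bool:
    """Check the "including excess" column (assumed to use the same totals)."""
    n = len(unused_hours)
    if n == 0:
        return True
    return n in TOTAL_LIMIT and sum(unused_hours) <= TOTAL_LIMIT[n]
```

With the four 8-hour jobs from the example below, count_non_excess([8, 8, 8, 8]) returns 2. Once the two running jobs have 12 unused hours between them, count_non_excess([6, 6, 8, 8]) returns 3, matching the promotion of the third job. Because the limits never increase with position while the running total never decreases, the non-excess jobs always form an unbroken block at the top.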

Example

Suppose you submit four 8-hour jobs to the normal queue when you have no jobs already queued. The first two jobs total 16hrs, under the 24hr limit for two jobs, so they are not excess. The third job takes your summed normal cputime to 24hrs, over the 20hr limit for three jobs, so it and the fourth job are considered excess. All four jobs may run immediately if PEs are available. If not, once the first two jobs are running and have used 4 cpu-hrs between them (leaving 12 unused cpu-hrs), the unused cputime of your top three jobs totals 12 + 8 = 20hrs, within the 20hr limit, and the third job is automatically promoted out of the "excess job" pool.

After jobs are submitted (no idle PEs):
normal ==== enabld ==================== 24:00:00 === Pri=25 == 1920MB == 8 ====
19770  ttt900  z04        stdin   QUED  08:00:00                  80MB
19775  ttt900  z04        stdin   QUED  08:00:00                  80MB
             --- excess jobs ---
19780  ttt900  z04        stdin   QUED  08:00:00                  80MB
19785  ttt900  z04        stdin   QUED  08:00:00                  80MB


After the top two jobs have run for a total of 4 cpu-hrs:
normal ==== enabld ==================== 24:00:00 === Pri=25 == 1920MB == 8 ====
19770  ttt900  z04        stdin    RUN  02:01:01 02:06:07    72:  80MB 100%   4
19775  ttt900  z04        stdin    RUN  01:59:00 02:01:46    72:  80MB 100%   5
19790  abc123  z99        stdin   QUED  04:00:00                  40MB
19780  ttt900  z04        stdin   QUED  08:00:00                  80MB
             --- excess jobs ---
19785  ttt900  z04        stdin   QUED  08:00:00                  80MB
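
The promotion can be checked against the second listing with a short sketch. It rests on two assumptions: for a RUN line the first time column is taken to be cputime used so far (02:01:01 + 01:59:00 is about 4 cpu-hrs, matching the caption), and since the listing does not show the request of a running job, the 08:00:00 request from the first listing is passed in. The function names are hypothetical; this is not a general nqstat parser.

```python
LISTING = """\
normal ==== enabld ==================== 24:00:00 === Pri=25 == 1920MB == 8 ====
19770  ttt900  z04        stdin    RUN  02:01:01 02:06:07    72:  80MB 100%   4
19775  ttt900  z04        stdin    RUN  01:59:00 02:01:46    72:  80MB 100%   5
19790  abc123  z99        stdin   QUED  04:00:00                  40MB
19780  ttt900  z04        stdin   QUED  08:00:00                  80MB
             --- excess jobs ---
19785  ttt900  z04        stdin   QUED  08:00:00                  80MB
"""


def hms_to_seconds(text: str) -> int:
    """Convert an HH:MM:SS field to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def top_jobs_unused(listing: str, user: str, count: int, running_request: str) -> int:
    """Sum the unused cputime (seconds) of the user's first `count` jobs."""
    total = 0
    taken = 0
    for line in listing.splitlines():
        fields = line.split()
        # Job lines start with a numeric job id; skip headers and separators.
        if len(fields) < 6 or not fields[0].isdigit() or fields[1] != user:
            continue
        status, time_field = fields[4], fields[5]
        if status == "RUN":
            # Assumed: for a running job this column is cputime used so far.
            total += hms_to_seconds(running_request) - hms_to_seconds(time_field)
        else:
            # For a queued job the column is the time request itself.
            total += hms_to_seconds(time_field)
        taken += 1
        if taken == count:
            break
    return total
```

On this listing, top_jobs_unused(LISTING, "ttt900", 3, "08:00:00") gives 71,999 seconds (19:59:59), just under the 20hr limit for three jobs, which is consistent with job 19780 now sitting above the excess line.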


ANU Supercomputer Facility
The Australian National University