Resource Request 2.0 For Moab and TORQUE

In the upcoming release of TORQUE 6.0 and Moab 9.0, we are introducing Resource Requests 2.0, a new way of submitting jobs. We will continue to support all of the -l options for submitting jobs, but we are adding a task-centric syntax alongside them. This entry is a primer on the new resource requests; the official documentation will have the final, more complete word.


-L tasks=#[:lprocs=#|all][:memory=#units][:swap=#units][:place={node*|socket|numanode|core|thread}[=#]][:{usecores|usethreads|allowthreads}][:gpus=#][:mics=#][:gres=xxx[=#]]
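
For example, a request for a single task with two logical processors and 4 GB of memory could look like the following line; the script name and resource amounts here are placeholders for illustration only.

qsub job.sh -L tasks=1:lprocs=2:memory=4gb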

Some Quick Notes

  • lprocs refers to logical processors by default (in other words, not physical cores), but the task will be placed on physical cores if usecores is specified.
  • Multiple values cannot be submitted for anything except gres.
  • Multiple tasks are specified with multiple -L arguments at submission time.
  • Each task must fit on one host.
    • place=node cannot have a number after it; all of the other place values can (see the example below).
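
To illustrate that last note, a count may follow any placement unit except node. In these sketches the script names and numbers are placeholders only:

qsub numa_pair.sh -L tasks=2:lprocs=8:place=numanode=2

qsub whole_node.sh -L tasks=1:lprocs=16:place=node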

Finer Control Over Resources

Many jobs want to lay out different resources for different pieces of the job, and this syntax addresses that need in Moab and TORQUE. For example, if a job wants a number of parallel tasks to perform a calculation and a separate task to visualize it, this is easily submitted as two -L requests:

qsub job.sh -L tasks=1:lprocs=6:memory=12gb -L tasks=100:lprocs=4

Obviously, there are as many different kinds of scenarios as users can imagine, but this syntax should be flexible enough to allow users to cover the vast majority of desired configurations.

Enforcing Memory Restrictions

Some may wonder why we included the option to specify swap in the request: it lets jobs be kept from interfering with each other and be killed if they attempt to use more memory than they requested. When cgroup enforcement is enabled, jobs requesting memory in this syntax will have a cgroup limit set on the amount of memory they are allowed to use. Once that limit is hit, the OS forces the job to use swap for any memory it needs beyond the requested amount. If there is no swap request, no limit is placed on swap use; when the limit is in place, the job is killed as soon as it attempts to use more than the total of the memory and swap requested for it. Essentially, the amount of swap is the tolerance allowed for the job. Memory and swap can be set as defaults, minimums, and maximums so that admins can control how much tolerance jobs should be allowed and ensure that jobs share resources adequately.
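
As a quick sketch of how these limits interact (assuming cgroup enforcement is enabled; the script name and amounts are placeholders), the request below gives the task 8gb of memory before the OS pushes it to swap, and the job is killed only if it tries to use more than the 10gb total of memory plus swap:

qsub bounded_job.sh -L tasks=1:lprocs=4:memory=8gb:swap=2gb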

Placing Jobs

Workloads execute faster or slower depending on how they are placed; many users' jobs will have the lowest run time if they run on physical cores, which is easily accomplished by adding usecores to the submission. Other users may know that their tasks use a lot of memory and want them placed exclusively on a NUMA node, and yet others might not want any other jobs interfering on the socket where they are using an accelerator. We offer this level of control:

qsub core_job.sh -L tasks=10:lprocs=2:usecores

qsub numa_job.sh -L tasks=2:lprocs=4:place=numanode

qsub accel_job.sh -L tasks=7:lprocs=5:gpus=1:place=socket

Each of these will be placed so that no other jobs can share the requested resources: if the job is going to use cores, then no other job can use the remaining threads; if the job is going to use a NUMA node, then no other job can grab the idle resources from that NUMA node; the same holds true for sockets and nodes.

We are really pleased with the new syntax and believe it will open up a lot of new possibilities and allow more efficient use of resources across the board, ultimately yielding a much greater return on investment for all of our users.
