Torque 6 Update

As many of you know, Torque 6.0.1 has officially been released, along with its release notes. We remain excited about Torque 6, and initial feedback from our customers has been very positive. We are confident that the 6.0.1 release greatly solidifies the newer features introduced in 6.0.0, especially cgroup support.


Torque 6.0.1 resolved several bugs and added some important features for existing customers. We added extensive error handling and logging to harden the basics of our cgroup implementation, and we added more regression tests around corner cases and other issues reported by users of 6.0.0 and 6.0.1. More specifically:

  • We have added and enhanced support for using cgroups with the legacy -l resource request syntax. When a user submits a job with this syntax, we create one set of cgroups per node instead of one set per task. Memory limits are derived from the existing memory parameters, and a server setting controls whether vmem is treated as a per-node or a per-job request. We hope this support lets sites see how powerful and useful cgroups can be, and encourages more experimentation with the new (-L) resource request syntax.
  • pbs_server can now interpret the new syntax and place jobs itself. When you run with a scheduler this rarely comes into play, but for testing it saves you from having to specify a host list every time you want to run a job.
  • Access to accelerators (GPUs and MICs) is now enforced through cgroups. In 6.0.0, we guaranteed that your job would use resources close to the accelerators assigned to it; 6.0.1 completes this work by also binding each process to its accelerators through cgroups.
  • We have added more unit and regression tests to cover the new functionality. Physical placement has many corner cases, and our initial coverage was limited both by the hardware available to us and by the cases we thought of on our own. Users have since helped us discover more cases, and we have added tests for them. In some cases, we used unit tests to cover configurations we lacked the hardware to reproduce. In others, we forced specific conditions by carefully placing jobs, and in still others we found different permutations of the same requests that reproduce the corner cases seen at customer sites.
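To make the per-task versus per-node distinction concrete, here is a minimal Python sketch. It is not Torque's implementation: the Task type, the cgroup naming scheme, and the vmem_is_per_node flag (standing in for the server setting mentioned above) are all hypothetical. We also assume that a per-job vmem request is split evenly across the job's nodes, which may not match Torque's actual rule.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    # Hypothetical record of where one task of a job was placed.
    task_id: int
    node: str


def cgroup_sets(job_id: str, tasks: list[Task], per_node: bool) -> dict[str, list[int]]:
    """Group a job's tasks into cgroup sets, either one per node or one per task."""
    groups: dict[str, list[int]] = defaultdict(list)
    for task in tasks:
        if per_node:
            # All tasks on the same node share a single cgroup set.
            key = f"{job_id}/{task.node}"
        else:
            # Each task gets its own cgroup set on its node.
            key = f"{job_id}/{task.node}/task{task.task_id}"
        groups[key].append(task.task_id)
    return dict(groups)


def vmem_limit_per_node(vmem_bytes: int, num_nodes: int, vmem_is_per_node: bool) -> int:
    """Virtual-memory limit applied to each node's cgroup set.

    Assumption (hypothetical): a per-job request is divided evenly across nodes.
    """
    if vmem_is_per_node:
        return vmem_bytes
    return vmem_bytes // num_nodes


if __name__ == "__main__":
    tasks = [Task(0, "node01"), Task(1, "node01"), Task(2, "node02")]
    print(cgroup_sets("42", tasks, per_node=True))
    # {'42/node01': [0, 1], '42/node02': [2]}
    print(cgroup_sets("42", tasks, per_node=False))
    # {'42/node01/task0': [0], '42/node01/task1': [1], '42/node02/task2': [2]}
```

Grouping by node means tasks on the same host share one memory limit, so a single task can temporarily use more than an even share as long as the node-level total stays within bounds.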
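Enforcing accelerator access through cgroups builds on the Linux devices controller. As a rough, hypothetical sketch (not Torque's code), a resource manager could compute cgroup v1 device rules, written as `type major:minor access`, for a job's NVIDIA GPUs. We assume the conventional character-device major number 195 for /dev/nvidia0, /dev/nvidia1, and so on, and we ignore the shared control devices (such as /dev/nvidiactl) and the MIC devices that a real implementation would also have to handle.

```python
from __future__ import annotations

NVIDIA_MAJOR = 195  # assumed major number of the /dev/nvidiaN character devices


def gpu_device_rules(assigned: set[int], total_gpus: int) -> tuple[list[str], list[str]]:
    """Return (allow, deny) rule lists in cgroup v1 devices-controller syntax.

    Rules for assigned GPUs would be written to devices.allow; rules for every
    other GPU on the node would be written to devices.deny, so the job's
    processes cannot open them. "rwm" means read, write and mknod.
    """
    allow = [f"c {NVIDIA_MAJOR}:{i} rwm" for i in sorted(assigned)]
    deny = [f"c {NVIDIA_MAJOR}:{i} rwm" for i in range(total_gpus) if i not in assigned]
    return allow, deny


if __name__ == "__main__":
    allow, deny = gpu_device_rules({1}, total_gpus=3)
    print(allow)  # ['c 195:1 rwm']
    print(deny)   # ['c 195:0 rwm', 'c 195:2 rwm']
```

Denying only the unassigned GPU nodes leaves every other device reachable; a stricter design would write `a` to devices.deny to clear the whitelist and then allow back only the devices the job needs.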

The TL;DR of all this is that with 6.0.1 we have hardened every part of the cgroup implementation: more unit and regression tests, easier integration, easier user testing, and fixes for several bugs found in-house or reported by the user community. We are very pleased with the state of our cgroup implementation and confident that it can help you get more out of your resources.
