TORQUE Resource Manager 4.2.0 Early Access Release Notes

November 2012

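These release notes contain the following sections: Overview, New Features, Known Issues, System Requirements, Installation Information, Upgrading to TORQUE 4.2.0 and Backward Compatibility, Documentation, and Changelog.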

Overview

TORQUE 4.2.0 provides Intel Xeon Phi (MIC architecture) card support, introduces the ability to run a single job in two domains in a Cray system, supports starting and stopping services on a SLES system, and enhances preexisting features.

The New Features section provides more information about what TORQUE 4.2.0 has to offer.

New Features

Intel Xeon Phi (MIC architecture) card supported as new accelerator option

TORQUE can auto-detect the presence of MIC architecture cards when configured to do so. It can report metrics from them and allocate them to workloads. This feature requires the use of the Moab Workload Manager scheduler. See Scheduling accelerator hardware in the TORQUE Administrator Guide for more information.

Ability to run a single job in two domains added

TORQUE now supports multiple heterogeneous (multi-req) resource requests within a job for Cray systems. A job can request compute nodes both inside the Cray and outside of it. TORQUE manages the job on the Cray and non-Cray compute nodes.

max_user_queuable is now global

The server parameter max_user_queuable is now a system-wide parameter. Any configured value applies to all queues collectively. For example, with max_user_queuable set to 5, TORQUE previously allowed each user to submit up to 5 jobs to each queue. With the same setting now, each user can submit up to 5 jobs in total across all queues.
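
As a rough sketch, the limit from the example above could be set with qmgr as follows. This assumes the qmgr client is on the PATH and that you have manager privileges on the server; the variable names and the fallback branch are illustrative only.

```shell
# Example value taken from the text above.
limit=5
# qmgr directive that sets the server-wide limit.
directive="set server max_user_queuable = ${limit}"

if command -v qmgr >/dev/null 2>&1; then
  # Apply the limit, then print the server settings to confirm it.
  qmgr -c "$directive"
  qmgr -c "print server" | grep max_user_queuable
else
  # No TORQUE client on this host: show what would have been run.
  echo "qmgr -c \"$directive\""
fi
```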

SLES 11 (SP1/SP2) service management

You can now stop and start TORQUE services on a SLES system.
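
The following is a minimal sketch of starting or stopping the daemons through init scripts. It assumes the scripts are installed in /etc/init.d under the names trqauthd, pbs_server, and pbs_mom, which may differ on your system. The torque_services function and the RUN variable are hypothetical helpers, not part of TORQUE.

```shell
# Start or stop the TORQUE daemons through their init scripts.
# Prints the commands by default; set RUN=1 to execute them (needs root).
torque_services() {
  action="$1"
  case "$action" in
    start|stop) ;;
    *) echo "usage: torque_services start|stop" >&2; return 2 ;;
  esac
  # Assumed init script names; adjust to what is in /etc/init.d.
  for svc in trqauthd pbs_server pbs_mom; do
    if [ "${RUN:-0}" = "1" ]; then
      "/etc/init.d/$svc" "$action"
    else
      echo "/etc/init.d/$svc $action"
    fi
  done
}

# Dry run: list the commands that would start the services.
torque_services start
```

On a node that runs only the MOM daemon, the list of scripts would be shortened accordingly.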

Known Issues

TORQUE 4.2.0 Early Access still occasionally experiences deadlock conditions. In most cases, this happens when users make extensive use of routing queues, job arrays, or job dependencies. If you encounter a deadlock, please report it to Technical Support.

System Requirements

The following software is required to run TORQUE 4.2.0:

Installation Information

The directions to install and configure TORQUE are in chapter 1 of the TORQUE 4.2.0 Administrator Guide. Also note additional instructions in the PBS Administrators Guide and README.building_40.

You may need to install libssl-dev for the source to build successfully. Specifically, the build system looks for libssl.so and libcrypto.so. For non-RPM setups, you may need to create symbolic links with those .so names that point to the installed ssl and crypto libraries.
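
The symbolic links described above could be created along these lines. This is a sketch only: link_unversioned is a hypothetical helper, and the library directory and the versioned file names (for example libssl.so.1.0.0) vary between distributions.

```shell
# Create an unversioned .so symlink next to a versioned library if it is missing.
# Usage: link_unversioned <libdir> <name>
link_unversioned() {
  libdir="$1"
  name="$2"
  target="$libdir/$name.so"
  # Nothing to do if the unversioned name already exists.
  [ -e "$target" ] && return 0
  # Link to the first versioned library found, e.g. libssl.so.1.0.0.
  for candidate in "$libdir/$name".so.*; do
    if [ -f "$candidate" ]; then
      ln -s "$candidate" "$target"
      return 0
    fi
  done
  echo "no versioned $name library found in $libdir" >&2
  return 1
}
```

A typical call, run as root and with /usr/lib standing in for wherever your libraries actually live, would be `link_unversioned /usr/lib libssl` followed by `link_unversioned /usr/lib libcrypto`.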

Upgrading to TORQUE 4.2.0 and Backward Compatibility

TORQUE 4.2.0 is not backward compatible with versions of TORQUE prior to 4.0. When you upgrade to TORQUE 4.2.0, all MOM and server daemons must be upgraded at the same time.

The job format is compatible between 4.2.0 and previous versions of TORQUE. Any queued jobs will upgrade to the new version, with the exception of job arrays in TORQUE 2.4 and earlier. Upgrading TORQUE while jobs are running is not recommended.

Because TORQUE 4.2.0 has removed all use of UDP/IP and moved all communication to TCP/IP, previous versions of TORQUE cannot communicate with the components of TORQUE 4.2.0. However, all files in the /var/spool/torque ($TORQUE_HOME) directory and all of its subdirectories are forward compatible.

Documentation

The online help for TORQUE 4.2.0 is available in HTML and PDF formats.

Changelog

Legend
c - crash
b - bug fix
e - enhancement
f - new feature
n - note


© Copyright 2012, Adaptive Computing Enterprises, Inc.