Green Power Management Deep Dive – Green Computing Report

During Adaptive Computing’s user conference, MoabCon 2013, which took place in Park City, Utah, in April, HPC Product Manager Gary Brown delivered an in-depth presentation on Moab’s Auto Power Management capabilities. Moab is the policy-based intelligent scheduler inside Adaptive’s family of workload management products.

The latest Moab HPC Suite Enterprise Edition (version 7.2.2) includes a variety of green policy settings to help cluster owners increase efficiency and reduce energy use and costs by 10-30 percent.

While this presentation is targeted at Moab users, there are principles here that will apply to general cluster management. The talk is laid out in three parts: power management, green computing (aka auto power management) and future power management.

The core of Moab power management is the tracking of two node power states: on and off. “On” means node power has been reported as on by the power resource manager, but “off” does not necessarily mean the power is off – to Moab it just means that this node cannot be used, although there is the option to completely remove the power.

To run a job, Moab requires that the node power state be on and that the resource manager (e.g., TORQUE or SLURM) be running and have reported node information.
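That requirement amounts to a two-part check per node. A minimal sketch in Python, where `NodeStatus` and `can_run_jobs` are illustrative names and not part of Moab:

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    """Hypothetical per-node record; the field names are assumptions."""
    name: str
    power_on: bool      # power resource manager reported the node as on
    rm_reported: bool   # workload resource manager is running and has reported node info


def can_run_jobs(node: NodeStatus) -> bool:
    """A node is usable only when both conditions hold."""
    return node.power_on and node.rm_reported
```

A node that is powered on but whose resource manager has not yet reported in would still be skipped under this check.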

Power management relies on system jobs, which are distinct from user jobs. A system job won’t appear in a queue; it executes on the Moab head node or server, is usually script-based and usually operates asynchronously.

Brown remarks that on a large cluster, the system jobs may have to handle a large number of nodes at once. They may therefore have to be self-throttling to manage the initial demand spike, and may require a daemon process to maintain power state for reporting purposes.
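One way to read "self-throttling" is to power nodes on in small batches with a pause between batches, so that hundreds of nodes do not start at the same instant. The sketch below assumes that reading; `power_on_throttled`, the batch size and the pause length are all hypothetical and are not taken from Adaptive's scripts.

```python
import time
from typing import Callable, Iterator, List, Sequence


def batches(nodes: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size nodes."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(nodes), batch_size):
        yield list(nodes[start:start + batch_size])


def power_on_throttled(
    nodes: Sequence[str],
    power_on: Callable[[str], object],
    batch_size: int = 16,
    pause_s: float = 5.0,
    sleep: Callable[[float], object] = time.sleep,
) -> None:
    """Call power_on for every node, pausing between batches (not after the last)."""
    for index, batch in enumerate(batches(nodes, batch_size)):
        if index > 0:
            sleep(pause_s)  # spread the demand spike over time
        for node in batch:
            power_on(node)
```

Passing `sleep` as a parameter keeps the pacing logic testable without real delays.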

There are two system jobs that are used for power management. The first is defined by the ClusterQueryURL parameter, used by Moab to query the power state of all nodes in the cluster. Moab runs it at the start of the scheduling cycle. This synchronous operation reads every node’s power state and passes it to Moab. It blocks Moab from scheduling until all node power states are reported or the operation times out.

The other power system job is defined by NodePowerURL, which Moab uses to power compute nodes on and off. It takes one of two mutually exclusive parameters: on or off. In green computing schemes, this is normally run at the end of the Moab scheduling cycle, but it can also be run on demand, says Brown. This script needs to interface with the power management system in order to turn nodes on and off. Each system vendor usually has its own power management interface; for example, HP has iLO and Dell has DRAC.
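As a rough illustration of what such a power control script does, the sketch below builds and runs an `ipmitool chassis power on|off` command for one node. The use of `ipmitool` is an assumption made for this example; Adaptive's own scripts and their exact commands are not shown in the talk, and the function names and arguments here are hypothetical.

```python
import subprocess
from typing import Callable, List

VALID_ACTIONS = ("on", "off")  # the two mutually exclusive parameters


def build_power_command(bmc_host: str, action: str, user: str, password: str) -> List[str]:
    """Return an ipmitool command line for the requested power action."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"action must be one of {VALID_ACTIONS}, got {action!r}")
    return [
        "ipmitool", "-I", "lanplus",
        "-H", bmc_host, "-U", user, "-P", password,
        "chassis", "power", action,
    ]


def set_node_power(
    bmc_host: str,
    action: str,
    user: str,
    password: str,
    runner: Callable[..., object] = subprocess.run,
) -> bool:
    """Run the command and report whether it exited successfully."""
    cmd = build_power_command(bmc_host, action, user, password)
    result = runner(cmd, capture_output=True, text=True, timeout=30)
    return result.returncode == 0
```

A site using iLO, DRAC or another vendor tool would swap in that tool's command line inside `build_power_command`.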

Adaptive recently created new power management reference scripts in response to customer demand. These are all Python-based and work with OpenIPMI. They have all been regression tested on Adaptive QA system hardware – and are active, working scripts.

Brown explains that when the script is administrator-initiated, Moab runs it immediately and powers the node on or off. When the script is green policy-initiated, Moab runs it as part of the scheduling cycle.

There’s a potential issue with this setup in the case of large clusters. If there are a lot of nodes, it can take a long time to query the power state, during which Moab’s scheduling activities are blocked. The solution is a power state “monitor” daemon that gathers node power states at regular intervals and records them in a file. Then the ClusterQuery script can read the power states out of the file, providing Moab with the information it needs without excessive slowdown.
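The read side of that arrangement can be very small. The sketch below assumes a hypothetical state file with one `node state` pair per line; the real file format used by the reference scripts is not described in the talk.

```python
from pathlib import Path
from typing import Dict


def parse_power_states(text: str) -> Dict[str, str]:
    """Parse 'node state' lines into a dict, skipping blank or malformed lines."""
    states: Dict[str, str] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # ignore anything that is not exactly 'node state'
        node, state = fields
        states[node] = state.lower()
    return states


def read_power_states(state_file: str) -> Dict[str, str]:
    """Read the cached file written by the monitor daemon (no node is contacted)."""
    return parse_power_states(Path(state_file).read_text())
```

Because the query only reads a local file, its cost no longer grows with the time it takes to contact every node.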

The first thing the power ClusterQuery script checks for is that “power monitor” daemon. If it doesn’t find it, it starts it up. The daemon gathers the power state every “poll” interval, which the admin can set independently of the scheduling interval. While polling nodes, it stores the power state data in a new temporary file. After polling all nodes, it replaces the file read by the power ClusterQuery script with this new file. When Moab runs the ClusterQuery script at the start of a scheduling cycle, the script simply reads the power states from that file and reports them to Moab very quickly.
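The write-to-a-temporary-file-then-replace step is what keeps the reader from ever seeing a half-written file. A minimal sketch of one polling pass, assuming a hypothetical `query_power` callable and a plain `node state` line format (neither comes from the reference scripts):

```python
import os
import tempfile
import time
from typing import Callable, Iterable, Optional


def poll_once(nodes: Iterable[str], query_power: Callable[[str], str], state_path: str) -> None:
    """Poll every node into a temp file, then swap it in place of state_path."""
    directory = os.path.dirname(os.path.abspath(state_path))
    # Create the temp file in the same directory so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".power_states.")
    try:
        with os.fdopen(fd, "w") as tmp:
            for node in nodes:
                tmp.write(f"{node} {query_power(node)}\n")
        os.replace(tmp_path, state_path)  # readers see the old file or the new one, never a mix
    except Exception:
        os.unlink(tmp_path)  # do not leave a partial temp file behind
        raise


def run_monitor(nodes, query_power, state_path: str,
                poll_interval_s: float = 60.0, cycles: Optional[int] = None) -> None:
    """Repeat poll_once every poll interval; cycles=None means run until stopped."""
    done = 0
    while cycles is None or done < cycles:
        poll_once(list(nodes), query_power, state_path)
        done += 1
        if cycles is None or done < cycles:
            time.sleep(poll_interval_s)
```

The poll interval here is independent of any scheduler timing, matching the separation Brown describes.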

Brown goes on to the topic of configuring power management inside Moab. He says that admins must define a power resource manager. Brown emphasizes that this is not TORQUE, which is a workload management tool.

The first step is to define the power resource manager, then the node power action time limits, as in this example:

Furthermore, there’s an option in the Moab configuration file to “record node modification events,” which creates a notification when Moab powers on or off compute nodes (see above).

The next step in configuring power management is to edit the file and set the variables inside it. The reference scripts assume that every IPMI interface uses the same user name and password, which is normally the case, says Brown.

The next action item is to customize the power management scripts. There are two of them. First, the user needs to go into the script that performs node power control and replace the node power commands with the ones appropriate for their system. At this point, there is the option to replace “off” with another choice, for example “sleep,” “hibernate” or “orderly shutdown.” The next step is to edit the monitor daemon script, replacing the commands in it with the user’s system-specific power commands. If the “off” commands in the NodePowerURL script were changed, the monitor daemon settings should be adjusted to match.
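One way to keep the two scripts consistent is to drive both from a single setting. The sketch below is purely illustrative: the mode names follow the talk, but the command lists, the raw state labels ("asleep", "hibernated") and the function names are assumptions, and host and credential options are omitted for brevity.

```python
from typing import Dict, List, Set

# Hypothetical site-wide choice of what "off" means; both scripts read it.
OFF_MODE = "orderly shutdown"

# Illustrative commands only. The "soft" chassis action asks the OS to shut
# down cleanly; the systemctl variants would have to run on the node itself.
OFF_COMMANDS: Dict[str, List[str]] = {
    "off": ["ipmitool", "chassis", "power", "off"],
    "orderly shutdown": ["ipmitool", "chassis", "power", "soft"],
    "sleep": ["systemctl", "suspend"],
    "hibernate": ["systemctl", "hibernate"],
}

# Raw states (assumed labels) that the monitor should report to Moab as "off".
RAW_STATES_REPORTED_OFF: Dict[str, Set[str]] = {
    "off": {"off"},
    "orderly shutdown": {"off"},
    "sleep": {"off", "asleep"},
    "hibernate": {"off", "hibernated"},
}


def off_command(mode: str = OFF_MODE) -> List[str]:
    """Command the power control script would run for the 'off' parameter."""
    return list(OFF_COMMANDS[mode])


def state_for_moab(raw_state: str, mode: str = OFF_MODE) -> str:
    """Translate a raw node state into the on/off value the monitor reports."""
    return "off" if raw_state in RAW_STATES_REPORTED_OFF[mode] else "on"
```

With a shared setting like this, changing the "off" behavior in one place updates both the control script and the monitor's interpretation.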
