The Ascent Initiative: Increasing Throughput and Scalability

The Ascent initiative is designed to increase TORQUE's and Moab's throughput and scalability across the board. Our goal is to improve both by 2-3 times in the coming release; we expect to meet this goal, and for certain use cases we expect to speed things up and scale much more than 2-3 times.

Image via http://tpwd.state.tx.us/

We’re adding more multi-threaded functionality to Moab. Currently this is being implemented in part of the scheduling iteration: evaluating different nodes simultaneously on multiple threads to determine more quickly how to run a job. In tests done so far, this executes 3-4 times faster than the latest 7.2 release (a hotfix of 7.2.6). It also has the benefit of scaling as you dedicate more cores to the Moab process, whereas current versions are unaffected by having more cores available. More work is being done in Moab, such as exploring additional caching and compiler optimizations that won’t affect debugging symbols. This work is being pursued primarily by Golden Murray.
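To make the pattern concrete, here is a minimal sketch, not Moab's actual code: the node_t, job_t, node_is_feasible, and evaluate_nodes_parallel names are hypothetical, and the real feasibility test considers far more than a processor count. The point is simply that the node list is split across worker threads so per-node checks for a job run concurrently.

```c
/* Minimal illustrative sketch (hypothetical names, not Moab source):
 * split the node list across worker threads so that per-node
 * feasibility checks for one job run concurrently. */
#include <pthread.h>

typedef struct { int id; int free_procs; } node_t;   /* placeholder node record */
typedef struct { int procs_needed; }       job_t;    /* placeholder job record  */

typedef struct {
    node_t *nodes;     /* this thread's slice of the node array   */
    int     count;     /* number of nodes in the slice            */
    job_t  *job;       /* job being scheduled                     */
    int    *feasible;  /* per-node results, parallel to the slice */
} eval_args_t;

/* Hypothetical per-node test; a real check covers procs, memory,
 * features, reservations, and so on. */
static int node_is_feasible(const node_t *n, const job_t *j)
{
    return n->free_procs >= j->procs_needed;
}

static void *eval_slice(void *arg)
{
    eval_args_t *a = (eval_args_t *)arg;
    for (int i = 0; i < a->count; i++)
        a->feasible[i] = node_is_feasible(&a->nodes[i], a->job);
    return NULL;
}

/* Divide node_count nodes among nthreads workers and wait for all of them. */
void evaluate_nodes_parallel(node_t *nodes, int node_count, job_t *job,
                             int *feasible, int nthreads)
{
    pthread_t   tids[nthreads];
    eval_args_t args[nthreads];
    int chunk = (node_count + nthreads - 1) / nthreads;

    for (int t = 0; t < nthreads; t++) {
        int start = t * chunk;
        if (start > node_count) start = node_count;
        int end = start + chunk;
        if (end > node_count) end = node_count;

        args[t] = (eval_args_t){ nodes + start, end - start, job, feasible + start };
        pthread_create(&tids[t], NULL, eval_slice, &args[t]);
    }
    for (int t = 0; t < nthreads; t++)
        pthread_join(tids[t], NULL);
}
```

With this structure, the evaluation loop naturally gets faster as more cores are dedicated to the process, which is the scaling behavior described above.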

I’m currently working on optimizations specific to TORQUE. pbs_server has always supported running jobs with or without a host list, and to handle both cases it iterates over all of the nodes to fulfill the request. Since Moab supplies a host list to TORQUE when it runs a job, as do most schedulers that interface with TORQUE, iterating through all of the nodes is unnecessary. Ascent’s pbs_server is smart enough to recognize when a request includes a host list and access those nodes directly, only iterating over all nodes for jobs that have no host list.
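A rough sketch of the control-flow difference follows. This is not pbs_server source; find_node_by_name, place_with_hostlist, and place_without_hostlist are names made up for illustration. The idea is that when a host list is supplied, only those nodes are looked up and touched, and the full scan is reserved for jobs without one.

```c
/* Illustrative sketch only, not pbs_server source.  The point is the
 * control flow: with a host list, touch exactly the requested nodes;
 * without one, fall back to examining every node in the system. */
#include <stdio.h>
#include <string.h>

#define NODE_COUNT 8   /* tiny table just for the example */

typedef struct { const char *name; int reserved; } node_t;

static node_t all_nodes[NODE_COUNT] = {
    {"node01", 0}, {"node02", 0}, {"node03", 0}, {"node04", 0},
    {"node05", 0}, {"node06", 0}, {"node07", 0}, {"node08", 0},
};

/* Stand-in for a direct lookup of one node by name; a real server can
 * index nodes so this costs far less than walking the whole table. */
static node_t *find_node_by_name(const char *name)
{
    for (int i = 0; i < NODE_COUNT; i++)
        if (strcmp(all_nodes[i].name, name) == 0)
            return &all_nodes[i];
    return NULL;
}

/* Host-list path: visit (and, in a real server, lock) only the
 * requested nodes. */
static int place_with_hostlist(const char **hosts, int nhosts)
{
    for (int i = 0; i < nhosts; i++) {
        node_t *n = find_node_by_name(hosts[i]);
        if (n == NULL)
            return -1;              /* unknown host in the request */
        n->reserved = 1;
    }
    return 0;
}

/* No host list: the old path for every job, which examines each node
 * (and, in a real server, acquires each node's mutex) in turn. */
static int place_without_hostlist(int nodes_needed)
{
    for (int i = 0; i < NODE_COUNT && nodes_needed > 0; i++) {
        if (!all_nodes[i].reserved) {
            all_nodes[i].reserved = 1;
            nodes_needed--;
        }
    }
    return nodes_needed == 0 ? 0 : -1;
}

int main(void)
{
    const char *hosts[] = { "node03", "node05" };

    if (place_with_hostlist(hosts, 2) == 0)
        printf("placed job on its requested hosts only\n");
    if (place_without_hostlist(2) == 0)
        printf("placed a second job by scanning the whole node table\n");
    return 0;
}
```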

For just under 3,000 nodes, pbs_server is 40% faster at running single-node jobs than the older model. This test was performed on a system that wasn’t doing anything other than running the jobs; the speed-up would be larger on a busy system, because in the old model you had to wait to acquire the mutex for every node you didn’t need, whereas now you only wait for the nodes you are actually going to use. Naturally, larger jobs will see less of a speed-up, and systems with more than 3,000 nodes will see more.

Numerous other improvements to TORQUE have been made and are still being quantified. For example, pbs_server currently polls the mother superior of every job for usage updates every 45 seconds; that polling has been eliminated, and the updates are now passed along with each mom status message sent to the server. Exec_host lists are currently one entry per execution slot, but in Ascent they have been changed to one entry per node, with a range string specifying which execution slots are used on that node. This makes qstat -f much more human-readable, and it also speeds up Moab, pbs_server, and the moms, because all of them spend less time on string manipulation for jobs that use many processors. Both changes provide significant speed advantages to pbs_server, but those gains have yet to be quantified.
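As a simple illustration of the exec_host change (node names invented for the example), an eight-slot job spread across two four-core nodes that previously produced an exec_host string like node01/0+node01/1+node01/2+node01/3+node02/0+node02/1+node02/2+node02/3 would instead be recorded as something closer to node01/0-3+node02/0-3: one entry per node, with a range of execution slots.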

Much of the work we’re doing now is determining how to effectively measure how much additional speed and scalability these changes provide, and we expect to quantify the improvements precisely. At MoabCon 2014 we expect to give detailed explanations of these and other projects completed by then; Golden and I will present a session together on the Ascent project and its status at that point. We hope to see you there.
