Systems at Scale
Today’s HPC solutions are simply inadequate for systems at large scale. Modern supercomputers now have so many internal network interconnects and coordinate so many calculations at such a rate that traditional scheduling cannot keep up. Jobs sit idle when they should be running; policy constraints are lost in the noise of hardware failure at scale; data remains opaque and unanalyzed instead of generating insight.
The large-scale system challenge is intensifying. A generation ago, most HPC problems consumed modest amounts of data in a single stage; many of today’s projects require intense data processing and complex workload and resource orchestration that is impossible to manage manually. This orchestration burden multiplies problems at scale, raises the stakes for performance, and demands groundbreaking sophistication. Schedulers used to place jobs with only CPU and RAM as major considerations; now workload and resource orchestration demands smarter policy and savvier choices based on data locality and interdependent deadlines.
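To make the shift from CPU-and-RAM placement to locality- and deadline-aware placement concrete, here is a minimal sketch in Python. It is an illustration only, not how Moab or TORQUE place jobs: the `Node`, `Job`, `fits`, and `place` names are hypothetical, and "interdependent deadlines" is simplified to plain earliest-deadline-first ordering.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpus: int
    free_ram_gb: int
    # Datasets already staged on this node (assumed to be known to the scheduler).
    datasets: set = field(default_factory=set)


@dataclass
class Job:
    name: str
    cpus: int
    ram_gb: int
    dataset: str
    deadline_s: float  # seconds until the result is needed


def fits(job, node):
    """Classic check: does the node have enough CPU and RAM?"""
    return node.free_cpus >= job.cpus and node.free_ram_gb >= job.ram_gb


def place(jobs, nodes):
    """Place the most urgent jobs first, preferring nodes that hold their data."""
    placement = {}
    for job in sorted(jobs, key=lambda j: j.deadline_s):
        candidates = [n for n in nodes if fits(job, n)]
        if not candidates:
            placement[job.name] = None  # job stays queued
            continue
        # Data-local nodes win; ties are broken by the most free CPUs.
        best = max(candidates, key=lambda n: (job.dataset in n.datasets, n.free_cpus))
        best.free_cpus -= job.cpus
        best.free_ram_gb -= job.ram_gb
        placement[job.name] = best.name
    return placement
```

In this sketch an urgent job claims the node that already holds its dataset, and a less urgent job needing the same data falls back to another node once the first is full. A production scheduler would also weigh policy, fairness, and data transfer cost.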
In 2014, Adaptive Computing announced its Ascent team, created to focus on the scalability of Moab for exascale and beyond. The Ascent team has dramatically increased TORQUE and Moab’s throughput and scalability across the board. This higher throughput and scale are made possible by:
- Tighter Cooperation between Moab and TORQUE – Harmonizes key structures to reduce overhead and improve communication between the HPC scheduler and the resource manager(s)
- More Parallelization – Increases the decoupling of Moab and TORQUE’s network communication so that Moab is less dependent on TORQUE’s responsiveness. This delivers 2x speed improvements and cuts the duration of the average scheduling iteration in half
- Accounting Improvements – Allows toggling between multiple modes of accounting with varying levels of enforcement to provide greater flexibility beyond strict allocation options. In addition, new up-to-date accounting balances provide real-time insight into usage
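The decoupling idea in the list above can be pictured with a small Python sketch: a background poller keeps a cached snapshot of node state, so a scheduling iteration reads the cache instead of waiting on the resource manager. This is a generic pattern shown under stated assumptions, not the actual Moab or TORQUE implementation. `CachedResourceView`, `scheduling_iteration`, and `query_fn` are hypothetical names, and `query_fn` stands in for whatever network call returns free CPUs per node.

```python
import threading


class CachedResourceView:
    """Holds the latest node state reported by a (possibly slow) resource manager."""

    def __init__(self, query_fn, interval_s=1.0):
        self._query_fn = query_fn  # hypothetical: returns {node_name: free_cpus}
        self._interval_s = interval_s
        self._lock = threading.Lock()
        self._snapshot = {}
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._poll, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def refresh(self):
        state = self._query_fn()  # the slow call runs outside the lock
        with self._lock:
            self._snapshot = state

    def _poll(self):
        while not self._stop.is_set():
            self.refresh()
            self._stop.wait(self._interval_s)

    def snapshot(self):
        with self._lock:
            return dict(self._snapshot)  # copy, so callers can modify it freely


def scheduling_iteration(view, pending_jobs):
    """First-fit pass over cached state; never blocks on the network."""
    free = view.snapshot()
    started = []
    for job, cpus in pending_jobs:
        for node, avail in free.items():
            if avail >= cpus:
                free[node] = avail - cpus
                started.append((job, node))
                break
    return started
```

The trade-off is that an iteration may act on slightly stale state, so a real scheduler would still need to handle a start request that the resource manager rejects.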
Adaptive Computing manages the largest systems in the world and has done so for more than a decade. As systems get larger, Adaptive Computing will continue to raise the bar and support systems at scale, delivering productivity driven by ease of use.