Solutions to Enterprise High Performance Computing Challenges
While all HPC systems face challenges in workload demand, workflow constraints, resource complexity, and system scale, enterprise HPC systems face more stringent challenges and expectations. Enterprise HPC systems must meet mission-critical and priority HPC workload demands for commercial businesses, business-oriented research, and academic organizations. The Enterprise requires dynamic scheduling, provisioning, and management of multi-step/multi-application services across HPC, cloud, and big data environments. These workloads directly impact revenue, product delivery, and organizational objectives.
Enterprise HPC systems must process intensive simulation and data analysis more rapidly, accurately, and cost-effectively to accelerate insights. By improving workflow and eliminating job delays and failures, the business can use its data more efficiently to make decisions and achieve a competitive advantage. The Enterprise is also seeking to improve resource utilization and management efficiency across multiple heterogeneous systems. User productivity must also increase to speed the time to discovery, which means making it easier to access and use HPC resources and to expand to other resources as workloads demand.
Enterprise-Ready HPC Workload Management
Moab® HPC Suite – Enterprise Edition accelerates insights by unifying data center resources, optimizing the analysis process, and guaranteeing services to the business. Moab 8.1 for HPC systems and HPC cloud continually meets enterprise priorities through increased productivity, automated workload uptime, and consistent SLAs. It uses the battle-tested and patented Moab intelligence engine to automatically balance the complex, mission-critical workload priorities of enterprise HPC systems. Enterprise customers benefit from a single integrated product that brings together key Enterprise HPC capabilities, implementation services, and 24/7 support.
It is imperative to automate workload workflows to speed time to discovery. The following new use cases play a key role in improving overall system performance:
- Elastic Computing – unifies data center resources by helping the business manage resource expansion through bursting to private/public clouds and other data center resources, using OpenStack as a common platform
- Performance and Scale – optimizes the analysis process by dramatically increasing TORQUE and Moab throughput and scalability across the board
- Tighter cooperation between Moab and TORQUE – harmonizing these key structures to reduce overhead and improve communication between the HPC scheduler and the resource manager(s)
- More Parallelization – increasing the decoupling of Moab and TORQUE’s network communication so Moab is less dependent on TORQUE’s responsiveness, resulting in a 2x speed improvement and cutting the duration of the average scheduling iteration in half
- Accounting Improvements – allowing toggling between multiple modes of accounting with varying levels of enforcement, providing greater flexibility beyond strict allocation options. Additionally, new up-to-date accounting balances provide real-time insights into usage tracking.
- Viewpoint Admin Portal – guarantees services to the business by allowing admins to simplify administrative reporting, workload status tracking, and job resource viewing
Moab® HPC Suite – Enterprise Edition accelerates insights by increasing overall system, user, and administrator productivity, delivering more accurate results faster and at a lower cost from HPC resources. Moab provides the scalability, 90-99 percent utilization, and simple job submission required to maximize productivity. This ultimately speeds the time to discovery, helping the enterprise gain a competitive advantage from its HPC system. Enterprise use cases and capabilities include the following:
- OpenStack integration to offer virtual and physical resource provisioning for IaaS and PaaS
- Performance boost to achieve 3x improvement in overall optimization performance
- Advanced workflow data staging to enable improved cluster utilization, multiple transfer methods, and new transfer types
- Advanced power management with clock frequency control and additional power state options, reducing energy costs by 15-30 percent
- Workload-optimized allocation policies and provisioning to get more results out of existing heterogeneous resources and reduce costs, including topology-based allocation
- Unify workload management across heterogeneous clusters by managing them as one cluster, maximizing resource availability and administration efficiency
- Optimized, intelligent scheduling packs workloads and backfills around priority jobs and reservations while balancing SLAs to efficiently use all available resources
- Optimized scheduling and management of accelerators, both Intel Xeon Phi and GPGPUs, for jobs to maximize their utilization and effectiveness
- Simplified job submission and management with advanced job arrays and templates
- Showback or chargeback for pay-for-use so actual resource usage is tracked with flexible chargeback rates and reporting by user, department, cost center, or cluster
- Multi-cluster grid capabilities manage and share workload across multiple remote clusters to meet growing workload demand or surges
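The scheduling bullet above mentions backfilling around priority jobs and reservations. As a rough illustration of the general idea only, the following Python sketch implements a simplified, textbook-style backfill pass. It is not Moab's actual algorithm; the `Job` class, the `backfill` function, and all of the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int      # nodes requested
    walltime: int   # requested run time, in minutes


def backfill(free_nodes, running, queue):
    """Pick the jobs to start now (time 0).

    running: list of (finish_time, nodes) for jobs already running.
    queue:   waiting jobs, highest priority first.
    Hypothetical, simplified model; not Moab's implementation.
    """
    started = []
    running = list(running)
    queue = list(queue)

    # 1. Start jobs in strict priority order for as long as they fit.
    while queue and queue[0].nodes <= free_nodes:
        job = queue.pop(0)
        free_nodes -= job.nodes
        running.append((job.walltime, job.nodes))
        started.append(job.name)
    if not queue:
        return started

    # 2. The head job is blocked. Reserve the earliest time at which
    #    enough nodes will have been released for it to start.
    head = queue[0]
    available = free_nodes
    reservation = None
    for finish, nodes in sorted(running):
        available += nodes
        if available >= head.nodes:
            reservation = finish
            break
    if reservation is None:
        return started  # head job is larger than the whole system
    spare = available - head.nodes  # nodes unused even once head starts

    # 3. Backfill lower-priority jobs that cannot delay the reservation.
    for job in queue[1:]:
        if job.nodes > free_nodes:
            continue
        if job.walltime <= reservation:
            # Finishes before the reserved start time.
            free_nodes -= job.nodes
            started.append(job.name)
        elif job.nodes <= spare:
            # Runs past the reservation but only on spare nodes.
            free_nodes -= job.nodes
            spare -= job.nodes
            started.append(job.name)
    return started


# Example: 8-node system, 4 nodes busy for the next 60 minutes.
waiting = [Job("A", 8, 120), Job("B", 2, 30), Job("C", 2, 90), Job("D", 2, 60)]
print(backfill(4, [(60, 4)], waiting))
```

In this example the 8-node job A cannot start until minute 60, so the short jobs B and D run on otherwise idle nodes in the meantime, while C is held back because it would still be running when A is due to start.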
Job and resource failures in enterprise HPC systems lead to delayed results, missed organizational opportunities, and missed objectives. Moab HPC Suite – Enterprise Edition intelligently automates workload and resource uptime in the HPC system so that workloads complete successfully and reliably, failures are avoided, and services are delivered to the business as guaranteed.
The Enterprise benefits from these features:
- Intelligent resource placement prevents job failures with granular resource modeling, meeting workload requirements and avoiding at-risk resources
- Auto-response to failures and events with configurable actions to pre-failure conditions, amber alerts, or other metrics and monitors
- Workload-aware future maintenance scheduling that helps maintain a stable HPC system without disrupting workload productivity
- Real-world expertise for fast time-to-value and system uptime with included implementation, training, and 24/7 remote support services
Moab® HPC Suite – Enterprise Edition uses the powerful Moab intelligence engine to optimally schedule and dynamically adjust workload to consistently meet service level agreements (SLAs), guarantees, or business priorities. This automatically ensures that the right workloads are completed at the optimal times, taking into account the large number of departments, priorities, and SLAs that must be balanced. Moab provides the following benefits:
- Usage accounting and budget enforcement that schedules resources and reports on usage in line with resource sharing agreements and precise budgets (includes usage limits, usage reports, auto budget management, and dynamic fair share policies)
- SLA and priority policies that make sure the highest priority workloads are processed first (includes Quality of Service and hierarchical priority weighting)
- Continuous plus future scheduling that ensures priorities and guarantees are proactively met as conditions and workload change (includes future reservations and pre-emption)
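To make the idea of weighted priority and fair share policies more concrete, here is a minimal Python sketch that ranks jobs by a weighted sum of a QoS level, queue wait time, and a fair share term. The function name, the weight names, and every value are assumptions chosen for illustration; they are not Moab configuration parameters or Moab's actual formula.

```python
def job_priority(qos_level, queue_minutes, usage_pct, target_pct, weights):
    """Weighted-sum priority (hypothetical model).

    usage_pct / target_pct: a department's recent usage and its agreed
    share of the system, both as whole-number percentages.
    """
    # Positive when the department is under its target share,
    # negative when it has used more than its share.
    fairshare = target_pct - usage_pct
    return (weights["qos"] * qos_level
            + weights["queue_time"] * queue_minutes
            + weights["fairshare"] * fairshare)


# Hypothetical weights: QoS dominates, then fair share, then wait time.
WEIGHTS = {"qos": 1000, "queue_time": 1, "fairshare": 10}

jobs = {
    "sla_job":   job_priority(1, 30, usage_pct=50, target_pct=25, weights=WEIGHTS),
    "batch_job": job_priority(0, 600, usage_pct=10, target_pct=25, weights=WEIGHTS),
}
ranked = sorted(jobs, key=jobs.get, reverse=True)
print(ranked)
```

With these made-up weights the SLA-backed job still ranks first, but its department's over-use narrows the gap, which is the kind of trade-off that priority and fair share policies are meant to express.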
The benefits of a traditional HPC environment can be extended to more efficiently manage and meet workload demand with the multi-cluster grid and HPC cloud management capabilities in Moab HPC Suite – Enterprise Edition. These capabilities include the following:
- Self-service job submission portal that enables users to easily submit and track HPC cloud jobs at any time from any location with little training required
- Pay-for-use showback or chargeback so actual resource usage is tracked with flexible chargeback rates and reporting by user, department, cost center, or cluster
- On-demand workload-optimized node provisioning that dynamically provisions nodes based on policies to better meet workload needs
- Management and sharing of workload across multiple remote clusters to meet growing workload demand or surges, available by adding the Moab HPC Suite – Grid Option
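As a simple illustration of pay-for-use showback or chargeback, the Python sketch below totals charges from usage records using per-resource rates and reports them grouped by user or by department. The record fields, the rates, and the `chargeback` function are hypothetical and do not reflect Moab's accounting interface.

```python
from collections import defaultdict

# Hypothetical rates, in cents per node-hour.
RATES = {"cpu": 5, "gpu": 40}

# Hypothetical usage records as an accounting system might export them.
RECORDS = [
    {"user": "alice", "department": "eng",      "resource": "cpu", "node_hours": 100},
    {"user": "bob",   "department": "eng",      "resource": "gpu", "node_hours": 10},
    {"user": "carol", "department": "research", "resource": "cpu", "node_hours": 40},
]


def chargeback(records, rates, group_by):
    """Sum charges (in cents) per value of the `group_by` field."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec[group_by]] += rec["node_hours"] * rates[rec["resource"]]
    return dict(totals)


print(chargeback(RECORDS, RATES, "department"))
print(chargeback(RECORDS, RATES, "user"))
```

The same records can be grouped by cost center or cluster by adding those fields; showback would report the totals without billing them.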
Managing the World’s Top Systems, Ready to Manage Yours
Moab manages the largest, most scale-intensive and complex high performance computing environments in the world. This includes many of the Top500, leading commercial, and innovative research HPC systems. Adaptive Computing is the largest supplier of HPC workload management software. Adaptive’s Moab HPC Suite and TORQUE rank #1 and #2 as the most used scheduling/workload management software at HPC sites, and Adaptive is the #1 vendor used across unique HPC sites. [1,2]

[1] According to the IDC HPC End-user Study of System Software and Middleware in Technical Computing, 2013
[2] According to the 2012 HPC Site Census Survey: Middleware, December 2012, by Intersect360 Research