Moab® HPC Suite – Grid Option is a powerful grid-workload management solution that provides unified scheduling, advanced and flexible policy management, integrated resource management, and consolidated monitoring and management across multiple clusters. Moab HPC Suite – Grid Option's patented intelligence engine accelerates, automates and unifies all of the complex workload decisions and automation actions needed to control and optimize the workload and resource components of advanced grids. Moab HPC Suite – Grid Option connects disparate clusters into a logical whole in a matter of minutes, with its decision engine enabling grid administrators and grid policies to have sovereignty over all systems while preserving control at the individual cluster.
Moab HPC Suite – Grid Option has a powerful range of capabilities that allow organizations to consolidate reporting, synchronize management across policies, processes, and resources, and optimize workload sharing and data management across multiple clusters. Moab HPC Suite – Grid Option delivers these services in a near-transparent way, so users are unaware they are using grid resources; they know only that they are getting work done faster and more easily than ever before.
Moab HPC Suite – Grid Option addresses key challenges of managing and optimizing multiple clusters and moving to a more efficient and productive grid environment including:
- Multi-cluster sprawl increases administrative burden, overhead costs and inefficiency as workload and policy management, monitoring, reporting and planning must be done separately
- Unbalanced resource utilization slows job throughput and wastes resources: workload waits to run on overloaded clusters while underutilized resources sit idle on other clusters
- Can't share workloads due to different workload requirements, policies and SLAs across independent group- and organization-based clusters that make unified workload decisions complex and hard to enforce consistently across multiple clusters
- Need to manage across the multiple resource managers on multiple clusters to schedule and allocate resources more effectively; can't rip and replace for grid management due to process and script investments
- Need to integrate job submission for users across multiple credential and submission tools for multiple clusters to keep users productive using what they know while accelerating job processing and efficiency
Unified and Flexible Grid Management
Moab HPC Suite – Grid Option provides the automated complex decision control and management flexibility real-world grid environments require. Its multi-dimensional decision engine is able to accelerate workload output while balancing all the complexities of both grid and various local cluster and organizational workload priorities and policies. It can be flexibly configured to unify workload management policies into centralized management, provide centralized and local management policies, or integrate between local peer-to-peer cluster management policies. This ensures that service level guarantees and overall organizational objectives are achieved while utilization and valuable results output from the unified grid environment are dramatically increased.
Managing the World's Top Systems, Ready to Manage Yours
Moab manages the largest, most scale-intensive and complex HPC environments in the world, including 40% of the top 10 supercomputing systems, nearly 40% of the top 25, and 36% of the compute cores in the top 100 systems, based on rankings from www.Top500.org. So you know it is battle-tested and ready to efficiently and intelligently manage the complexities of your environment.
Moab® HPC Suite – Grid Option creates an optimized grid environment with key benefits that accelerate workload productivity and reduce management complexity including:
- Fast time to value for grid implementation with unified management across heterogeneous clusters that enables you to move quickly from cluster to optimized grid
- Improved job response times and job throughput with a policy-driven and predictive decision engine that accelerates complex workload scheduling decisions to enable faster job start times and high throughput computing
- Optimized throughput and utilization across grid and clusters of 90-99% with flexible multi-dimensional decision engine that optimizes workload processing at both grid and cluster level
- Reduced management burden and costs with grid-wide interface and reporting tools that provide a unified view of grid resources, status and usage charts, and trends over time for capacity planning, diagnostics, and accounting
- Scalable architecture to support local area to wide area grids, peta-scale, high throughput computing, and beyond
- Automated enforcement of grid and local cluster level service guarantees, priorities, and resource sharing agreements across users, groups, and projects sharing grid resources
- Advanced administrative control allows various business units to access and view grid resources, regardless of physical or organizational boundaries, or alternatively restricts access to resources by specific departments or entities
- Increased productivity by simplifying use, access, and control of broader set of HPC resources for both users and administrators with integrated job submission, grid-aware job arrays, job templates, optional user portal, and GUI administrator management and monitoring tool
Moab® HPC Suite – Grid Option accelerates workload processing with a patented multi-dimensional decision engine that self-optimizes grid-wide and local workload placement, resource utilization and results output while ensuring complex organizational priorities are met across the users and groups leveraging the grid environment.
Key capabilities include:
Policy-driven scheduling intelligently places workload on optimal set of diverse resources to maximize job throughput and success as well as utilization and the meeting of workload and group priorities
- Priority, SLA and resource sharing policies ensure the highest priority workloads are processed first and resources are shared fairly across users and groups such as quality of service, hierarchical priority weighting, and fairshare targets, limits and weights policies
- Grid-wide workload management policies that respect local cluster configuration and needs, and local policies and rules if desired, including granular settings to control where jobs can originate and be processed
- Allocation policies optimize resource utilization and prevent job failures with granular resource modeling and scheduling, affinity- and node topology-based placement
- Backfill job scheduling speeds job throughput and maximizes utilization by scheduling smaller or less demanding jobs as they can fit around priority jobs and reservations to use all available resources
- Security policies control which users and groups can access which resources with simple credential mapping and integration with popular security tools used across multiple clusters
- Checkpointing and Pre-emption
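The backfill idea in the list above can be illustrated with a small sketch. This is not Moab's scheduling algorithm; it is a simplified, conservative model with hypothetical names (`Job`, `backfill`, `reserved_start`), assuming the highest-priority blocked job holds a reservation that begins `reserved_start` minutes from now and that no backfilled job may run past that time.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int     # nodes requested
    runtime: int   # requested wall time, in minutes
    priority: int  # larger value = more important


def backfill(queue, free_nodes, reserved_start):
    """Return names of jobs that can start now without delaying the reservation.

    Simplifying assumption: a job may start only if it fits in the nodes that
    are idle right now AND finishes no later than `reserved_start`, the time
    at which the blocked priority job's reservation begins.
    """
    started = []
    for job in sorted(queue, key=lambda j: -j.priority):
        fits_now = job.nodes <= free_nodes
        ends_in_time = job.runtime <= reserved_start
        if fits_now and ends_in_time:
            started.append(job.name)
            free_nodes -= job.nodes
    return started


queue = [
    Job("big", nodes=8, runtime=120, priority=100),   # too large to start now
    Job("long_b", nodes=2, runtime=90, priority=20),  # would overrun the reservation
    Job("small_a", nodes=2, runtime=30, priority=10),
    Job("small_d", nodes=2, runtime=60, priority=1),
]
print(backfill(queue, free_nodes=4, reserved_start=60))
```

In this example the two short jobs fill the four idle nodes while the large job waits for its reservation, which is the utilization gain the backfill bullet describes.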
Real-time and predictive scheduling ensure job priorities and guarantees are proactively met as conditions and workload levels change across the grid
- Allow local cluster-level optimizations of most grid workload
- Optimized data staging ensures that remote data transfers are synchronized with resource availability to minimize poor utilization, leveraging existing data-migration technologies such as Secure Copy (SCP) and GridFTP
- Advanced reservations guarantee the availability of key resources at specific times and that jobs run when required
- Maintenance reservations reserve resources for planned future maintenance to avoid disruption to business workloads
- Predictive scheduling enables the future grid workload schedule to be continually forecasted and adjusted along with resource allocations to adapt to changes in conditions and new job and reservation requests
- Enhance job performance with automatic learning that improves scheduling decisions based on historical workload results
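As a rough illustration of how advance and maintenance reservations protect a time window, the sketch below checks a new reservation request against existing ones. The `Reservation` type and both function names are hypothetical and are not part of any Moab interface; times are simply minutes from now.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reservation:
    resource: str  # e.g. a node name
    start: int     # minutes from now, inclusive
    end: int       # minutes from now, exclusive


def conflicts(a, b):
    """Two reservations conflict if they share a resource and overlap in time."""
    return a.resource == b.resource and a.start < b.end and b.start < a.end


def can_reserve(new, existing):
    """A new reservation is accepted only if it conflicts with none already held."""
    return not any(conflicts(new, r) for r in existing)


maintenance = [Reservation("node01", start=120, end=240)]
print(can_reserve(Reservation("node01", 200, 260), maintenance))  # overlaps the maintenance window
print(can_reserve(Reservation("node01", 240, 300), maintenance))  # starts when maintenance ends
```

Using a half-open interval (start inclusive, end exclusive) lets back-to-back reservations on the same resource coexist without being treated as a conflict.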
Advanced scheduling and management of GPGPUs for jobs to maximize their utilization
- Automatic detection and management of GPGPUs in the grid environment to eliminate manual configuration and make them immediately available for scheduling
- Exclusively allocate and schedule GPGPUs on a per-job basis
- Policy-based management & scheduling using GPGPU metrics
- Quick access to statistics on GPGPU utilization and key metrics for optimal management and issue diagnosis such as error counts, temperature, fan speed, and memory
Easier submission, management, and control of job arrays improve user productivity and job throughput efficiency
- Users can easily submit thousands of sub-jobs with a single job submission, with an array index differentiating each array sub-job
- Job array usage limit policies enforce maximum job counts by credential or class
- Simplified reporting and management of job arrays for end users filters jobs to summarize, track and manage at the master job level
- Speed job processing with enhanced grid placement options for job arrays: optimal or single cluster placement
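The job array behavior described above can be sketched conceptually as follows. The `name[index]` naming, the function names, and the per-user limit are all assumptions made for illustration; they do not reproduce Moab's submission syntax or policy settings.

```python
def expand_array(name, first, last):
    """Expand one array submission into sub-job names, one per array index."""
    return [f"{name}[{i}]" for i in range(first, last + 1)]


def admit(sub_jobs, user, running, max_per_user):
    """Split sub-jobs into (eligible now, held) under a per-user usage limit.

    `running` maps a user name to the number of that user's jobs already active.
    """
    slots = max(0, max_per_user - running.get(user, 0))
    return sub_jobs[:slots], sub_jobs[slots:]


sub_jobs = expand_array("sim", 1, 4)
eligible, held = admit(sub_jobs, "alice", running={"alice": 1}, max_per_user=3)
print(eligible)
print(held)
```

A single submission produces every sub-job, and the usage limit decides how many become eligible at once, so the remaining sub-jobs wait without the user resubmitting anything.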
Scalable job performance to large-scale, extreme-scale, and high-throughput computing environments
- Efficiently manages the submission and scheduling of hundreds of thousands of queued job submissions to support high throughput computing
- Fast scheduler response to user commands while scheduling so users and administrators get the real-time job information they need
- Fast job throughput rate to get results started and delivered faster and keep utilization of resources up
- Scalable to manage up to 30 clusters when configured for centralized or centralized and local grid decisions and management
Open and flexible decision engine structure easily integrates with and automates management across existing heterogeneous resources and middleware to improve management efficiency
- Rich data integration and aggregation enables you to set powerful, multi-dimensional policies based on the existing real-time resource data monitored without adding any new agents
- Unify management across existing internal, external, and partner clusters—even if they have different resource managers, databases, operating systems, and hardware
- Supports integration with job resource managers such as TORQUE and SLURM as well as integrating with many other types of resource managers such as HP Cluster Management Utility, Nagios, Ganglia, FlexLM, and others
Ease of use and management improves productivity for both users and administrators
- Graphical administrator cluster management tool and portal provides unified view of all grid operations to make self-diagnosis, planning, reporting, and accounting across all resources, jobs, and clusters easier
- Establish trust between resource owners through graphical usage controls, reports, and accounting across all shared resources
- Tune policies prior to rollout with cluster- and grid-level simulations
- Collaborate more effectively with multi-cluster co-allocation that allows key resource reservations for high-priority projects
- Optional customizable end-user portal provides integrated job submission and management from any location, such as job forms, templates and start-time estimates, to reduce training and administrator requirements
- Integrate job submission across multiple existing submission tools to reduce new end-user training requirements
- Job templates enable rapid submission of common or multiple jobs by pre-specifying the variety of resources needed for each job, reducing duplicate work and simplifying job submissions for users
Moab® HPC Suite – Grid Option is architected to manage on top of the existing multiple job resource managers and other types of resource managers across the multiple clusters in your grid environment to provide the unified policy-based scheduling and management of workloads and resource allocation. It makes the complex workload decisions based on all of the data it integrates from the various resource managers and then orchestrates the job and management actions through those resource managers based on policies. This makes it the ideal choice to integrate with existing and new systems and clusters as well as to manage your grid as it grows and expands in the future.
Moab HPC Suite – Grid Option can be architected in three flexible grid management configurations: centralized, centralized and local, or peer-to-peer grid policies, decisions and rules. Its unique ability to manage multiple resource managers and multiple Moab instances makes this flexibility possible.


Moab HPC Suite – Grid Option is designed with a patented intelligence engine architecture that enables it to integrate with and automate workload management across existing heterogeneous environments and complex multiple organizational priorities to improve the management and workload efficiency of the environment. This unique architecture includes:
- Industry leading multi-dimensional policies that automate the complex real-time decisions and actions for scheduling workload and adapting resources. These multi-dimensional policies can model and consider the workload requirements, resource attributes and affinities, SLAs and priorities to enable more complex and efficient decisions to be automated.
- Real-time and predictive future environment scheduling & analytics that drive more accurate and efficient decisions and service guarantees as it can proactively adjust scheduling and resource allocations as it projects the impact of workload and resource condition changes.
- Open & flexible management abstraction layer lets you integrate the data and orchestrate workload actions across the chaos of complex heterogeneous IT environments and management middleware to maximize workload control, automation, and optimization. (Diverse hardware and resource types, management domains and silos, operating systems, management tools, etc.)
System Compatibility:
Moab works with a variety of platforms. Many commonly used resource managers, operating systems, and architectures are supported.
- Operating system support: Linux (Debian, Fedora, Red Hat, SUSE), FreeBSD, Unix (AIX, Solaris)
- Resource manager support: job resource managers such as TORQUE and SLURM, as well as integration with many other types of resource managers such as HP Cluster Management Utility, Nagios, Ganglia, FlexLM, and others
- Hardware support: AMD x86, AMD Opteron, HP, Intel x86, Intel IA-32, Intel IA-64, IBM i-Series, IBM p-Series, IBM x-Series
Reporting Grid
Managers want to have global reporting across all HPC resources so they can see how users and projects are really utilizing hardware and so they can effectively plan capacity. Unfortunately, manually consolidating all of this information in an intelligible manner for more than just a couple of clusters is a management nightmare. Moab® HPC Suite – Grid Option enables you to create a reporting grid, or share information across your clusters for reporting, diagnosis, and capacity-planning purposes.
Management Grids
Managing multiple clusters independently can be especially difficult when processes change, because policies must be manually reconfigured across all clusters. To ease that difficulty, Moab HPC Suite – Grid Option can create a management grid that imposes a synchronized management layer across all clusters, and their multiple resource managers, while still allowing each cluster some level of autonomy. Moab HPC Suite – Grid Option can be flexibly configured for centralized, centralized and local, or peer-to-peer grid policies, decisions and rules.
Workload-Sharing Grids
Sites with multiple clusters often have the problem of some clusters sitting idle while other clusters have large backlogs. Such inequality in cluster utilization wastes expensive resources, and training users to perform different workload-submission routines across various clusters can be difficult and expensive as well. To avoid these problems, Moab HPC Suite – Grid Option enables you to set up a workload-sharing grid that can be as simple as centralizing user submission or as complex as having each cluster maintain its own user submission routine while it migrates jobs based on policy.
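To make policy-based job migration more concrete, here is a minimal sketch of one possible placement rule. It is an assumption-laden illustration, not Moab's decision engine: the `Cluster` fields, the `accepts_remote` flag, and the rule itself (stay home if there is room, otherwise prefer the eligible cluster with the shortest backlog) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    total_nodes: int
    busy_nodes: int
    queued_jobs: int
    accepts_remote: bool = True  # local policy: may this cluster run other clusters' jobs?


def idle_nodes(cluster):
    return cluster.total_nodes - cluster.busy_nodes


def pick_cluster(job_nodes, home, clusters):
    """Return the name of the cluster on which a job should run.

    Rule used here: run at home if enough nodes are idle; otherwise migrate to
    the remote cluster that accepts remote work, has room, and has the fewest
    queued jobs (ties broken by most idle nodes). If none qualifies, stay home.
    """
    if idle_nodes(home) >= job_nodes:
        return home.name
    candidates = [
        c for c in clusters
        if c is not home and c.accepts_remote and idle_nodes(c) >= job_nodes
    ]
    if not candidates:
        return home.name
    return min(candidates, key=lambda c: (c.queued_jobs, -idle_nodes(c))).name


a = Cluster("a", total_nodes=10, busy_nodes=10, queued_jobs=50)
b = Cluster("b", total_nodes=10, busy_nodes=2, queued_jobs=0)
c = Cluster("c", total_nodes=10, busy_nodes=0, queued_jobs=0, accepts_remote=False)
d = Cluster("d", total_nodes=10, busy_nodes=5, queued_jobs=3)
print(pick_cluster(4, home=a, clusters=[a, b, c, d]))
```

The `accepts_remote` flag stands in for the local autonomy described above: cluster `c` is completely idle but is never chosen, because its own policy declines remote work.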