Moab Task Manager Demo Recap: Solution to Running Massive Amounts of Small Jobs Quickly

The most popular demo at the Adaptive Computing booth during Supercomputing 2013 showed off the new Moab Task Manager (MTM). MTM is the solution to running massive numbers of short jobs very quickly. By itself, the Moab scheduler is very good at choosing the best place to run jobs. However, making that choice takes time because of the rules and logic that are applied to each submitted job. The result is very intelligent placement, but at the cost of speed. So when it comes to running many short jobs, the decision-making becomes a bottleneck. That is where MTM comes in.

MTM Communication
MTM is a separate executable from Moab. It is specified at job submission along with a batch file that lists each small job in a simple format. Rather than submitting many jobs to Moab, the user submits a single job that specifies all the nodes to be used to process the short jobs. After Moab returns the list of nodes where the jobs will run, MTM wakes up a small daemon on each node and begins to give it work. The central, or lead, daemon is designated the “coordinator” and is responsible for maintaining the work queue for all other nodes. As compute nodes finish executing jobs, they write their results to a shared file system and execute the next job. This makes the file system and the processing itself the bottleneck for execution, not the decision-making. As a result, very high throughput can be achieved.
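The coordinator pattern described above can be illustrated with a small, single-machine sketch. This is not MTM's implementation or interface: the function names (`run_shell`, `run_batch`, `demo`), the use of threads in place of per-node daemons, and the one-file-per-job result layout are all assumptions made for illustration. It only shows the general idea of one central work queue that workers drain without any per-job scheduling decision.

```python
import queue
import subprocess
import tempfile
import threading
from pathlib import Path


def run_shell(command):
    """Run one short job as a shell command and return its stdout."""
    done = subprocess.run(command, shell=True, capture_output=True, text=True)
    return done.stdout


def run_batch(commands, num_workers, results_dir, execute=run_shell):
    """Drain one shared work queue with num_workers workers.

    Each worker writes one result file per job into results_dir
    (a stand-in for the shared file system), then takes the next job.
    Returns the number of jobs completed.
    """
    results_dir = Path(results_dir)
    work = queue.Queue()  # the "coordinator" role: one central queue
    for job_id, command in enumerate(commands):
        work.put((job_id, command))

    completed = []  # list.append is thread-safe in CPython

    def worker():
        while True:
            try:
                job_id, command = work.get_nowait()
            except queue.Empty:
                return  # queue drained, so this worker exits
            output = execute(command)
            (results_dir / f"job_{job_id}.out").write_text(output)
            completed.append(job_id)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(completed)


def demo():
    """Run four trivial jobs on two workers in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        # str.upper stands in for real job execution to keep the demo portable
        count = run_batch(["a", "b", "c", "d"], 2, tmp, execute=str.upper)
        outputs = sorted(p.read_text() for p in Path(tmp).iterdir())
    return count, outputs
```

Because no placement decision is made per job, the cost of handing out the next piece of work is a single queue operation. That is why the limiting factors shift to job execution and result writing.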

In the demo, 100 compute nodes completed 100,000 very small jobs in less than 2 minutes. In the same environment, one million jobs completed in around 15 minutes. These numbers are of course specific to the job length and the environment running the jobs, but it’s clear that MTM will fill an important gap for customers.
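To put those figures in terms of throughput, the arithmetic can be sketched as follows. The helper names are made up for this example, and the inputs are the rounded times quoted above ("less than 2 minutes" and "around 15 minutes"), so the results are rough estimates rather than measured rates.

```python
def jobs_per_second(total_jobs, elapsed_minutes):
    """Average cluster-wide throughput over the whole run."""
    return total_jobs / (elapsed_minutes * 60)


def jobs_per_node_per_second(total_jobs, elapsed_minutes, nodes):
    """Average throughput attributed to each compute node."""
    return jobs_per_second(total_jobs, elapsed_minutes) / nodes


# 100,000 jobs in under 2 minutes: at least ~833 jobs/s, ~8.3 per node
small_run = jobs_per_second(100_000, 2)
# 1,000,000 jobs in about 15 minutes: ~1,111 jobs/s, ~11.1 per node
large_run = jobs_per_second(1_000_000, 15)
```

Since the first run finished in less than 2 minutes, the first figure is a lower bound on its actual rate.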
