Returns From Lincoln Ascent

Some of the numbers have started to come in from the work done for Lincoln Ascent, and so far things look promising. The pieces wrapping up right now focus on speeding up Moab’s scheduling iteration and reducing the network overhead between TORQUE and Moab.

One source of excess network communication between Moab and TORQUE is information that hasn’t changed since the last time the two synced. In Lincoln, TORQUE can be asked to leave repetitive information out of its responses to job queries. Specifically, if a job hasn’t changed within a certain amount of time, only a summary of the job’s attributes is sent instead of all of the job’s information. Moab has a parameter that instructs it to request this condensed output for jobs. Moab is also smart enough to recognize circumstances in which it needs the full output instead, such as after an extended period of unresponsive resource managers or after Moab has been restarted, and in those cases it makes sure to get all of the information.
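The condensed-output idea can be sketched in a few lines of Python. This is only an illustration of the logic described above, not TORQUE or Moab code: the `Job` class, the `CONDENSE_AFTER_SECONDS` threshold, and both function names are hypothetical, and the real parameter names and thresholds are not given in this post.

```python
from dataclasses import dataclass, field

# Hypothetical threshold: a job unchanged for at least this many seconds
# is eligible for a condensed report. Not a real TORQUE setting.
CONDENSE_AFTER_SECONDS = 300.0


@dataclass
class Job:
    job_id: str
    state: str
    last_modified: float  # epoch seconds of the last attribute change
    attributes: dict = field(default_factory=dict)


def job_report(job, now, condensed_requested):
    """Resource manager side: build the status payload for one job."""
    unchanged = (now - job.last_modified) >= CONDENSE_AFTER_SECONDS
    if condensed_requested and unchanged:
        # Nothing changed recently, so send only a short summary.
        return {"job_id": job.job_id, "state": job.state, "condensed": True}
    # Otherwise send the full set of attributes.
    return {
        "job_id": job.job_id,
        "state": job.state,
        "condensed": False,
        "attributes": dict(job.attributes),
    }


def should_request_condensed(scheduler_restarted, seconds_since_last_sync, max_gap):
    """Scheduler side: ask for full output after a restart or a long outage."""
    return not scheduler_restarted and seconds_since_last_sync <= max_gap
```

The scheduler-side check matters because a condensed report is only useful if the scheduler already holds a recent full copy of the job; after a restart or a long gap it does not, so it asks for everything.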

Another piece of work centers on removing the overhead of waiting for a response from resource managers. Prior to Lincoln, Moab was unresponsive while it was requesting information from its resource manager or resource managers. As of Lincoln, Moab has a parameter that effectively backgrounds this process, so Moab keeps responding to user commands until it has received a response from all configured resource managers.
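As a rough sketch of what backgrounding the query means, the Python below issues every resource manager query on a worker thread and keeps serving user commands until all of them have answered. It is a generic illustration of the pattern, not Moab’s implementation; `run_iteration`, its arguments, and the polling interval are all assumptions made for the example.

```python
import queue
from concurrent.futures import ThreadPoolExecutor, wait


def run_iteration(resource_managers, commands, poll_interval=0.05):
    """Query every resource manager in the background while serving commands.

    resource_managers: dict mapping a name to a zero-argument callable that
        returns that resource manager's report (hypothetical interface).
    commands: queue.Queue of zero-argument callables, one per user command.
    """
    handled = []
    workers = max(1, len(resource_managers))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Start all queries at once instead of blocking on each in turn.
        pending = {pool.submit(query): name
                   for name, query in resource_managers.items()}
        not_done = set(pending)
        while not_done:
            # Stay responsive: serve a waiting user command if there is one.
            try:
                command = commands.get(timeout=poll_interval)
                handled.append(command())
            except queue.Empty:
                pass
            # Non-blocking check for which queries are still outstanding.
            _, not_done = wait(not_done, timeout=0)
        reports = {name: future.result() for future, name in pending.items()}
    return reports, handled
```

Scheduling proper would start only once `run_iteration` returns, that is, once every configured resource manager has responded.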

There remain many different scenarios that could be benchmarked for these two enhancements, but initial results are very promising. We ran a test in which the queue was held at 100k jobs and every job took less than 1 second to execute, and we recorded the scheduling iteration times for each combination of these settings. Here is a graph of the results:

[Graph: scheduling iteration times for each combination of settings]

As you can see, the scheduling iteration times are much shorter with these parameters enabled. With just the condensed output enabled, the average iteration is sped up by 12%, and the worst-case iteration is sped up by 44%. With just backgrounding the RM query enabled, the average case is 29% faster and the worst case is 50% faster. Finally, with both enabled, the average case is 55% faster and the worst case is 69% faster.

Some important things to notice about this test: jobs that execute immediately don’t maximize the utility of condensing idle job output. Since a high number of jobs (for our test, about 6k) are started each iteration, fewer jobs remain unchanged. However, having a very large job queue does increase the utility of this feature. All in all, this is probably a middling scenario for condensed output, and workloads that include large numbers of jobs that stay in the same state for extended periods of time may benefit more. At the same time, smaller workloads may not notice much benefit.

As for backgrounding the RM query, a very large workload such as 100k jobs is a maximizing scenario for this feature, because the more jobs that need to be reported, the longer the query takes. Sites that have even more jobs may see even more benefit, but sites that have fewer jobs in the queue may not see as pronounced a benefit. The bottom line is that these are numbers for this one scenario, and there are many scenarios out there that will fare better or worse.

We are, however, very optimistic about these numbers. We are especially pleased that these changes delivered their biggest gains in the worst-case scenarios. With none of these parameters engaged in this test, the slowest scheduling iteration was more than twice as long as the average iteration, but with both set, the slowest iteration is less than one and a half times as long as the average, resulting in a much more consistent experience.
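The relationship between the percentages and the worst-to-average ratio can be checked with a little arithmetic. The sketch below assumes that "X% faster" means the iteration time shrank by X%, which this post does not state explicitly, and it uses a baseline ratio of 2.0 purely as a placeholder, since the only figure given is "more than twice as long". The function names are made up for the example.

```python
def time_remaining(percent_faster):
    """Fraction of the original time left, assuming 'X% faster' means X% less time."""
    return 1.0 - percent_faster / 100.0


def worst_to_average_ratio(baseline_ratio, avg_percent_faster, worst_percent_faster):
    """Scale a baseline worst/average ratio by the two reported speedups."""
    return (baseline_ratio
            * time_remaining(worst_percent_faster)
            / time_remaining(avg_percent_faster))


# (average % faster, worst-case % faster) as reported in this post.
REPORTED = {
    "condensed output only": (12, 44),
    "backgrounded RM query only": (29, 50),
    "both": (55, 69),
}

for label, (avg, worst) in REPORTED.items():
    # 2.0 is a placeholder baseline; the post only says "more than twice".
    ratio = worst_to_average_ratio(2.0, avg, worst)
    print(f"{label}: worst/average ratio of about {ratio:.2f}")
```

Under that reading, the ratio with both parameters set is the baseline ratio multiplied by 0.31 / 0.45, so any baseline ratio between 2 and roughly 2.18 lands below one and a half, which is consistent with the figures above.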

Initial returns are very promising, and we’re very excited to continue to improve scalability and throughput across our solutions.
