Troubleshooting Moab Triggers

As Independence Day (a U.S. holiday) draws near, my thoughts turn to the fireworks that signal the celebration. I can’t help thinking about the fuses that act as triggers. And of course, at Adaptive Computing, we use triggers as part of our workflow.

While Moab may seem to be primarily concerned with the scheduling and optimization of workload placement on available resources, it is also concerned with directing different actions in response to different types of occurrences in your cluster. The mechanism to accomplish this is known as triggers. Triggers are used to respond to events such as:

  • Hardware or software failures
  • Automated maintenance responses
  • Preemptive updates
  • Virtual environment requests

Triggers are very powerful and can automate many common administrative tasks. As with any powerful tool, though, misused triggers can cause a great many problems. It is therefore vitally important to understand why they are used and how they are constructed before turning control over to them in the cluster or data center.

All triggers require three basic things:

  • Object
  • Event
  • Action

Triggers are defined on a Moab object, and declare both an event to look for, and an action to take place. A trigger definition looks something like this:

SCHEDCFG[Moab] TRIGGER=EType=start,AType=exec,Action="/opt/moab/tools/report.sh"

 

This trigger is associated with the scheduler Object (Moab). Its Event (EType) is start, so it fires when the scheduler starts. Its Action type (AType) is exec, which executes something; in this case, a script called report.sh. Suppose /opt/moab/tools/report.sh looks like this:

#!/bin/bash
echo "Executing showq"
showq > /home/fred/report.txt
echo "Executing df"
df -h >> /home/fred/report.txt

There are two errors in this script: 1) the showq command is not found in the default path, and 2) the output will be garbled because there is no valid TERM definition. Let’s see how this plays out. After restarting Moab, look at the mdiag -T output:

mdiag -T -v
TrigID              Object ID               Event  AType           ActionDate       State
------------------- -------------------- -------- ------ -------------------- -----------
0*                  sched:Moab              start   exec  Wed Oct 26 16:11:57     Failure
Launch Time:   -00:02:56
Flags:         multifire,globaltrig
Last Execution State: Failure (ExitCode: 1)
BlockUntil:    00:00:01  ActiveTime:  00:00:07
PID:           1811
RearmTime:     00:05:00
Action Data:   /opt/moab/tools/report.sh
StdOut:        /opt/moab/spool/report.sh.oGEka2S
StdErr:        /opt/moab/spool/report.sh.eiTraYo
Variables:
* indicates trigger has completed

Note the Failure value in the State column.
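On a system with many triggers, scanning the State column by eye gets tedious. The following is a minimal sketch of a filter that lists only the failed ones. The function name failed_triggers is hypothetical, and the sketch assumes the summary-row layout shown in the output above (TrigID first, Object ID second, State last); other Moab versions may format the rows differently.

```shell
#!/bin/bash
# Print the TrigID and Object ID of every trigger whose summary row ends in "Failure".
# Assumption: the row layout matches the mdiag -T output shown above.
failed_triggers() {
    # Detail lines such as "Last Execution State: Failure (ExitCode: 1)"
    # are skipped because their last field is not exactly "Failure".
    awk '$NF == "Failure" { id = $1; sub(/\*$/, "", id); print id, $2 }'
}

# Typical use on a host where Moab is installed:
#   mdiag -T | failed_triggers
```

The trailing * that marks a completed trigger is stripped from the ID so the result can be passed to other commands.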

Take a look at the /opt/moab/spool/report.sh.eiTraYo file (the StdErr path listed in the mdiag output) and you will see:

/opt/moab/tools/report.sh: line 3: showq: command not found
TERM environment variable not set.
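The suffix on the spool file names appears to be generated for each run, so the exact StdErr path has to be looked up every time. As a rough sketch, a small helper can pick out the newest stderr file for a given script. The function name latest_trigger_stderr is hypothetical, and the sketch assumes the files are named after the script followed by .e and a suffix, as in the output above.

```shell
#!/bin/bash
# Print the most recently modified stderr file a trigger script left in the spool directory.
# Assumption: stderr files are named <script>.e<suffix>, as in the mdiag output above.
latest_trigger_stderr() {
    local spool_dir="$1" script_name="$2"
    # ls -t sorts newest first; the error is silenced when nothing matches.
    ls -t "$spool_dir/$script_name".e* 2>/dev/null | head -n 1
}

# Example:
#   cat "$(latest_trigger_stderr /opt/moab/spool report.sh)"
```

The helper prints nothing when no matching file exists, so check for an empty result before passing it to cat.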

To remedy both errors, update report.sh to set TERM and to call showq by its full path:

#!/bin/bash
export TERM=vt100
echo "Executing showq"
/opt/moab/bin/showq > /home/fred/report.txt
echo "Executing df"
df -h >> /home/fred/report.txt
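Both errors came from the script relying on an interactive login environment. One way to catch that kind of problem before restarting Moab is to run the script with an emptied environment. This is only a sketch: the function name run_with_bare_env is hypothetical, the minimal PATH of /usr/bin:/bin is an assumption, and the result approximates rather than replicates the environment Moab gives a trigger.

```shell
#!/bin/bash
# Run a command with an emptied environment and a minimal PATH, then report its exit code.
# Assumption: a trigger's environment is similarly sparse (no TERM, no Moab bin directory on PATH).
run_with_bare_env() {
    env -i PATH=/usr/bin:/bin "$@"
    local rc=$?
    echo "exit code: $rc" >&2
    return "$rc"
}

# Example:
#   run_with_bare_env /opt/moab/tools/report.sh
```

If the script prints "command not found" or complains about TERM here, it would likely fail the same way when fired as a trigger.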

Now restart Moab and look at the mdiag -T output:

mdiag -T -v
TrigID              Object ID               Event  AType           ActionDate       State
------------------- -------------------- -------- ------ -------------------- -----------
0*                  sched:Moab              start   exec            -00:00:17  Successful
* indicates trigger has completed

The trigger now completes successfully, and you know the basics of Moab trigger troubleshooting.
