make sure that no job filter or client-side search box is preventing the job from being displayed,
check that the dashboard is pointing to the same engine as the entity spooling the job, and
check the engine log file to confirm that a "job Loaded" message appeared when the job was spooled. Adjacent error messages in the log file may indicate a syntax error in the job file.
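Scanning the engine log for these two signals can be scripted. The sketch below assumes the log is plain text with one message per line, and that load confirmations contain the literal text "job Loaded" while problems contain the word "error". Both patterns are assumptions, so adjust them to whatever your engine actually writes. It returns the "job Loaded" lines and each error line together with its neighbouring lines.

```python
import re
from typing import Iterable

# Assumed patterns: adapt them to the wording your engine actually logs.
LOADED = re.compile(r"job Loaded", re.IGNORECASE)
ERROR = re.compile(r"\berror\b", re.IGNORECASE)


def scan_engine_log(lines: Iterable[str], window: int = 2):
    """Return the 'job Loaded' lines, plus each error line with its neighbours."""
    lines = [line.rstrip("\n") for line in lines]
    loaded = [line for line in lines if LOADED.search(line)]
    error_context = []
    for i, line in enumerate(lines):
        if ERROR.search(line):
            # Keep `window` lines on each side: an error adjacent to a spool
            # attempt may point to a syntax error in the job file.
            lo, hi = max(0, i - window), min(len(lines), i + window + 1)
            error_context.append(lines[lo:hi])
    return loaded, error_context
```

To use it, pass an open log file, e.g. `loaded, errors = scan_engine_log(open(path))`, where `path` is wherever your engine writes its log.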

You can also list recently spooled jobs on the command line, e.g. "tq jobs mine --sort jid".

The job must be able to run.

Verify that the job is not:

paused,
delayed, or
waiting for other jobs to finish, as specified in its afterJids list.
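The three conditions above can be expressed as one check. This is a minimal sketch over a hypothetical job record: the field names `paused`, `after_time` (the time the job is delayed until) and `after_jids` are illustrative stand-ins for the job attributes named above, not a real Tractor API. It returns every reason the job is held, so an empty list means none of these conditions applies.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Job:
    # Hypothetical record loosely mirroring the attributes discussed above.
    jid: int
    paused: bool = False
    after_time: Optional[datetime] = None   # job is delayed until this time
    after_jids: list[int] = field(default_factory=list)


def blocking_reasons(job: Job, done_jids: set[int], now: datetime) -> list[str]:
    """List why the job cannot run yet; an empty list means it is not held."""
    reasons = []
    if job.paused:
        reasons.append("paused")
    if job.after_time is not None and now < job.after_time:
        reasons.append(f"delayed until {job.after_time.isoformat()}")
    # Any job named in afterJids that has not finished still blocks this one.
    pending = [j for j in job.after_jids if j not in done_jids]
    if pending:
        reasons.append(f"waiting for jobs {pending}")
    return reasons
```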

The job must have ready tasks.

Verify that the job actually has ready tasks. When a job is first submitted, at least one task will be ready to run (unless the job author has managed to weave a job with circular dependencies!). Ready tasks are leaf nodes of the job graph, though not every leaf node will necessarily be ready. A job's ready tasks can be listed on the command line with "tq tasks state=ready and jid=<jid>".

At submission time, all remaining tasks will be blocked; a task becomes ready once all of its child tasks have completed successfully (or have been manually skipped). In the case of serial subtasks, a task may also have to wait for its "older" sibling tasks to finish before it can become ready.
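The readiness rules above can be modelled as a walk over the task tree. This sketch is a simplification, not the engine's algorithm: a hypothetical `Task` has only three states ("pending", "done", "skipped"), and a `serial` flag on a parent means its children must finish in order. A task is reported ready when it is unfinished, all its children are done or skipped, and no older sibling under a serial parent (at any level above it) is still unfinished.

```python
from dataclasses import dataclass, field

FINISHED = {"done", "skipped"}


@dataclass
class Task:
    tid: int
    state: str = "pending"            # simplified states: pending, done, skipped
    serial: bool = False              # children must run one after another
    children: list["Task"] = field(default_factory=list)


def ready_tasks(root: Task) -> list[int]:
    """Return the tids of tasks that could be dispatched right now."""
    ready: list[int] = []

    def visit(task: Task, gated: bool) -> None:
        older_finished = True
        for child in task.children:
            # Under a serial parent, a child's whole subtree waits for
            # every older sibling to be done or skipped.
            child_gated = gated or (task.serial and not older_finished)
            visit(child, child_gated)
            older_finished = older_finished and child.state in FINISHED
        children_finished = all(c.state in FINISHED for c in task.children)
        if not gated and task.state not in FINISHED and children_finished:
            ready.append(task.tid)

    visit(root, gated=False)
    return ready
```

In a tree where task 2 has serial children 3 and 4, only leaf 3 (plus any unrelated leaves) is ready at first; leaf 4 becomes ready once 3 is done, and task 2 once both are done or skipped.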

Blades must be making requests for work.

Verify that blades are actually requesting work, and from the correct engine:port. Participating blades will be visible in the blades tab of the dashboard. If there are none, ensure that blades are contacting the correct engine and that the blade and engine hosts have the proper ports open in their firewalls. The default port for the engine is 80 (TCP), and for blades is 9005 (TCP). (Note that 9005 (UDP) must be open on the blade and engine in order for progress messages to be reported to the engine; however, this will not affect whether or not a task starts on a blade.)
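A quick way to rule out a firewall problem is to attempt a TCP connection to each port from the other host. The sketch below uses only the Python standard library: from a blade host, check the engine's port (80 by default); from the engine host, check the blade's port (9005 by default). It only tests TCP; UDP 9005 cannot be verified this way because UDP is connectionless.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

For example, on a blade you might run `tcp_port_open("engine-host", 80)`, where `engine-host` is a placeholder for your engine's hostname.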

You can verify that a blade is requesting work by running a tracer in the dashboard. This will report which jobs have been considered for dispatching. If the job in question is displayed in the trace, the reason why it has not been picked up will be displayed. If the job in question is not displayed, it may be because a task of a higher priority job was chosen, or because other scheduling rules preclude the job from running on the blade.

Also consider that blades may stop requesting work if they incur a number of task failures within a given interval. Such blades are marked with "error accrual hiatus" in the Note column of the blade list. The RecentErrorThrottle setting in the blade's profile controls the parameters of this hiatus calculation.
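The hiatus behaves like a sliding-window throttle. The sketch below assumes RecentErrorThrottle supplies three numbers: a failure count, a window in seconds, and a hiatus length in seconds. The class and its method names are hypothetical and only model the idea; check your blade profile documentation for the setting's exact form.

```python
from collections import deque


class ErrorThrottle:
    """Hypothetical model of an error-accrual hiatus (count, interval, hiatus)."""

    def __init__(self, max_errors: int, interval: float, hiatus: float):
        self.max_errors = max_errors
        self.interval = interval
        self.hiatus = hiatus
        self.failures: deque = deque()        # timestamps of recent failures
        self.paused_until = float("-inf")

    def record_failure(self, t: float) -> None:
        self.failures.append(t)
        # Forget failures that fall outside the sliding window.
        while self.failures and t - self.failures[0] > self.interval:
            self.failures.popleft()
        if len(self.failures) >= self.max_errors:
            # Too many recent failures: stop asking for work for a while.
            self.paused_until = t + self.hiatus
            self.failures.clear()

    def may_request_work(self, t: float) -> bool:
        return t >= self.paused_until
```

With `ErrorThrottle(3, 30, 120)`, three failures at t = 0, 10 and 20 pause the blade until t = 140, while failures spread further apart than the 30-second window never trigger a hiatus.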

Scheduling rules must permit the job to be considered.

There are several scheduling rules that determine whether or not a task runs on a given blade:

the requesting blade's service keys must match the task's service key expression,
the requesting blade's metrics must satisfy the task's metric requirements,
the task must not be restricted by limits. The limits in question could be represented by the limit tags of the job, task, or the task's current command, and
the task must not be restricted by sharing limits. The limits in question would be those associated with the job's project.
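The four rules above can be combined into a single eligibility check. This is a simplified sketch: real service key expressions support a richer syntax than the "blade must offer every key" rule modelled here, and sharing limits are reduced to a per-project cap. All names (`Blade`, `TaskSpec`, `mem_gb`, the counter dictionaries) are hypothetical. It returns the list of rules that block the task.

```python
from dataclasses import dataclass, field


@dataclass
class Blade:
    service_keys: set[str]
    metrics: dict[str, float]                 # e.g. free memory; names hypothetical


@dataclass
class TaskSpec:
    service_keys: set[str]                    # simplified: blade must offer all of them
    min_metrics: dict[str, float] = field(default_factory=dict)
    limit_tags: set[str] = field(default_factory=set)   # job, task and command tags
    project: str = "default"


def blocking_rules(task: TaskSpec, blade: Blade,
                   limit_in_use: dict[str, int], limit_max: dict[str, int],
                   share_in_use: dict[str, int], share_max: dict[str, int]) -> list[str]:
    """Return the rules that prevent the task from running on this blade."""
    reasons = []
    if not task.service_keys <= blade.service_keys:
        reasons.append("service keys")
    for name, minimum in task.min_metrics.items():
        if blade.metrics.get(name, 0.0) < minimum:
            reasons.append(f"metric {name}")
    for tag in sorted(task.limit_tags):
        if tag in limit_max and limit_in_use.get(tag, 0) >= limit_max[tag]:
            reasons.append(f"limit {tag}")
    cap = share_max.get(task.project)
    if cap is not None and share_in_use.get(task.project, 0) >= cap:
        reasons.append(f"sharing {task.project}")
    return reasons
```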

Even if a task satisfies those requirements, a task of a higher priority job may also satisfy them and would then be picked up by the blade instead.

Also note that every task assignment and completion updates limit counters, which in turn determine which tasks will be eligible for the next blade's request.
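The interplay between priority and limit counters can be illustrated with a toy dispatcher. This is a deliberately simplified model, not the engine's scheduler: it ignores service keys and metrics, the `QueuedJob` record and the `dispatch` and `complete_task` functions are hypothetical, and each assignment consumes one slot of every limit tag on the job until the task completes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class QueuedJob:
    jid: int
    priority: float
    ready: list[str]                          # names of ready tasks (hypothetical)
    limit_tags: set[str] = field(default_factory=set)


def dispatch(jobs: list[QueuedJob], limit_max: dict[str, int],
             limit_in_use: dict[str, int]) -> Optional[tuple[int, str]]:
    """Answer one blade request: return (jid, task) or None if nothing fits."""
    for job in sorted(jobs, key=lambda j: j.priority, reverse=True):
        if not job.ready:
            continue
        if any(limit_in_use.get(t, 0) >= limit_max.get(t, float("inf"))
               for t in job.limit_tags):
            continue                          # a limit is full; try a lower priority job
        task = job.ready.pop(0)
        for t in job.limit_tags:
            limit_in_use[t] = limit_in_use.get(t, 0) + 1   # assignment takes a slot
        return job.jid, task
    return None


def complete_task(job: QueuedJob, limit_in_use: dict[str, int]) -> None:
    for t in job.limit_tags:
        limit_in_use[t] -= 1                  # completion frees the slot
```

With a limit of one "prman" slot, the first request goes to the high priority job, the second falls through to a lower priority job that uses no limit, and the high priority job is eligible again only after its running task completes.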

Pixar Animation Studios
