...
Editing Job Attributes
General job attribute editing - Attribute entries listed in a job's
jobinfo.json dictionary can be changed using an HTTP URL of the form:
/Tractor/queue?q=jattr&set_ATTRNAME=VALUE&jid=NNNN&tsid=sss
Similarly, command-level attributes can be changed using the "q=cattr&cid=..." variant (note: only set_service is supported at this time):
/Tractor/queue?q=cattr&set_ATTRNAME=VALUE&jid=NNNN&cid=CCCC&tsid=sss
Note that crews.config can specify permissions that restrict editing of particular attributes to specific crews, e.g. Wranglers only.
Change the job-level Service Key requirements:
http://ENGINE/Tractor/queue?q=jattr&jid=5678&set_service=someKey&tsid=sss
Reprioritize a job:
http://ENGINE/Tractor/queue?q=jattr&jid=5678&set_priority=12.34&tsid=sss
Suspend new dispatching in a job:
http://ENGINE/Tractor/queue?q=jattr&jid=5678&set_pause=1&tsid=sss
Use "&set_pause=0" to unpause the job. This type of "pause" suspends new task dispatching - the job will be skipped during blade assignment passes until unpaused. Any already running tasks from that job will continue to run.
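The job attribute URLs above all follow one pattern, so they can be assembled by a small helper. The following is a minimal Python sketch, not part of Tractor itself: the function name jattr_url is hypothetical, "ENGINE" and "sss" are the same placeholders used in the examples above, and the sketch only builds the URL string without sending it.

```python
from urllib.parse import urlencode

def jattr_url(engine, jid, tsid, name, value):
    """Build a q=jattr URL that sets one job attribute (hypothetical helper)."""
    params = [
        ("q", "jattr"),
        ("jid", jid),
        (f"set_{name}", value),  # e.g. set_pause, set_priority, set_service
        ("tsid", tsid),          # session id obtained from a prior login
    ]
    # urlencode converts values to strings and escapes unsafe characters.
    return f"http://{engine}/Tractor/queue?" + urlencode(params)

# Placeholder engine host and session id, matching the examples above.
print(jattr_url("ENGINE", 5678, "sss", "pause", 1))
print(jattr_url("ENGINE", 5678, "sss", "priority", 12.34))
```

Using urlencode rather than string concatenation keeps the URL valid if a value such as a Service Key expression contains characters that need escaping.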
...
Engine Metrics and Statistics
Get the most recent engine health statistics:
http://ENGINE/Tractor/monitor?q=statistics
The query returns various tractor-engine performance and status metrics. The currently provided values are intended to give administrators a quick set of engine "vital signs" to check when looking at overall Tractor health or for engine hot spots.
The statistics report is organized as a JSON dictionary. Each key in the dictionary names one type of metric, and the value for each key is an array of recent samples for that metric.
The current metric names are listed below. The list may evolve over time.
- "dt" - the actual length of each sampling interval
- "enla" - engine normalized load avg, cpu load on the engine host
- "rtot" - total inbound engine queries in the interval
- "asn" - assignments made during the interval
- "nrun" - count of currently dispatched cmds running on the farm
- "nctot" - total backlog of unexecuted commands in all waiting jobs
- "nslot" - total slots on the farm reported by blades
- "nsin" - total slots in use on the farm
- "tecpu" - estimate of current CPU utilization due to tractor-engine
- "temem" - estimate of current RAM utilization due to tractor-engine
- "dbcpu" - estimate of current CPU utilization due to the job database
- "dbmem" - estimate of current RAM utilization due to the job database
- "dbcqn" - backlog of pending job state database commits
- "iqn" - backlog of waiting requests on the main intake queue
- "sqn" - backlog of waiting requests on the shipper pool queue
- "aqn" - backlog of waiting requests on the assigner queue
- "qqn" - backlog of waiting requests on the job archive queue
- "jqn" - backlog of waiting requests on the job submission queue
- "mqn" - backlog of waiting requests on the monitor UI queue
- "nmbox" - subscribed UI sessions
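As an illustration of how a script might consume this dictionary, here is a minimal Python sketch. It assumes the reply has already been fetched and contains the keys shown; the SAMPLE payload is invented for the example, and the helper names latest_samples and slot_utilization are hypothetical. It relies on the last entry of each array being the most recent sample.

```python
import json

# Invented payload shaped like the report described above; real values will differ.
SAMPLE = '{"dt": [5.0, 5.0, 5.1], "nslot": [400, 400, 416], "nsin": [200, 310, 312]}'

def latest_samples(stats):
    """Map each metric name to its most recent sample (the last array entry)."""
    return {name: samples[-1] for name, samples in stats.items() if samples}

def slot_utilization(stats):
    """Fraction of reported farm slots in use, from the newest nsin and nslot samples."""
    now = latest_samples(stats)
    if "nsin" not in now or not now.get("nslot"):
        return None  # metric missing, or no slots reported yet
    return now["nsin"] / now["nslot"]

stats = json.loads(SAMPLE)
print(latest_samples(stats))
print(slot_utilization(stats))
```

Because the list of metric names may change, the sketch treats missing keys as "no data" rather than raising an error.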
Get a recent history of engine statistics:
http://ENGINE/Tractor/monitor?q=statslog
This query is like q=statistics, above, but it returns several minutes' worth of samples for each metric.
Note: for continuous monitoring add "&stats=1" to your recurring q=subscribe request; statistics will arrive as 's' mbox messages.
The last entry in each array is the most recent sample. The engine maintains a simple ring buffer of these samples, so on the next update the first array element is dropped and all of the other samples appear to "shift left" with a new value appearing at the end. The statslog query returns all of the samples in the buffer so that a UI can generate a complete graph from a single query.
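The "shift left" behavior described above can be mimicked on the client side with a fixed-length queue. This is only a sketch of the concept in Python, not the engine's implementation; the buffer length of 4 and the sample values are arbitrary choices for the example.

```python
from collections import deque

# A fixed-size buffer: appending to a full deque drops the oldest element.
samples = deque([10, 11, 12, 13], maxlen=4)

samples.append(14)       # new sample arrives; 10 falls off the front
print(list(samples))     # oldest first, newest last
print(samples[-1])       # the most recent sample
```

A UI that polls q=statslog can simply replace its whole buffer on each reply, while one that receives single updates could append them to a structure like this.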