Changes in 2.1

  • Dashboard Job Notes -- A new Notes field has been added to the Dashboard job details pane, allowing text annotations to be added to any job. Notes are visible to other users, and the presence of a note is indicated with a small "chat bubble" icon in the job list. These notes can be used to describe a problem to wranglers, or to explain why a job needs, or is getting, special handling. The engine will automatically add a note to a job when an attribute is changed through some user action, such as altering priority, so the notes become a history of changes to the job.

  • Dashboard Blade Notes -- A new Notes field has been added to the Dashboard blade details pane, allowing text annotations to be attached to a blade entry. These notes can be used by system administrators to describe known issues or to discuss ongoing admin work on a machine.

  • Dashboard Job Pins -- Individual jobs in each user's job list can now be "pinned" to the top of the list, independent of the global list sorting mode. Jobs might be pinned because they are important to track, or simply because they represent a current "working set" of jobs. The pinned jobs float as a group at the top of the list, and within that group they are sorted according to the overall list sorting mode.

  • Dashboard Job Locks -- A single user can now "lock" a job from the Dashboard. A locked job can only be modified by the user who locked it. Locks are typically only used by wranglers who are investigating a problem and who want to prevent other users from changing, restarting, or deleting a job while the investigation is proceeding. The lock owner can unlock the job when done. Permission to apply a lock is controlled by the JobEditAccessPolicies "lock" attribute in crews.config.

  • Task Logs 'L' Hotkey -- When navigating the tasks within a job, the logs for the currently selected task can be displayed by pressing the 'L' key. The key is a toggle, so pressing 'L' again closes the currently open log.

  • User-centric Job Shuffle -- Individual users can re-order their own jobs on the queue without disrupting global priority settings. The Dashboard job list option "Shuffle Job To Top" essentially exchanges the "place in line" of the selected job with that of a job submitted earlier by the same user, causing the selected job to run sooner than it would in the default submission order. The swap does not affect the ordering of other jobs on the queue, relative to the submission slots already held by that user. This slightly unusual feature is a simplified re-implementation of the old per-user dispatching order controls in Alfred, as requested by several customers. Permission to perform this kind of reordering is controlled by the JobEditAccessPolicies "jshuffle" attribute in crews.config.
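
    Both the "lock" and "jshuffle" permissions above are granted in crews.config. A minimal hypothetical fragment, assuming JobEditAccessPolicies maps each operation name to the crews allowed to perform it (crew names here are placeholders; consult the crews.config reference for the exact schema at your site):

    "JobEditAccessPolicies": {
        "lock":     ["Wranglers"],
        "jshuffle": ["Wranglers", "Artists"]
    }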

  • The "project" affiliations for each job are now displayed in the job list view.

  • "Delete Job" action is now called "Archive Job" -- The former "Delete Job" menu item has been changed to "Archive Job" to better reflect its actual function: when the db.config setting "DBArchiving" is enabled, jobs that are removed from the active queue are transfered to an archive database where they can still be inspected and searched in tq queries. If DBArchiving is False, then "deleted" jobs are actually deleted and their database entries are removed -- in this case the dashboard menu item still says "Delete Job".

  • Archived Jobs View -- A Dashboard view of previously "deleted" (archived) jobs is now available. This view is analogous to the "trash can" view in some file browsers and e-mail clients. Jobs listed in the archive view can be browsed, and they can also be restored to the main job queue, where they will again be considered for dispatching. Note that jobs can contain "clean-up" commands that execute when the job finishes; these clean-ups may remove important temporary files, which can make it impossible to re-execute the job after it is restored.

  • Task progress bars for Nuke renders -- Tractor-blade now triggers a Dashboard progress bar update when it encounters a multi-frame progress message from Nuke, of the form "Frame 42 (7 of 9)".

  • Task Elapsed Time Bounds -- Job authors can now specify an acceptable elapsed time range for a given launched command. Commands whose elapsed time is outside the acceptable range will be marked as an error. Commands that run past the maximum time boundary will be killed. Example job script syntax:

    RemoteCmd {sleep 15} -service PixarRender -minrunsecs 5 -maxrunsecs 20
    
  • Per-Tier Scheduling -- A new extension to the DispatchTiers specification in tractor.config allows each defined tier to have its own scheduling mode. For example, the "rush" tier might be scheduled in strict FIFO order, while the default tier might use one of the modes that favor shared access (like P+ATCL+RR). Tiers can be assigned the new "P+CHKPT" mode to take advantage of the partial-graph looping feature introduced in Tractor 2.0; tiers using that mode should be placed before tiers receiving "classic" non-checkpoint jobs.
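
    A sketch of what such a specification might look like; the per-tier key names and priority values shown here are illustrative, so consult the tractor.config reference for the exact syntax:

    "DispatchTiers": {
        "rush":    {"priority": 150.0, "scheduling": "FIFO"},
        "looping": {"priority": 125.0, "scheduling": "P+CHKPT"},
        "default": {"priority": 100.0, "scheduling": "P+ATCL+RR"}
    }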

  • Site-defined Task Log Filters -- A new FilterSubprocessOutputLine() method is now available as an advanced customization feature in the TractorSiteStatusFilter module. This method provides Python access to every line of task output. The site-written code can perform arbitrary actions in response to task output, and built-in Tractor-specific actions are also available. These include marking the task as an error, generating percent-done progress updates, initiating a task graph "expand" action, and stripping the output line from the logs.
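
    A minimal sketch of a site filter module using this hook. The base-class import path, the exact method signature, and the helper method are assumptions for illustration; the built-in actions listed above are invoked through the conventions described in the blade customization documentation:

    # TractorSiteStatusFilter.py -- illustrative sketch only
    from tractor.apps.blade.TrStatusFilter import TrStatusFilter  # assumed import path

    class TractorSiteStatusFilter(TrStatusFilter):
        def FilterSubprocessOutputLine(self, cmd, textline):
            # Called for every line of task output; here we simply
            # forward lines carrying a (hypothetical) site wrapper tag
            # to a local spool file.
            if textline.startswith("SITE_METRIC"):
                self.forwardMetric(textline)

        def forwardMetric(self, line):
            # Hypothetical helper: append the line to a file that a
            # separate collector process tails.
            with open("/var/tmp/site_metrics.log", "a") as f:
                f.write(line + "\n")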

  • GPU Detection -- On start-up, tractor-blade now attempts to enumerate any GPU devices installed on the blade host. The device model and vendor name "labels" are made available during the profile selection process so that groups of blades can be categorized by the presence or type of GPU, if desired. The "Hosts" dictionary in a blade.config profile definition defines the matching criteria for that profile, and two new optional keys are now available: the "MinNGPU" entry specifies the minimum number of GPU devices required for a match, and "GPU.label" specifies a wildcard-style matching string for a particular vendor/model. The label string also now appears in the Dashboard blade list, if a GPU device is found.
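
    For example, a profile restricted to hosts with at least one GPU whose label matches an NVIDIA device might include a Hosts entry like this (the platform pattern and label value are illustrative):

    "Hosts": {
        "Platform": "Linux-*",
        "MinNGPU": 1,
        "GPU.label": "NVIDIA*"
    }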

  • The new tractor.config setting "CmdAutoRetryStopCodes" specifies a list of exit codes that are considered "terminal" -- automatic retries will NOT be considered for commands that exit with these codes, unless the -retryrc list for a specific command requests it. Negative numbers represent Unix signal values, and the codes 10110 and 10111 are generated when a command's elapsed time falls outside the new run-time bounds, when those options are given. The default setting consists of the values for SIGKILL, SIGTERM, and the two time-bounds codes:

    "CmdAutoRetryStopCodes": [-9, -15, 10110, 10111],
    
  • Engine statistics query -- A new URL request (Tractor/monitor?q=statistics) has been added to help integrate tractor-engine performance metrics with other site-wide monitoring systems. The returned JSON object contains the most recent sample of several statistics that the engine collects about itself. Some monitoring systems can make this URL request for data directly, while others may require a small data-source script that requests the JSON statistics report and then forwards each value of interest to the monitoring system separately.
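
    A minimal sketch of such a data-source script, in Python; the engine host and port are assumptions for your site, and the metric names inside the returned JSON will vary:

    import json
    import urllib.request

    ENGINE = "http://tractor-engine:80"   # assumed engine host:port

    # Fetch the engine's most recent self-statistics sample as JSON.
    with urllib.request.urlopen(ENGINE + "/Tractor/monitor?q=statistics") as rsp:
        stats = json.loads(rsp.read().decode("utf-8"))

    # Forward each value of interest to the monitoring system; here we
    # simply print the top-level key/value pairs, one per line.
    for key, value in sorted(stats.items()):
        print(key, value)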

  • Concurrent Expand Chunks -- This advanced expand-task variant provides one approach to avoiding serial delays in jobs containing long-running single commands that produce a sequence of results needed by other tasks in the job. The new extension enables pipeline integrators to construct jobs that launch a long-running command, such as a fluid simulation, and then concurrently launch another command, such as a render, as each sequential output file is generated by the first command. Rendering can thus proceed without waiting for all of the simulation steps to complete. This approach is well suited to cases where the simulation app creates output files whose names are not known ahead of time, so the subsequent render command line arguments must be generated dynamically. The simulation, or a wrapper script, detects when the next step is complete, writes the appropriate rendering Task description into a temporary file, and then notifies tractor-blade by emitting a new 'TR_EXPAND_CHUNK "filename"\n' line on stdout. Tractor-blade detects that directive in the application stdout stream and delivers the file contents to the engine. The new render task is inserted into the running job and can be dispatched immediately elsewhere on the farm. The blade automatically removes the temporary file once it has been delivered.
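
    A sketch of the wrapper side of this protocol, in Python. The simulation-step generator and the Task description body are illustrative placeholders; only the TR_EXPAND_CHUNK directive itself comes from the feature described above:

    import sys
    import tempfile

    def run_simulation_steps():
        # Stand-in for the real simulation loop: yield each output
        # file name as the simulation produces it.
        for i in range(3):
            yield "sim_step_%04d.vdb" % i

    def emit_render_task(output_file):
        # Write a Task description for the newly produced file into a
        # temporary file (the task body here is illustrative)...
        body = ('Task {render %s} -cmds { RemoteCmd {prman %s} '
                '-service {PixarRender} }\n' % (output_file, output_file))
        with tempfile.NamedTemporaryFile("w", suffix=".alf", delete=False) as f:
            f.write(body)
            fname = f.name
        # ...then notify tractor-blade; the blade delivers the file
        # contents to the engine and removes the file afterwards.
        sys.stdout.write('TR_EXPAND_CHUNK "%s"\n' % fname)
        sys.stdout.flush()

    for step_output in run_simulation_steps():
        emit_render_task(step_output)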

  • TR_EXIT_STATUS auto-terminate policy change -- The default behavior of the TR_EXIT_STATUS handler has reverted to the 1.x and early-2.x behavior, in which the status value is simply recorded and then reported when the command actually exits. The more recent behavior, in which the blade actively kills the app upon receipt of TR_EXIT_STATUS, is still available, but it must be explicitly enabled in blade.config using the profile setting:

    "TR_EXIT_STATUS_terminate": 1,
    
  • Blade record visibility flag -- The Dashboard blade list display is created from database records describing each tractor-blade instance that has connected to the engine in the past. These records are retained, even when a blade host is no longer deployed, in order to correlate previously executed commands with the machine they ran on. The Dashboard blade list menu item "Clear prior blade data" no longer removes the actual database record for the given blade; instead, it simply sets a flag that hides the record from display in the Dashboard. The record (and its new unique id field) is now retained for correlation with old task records. The blade data items can still be removed manually if they are truly unneeded.

  • Cookie-based Dashboard relogin -- A new policy allows auto-relogin to new Dashboard windows based on a saved session cookie, even when site passwords are enabled. The cookie contains only a session ID that is validated by the engine; it does not contain any password data. The older policy, which denied auto-login when passwords are required, can be restored by adding a "_nocookie" modifier to the crews.config SitePasswordValidator setting.
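
    For example (the validator value shown is an illustrative placeholder; only the "_nocookie" suffix is the new modifier):

    "SitePasswordValidator": "/usr/local/bin/site_passwd_check _nocookie",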

  • Added a new tractor-dbctl --set-job-counter option that sets the initial job ID value in a new job database. Job IDs start at 1 by default, so specifying a different starting value can be helpful on a fresh Tractor install to prevent overlap between the new install's job IDs and those of older jobs. Tractor upgrade installs that reuse the prior job database will continue to see job ID continuity.
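
    For example, to begin numbering jobs at 500000 on a fresh install (the value and exact argument form are illustrative; see the tractor-dbctl usage notes):

    tractor-dbctl --set-job-counter 500000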

  • Several internal improvements have been made to the job database upgrade procedure. Many code-related changes in new releases can now be applied without significant database alteration, requiring only an engine restart. Changes that do involve new database schema definitions are now applied by a system that better handles upgrades across multiple versions.

  • Overall throughput optimizations -- Various performance improvements have been made in this release, especially with regard to handling large numbers of simultaneous updates as many jobs complete or are deleted at the same time.

Changes in 2.0

  • Added Dashboard task graph visualization of Instance nodes.
  • Add a "quick job syntax check" option: tractor-spool --parse-debug (job.file)
  • Supplement the tractor-blade TR_EXIT_STATUS handler such that it will now actively kill running applications that emit TR_EXIT_STATUS directives if they do not exit on their own in a timely manner. This behavior can be useful for simple wrapper scripts that cannot implement the full process-group shutdown that tractor-blade already provides. The new behavior can be disabled in blade.config by adding "TR_EXIT_STATUS_terminate": 0,
  • Fix storage of afterJids attribute edits on previously spooled jobs.
  • Fix a dispatching problem that resulted in "no dispatchable tasks" in some cases after task retries and "afterJids" delays.
  • Address unicode handling issues for non-ascii characters in RemoteCmd application parameter lists (aka comman-line argv).
  • Fixed Dashboard display of elapsed time for still-active tasks to avoid issues caused by clock differences between the engine and user hosts.
  • Fixed job ready-task counts reported by the Dashboard and tq in some cases following retries.
  • Removed engine start-up usage of a platform-dependent external python module, psycopg2.
  • Updated the Tractor Query API to improve its python module conformity.
  • Changed the way that tractor-blade reloads site-defined TractorSiteStatusFilter modules on profile refresh to improve predictability.
  • Extended the engine's expand-node output handling to tolerate some older Alfred-compatible no-op constructs.
  • Allowed TractorSiteStatusFilter to set an advisory status message (aka "excuse") that is visible in the Dashboard blade list, even when the site-specific callback allows a request for work to proceed.

Release Notes from Prior Versions

See the Tractor 1.x Release Notes for details about earlier releases.