Tractor 2.x Features
New features in Tractor 2 extend the core Tractor system established in the 1.x family of releases. There is a broad range of new additions and improvements, from productive new command line and scripting interfaces for wranglers to simple user interface changes. Internal upgrades range from a new high-performance, high-capability job database to new studio-wide resource sharing and allocation controls. Please refer to the guidelines described in Upgrading.
Here are some highlights:
Tractor Product Layout -- Single Release Directory, Single Download per platform, Bundled Subsystem Updates -- The Tractor 2.x packaging and installation layout includes a matched set of Tractor components all in one download: engine, blade, spooler, user interfaces, and scripting tools. They are all installed together in one versioned area, along with only one copy of matched shared resources including pre-built versions of several third-party subsystems.
Tractor Query Tools -- Introducing tq, the Tractor query command line tool and modules. Based on proven Pixar studio tools, tq is the best way to query live or historical Tractor data from your terminal shell, from your Python scripts, or from a new tab in the Dashboard.
Adaptive Farm Allocations -- A way to dynamically allocate abstract resources between people or projects using Tractor's flexible Limits system. If two films are in production, 60% of the farm can be allocated to one of them, 30% to the other, leaving the remaining 10% for other projects. If one show is idle the others can temporarily expand their shares, then shrink back to the nominal levels when all projects are active.
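The 60/30/10 arithmetic above can be sketched in a few lines of Python. This is only an illustration of the idea, not Tractor's allocation logic: the pro-rata redistribution rule, the function name `effective_shares`, and the project names are all assumptions made for the example.

```python
# Illustrative sketch only: idle projects' shares are handed to the active
# projects in proportion to their nominal shares (an assumed policy).
NOMINAL = {"filmA": 60, "filmB": 30, "other": 10}  # nominal percentages


def effective_shares(nominal, active):
    """Return each project's fraction of the farm, given the active projects."""
    live_total = sum(share for name, share in nominal.items() if name in active)
    if live_total == 0:
        # Nothing is active, so nobody holds any share.
        return {name: 0.0 for name in nominal}
    return {
        name: (share / live_total if name in active else 0.0)
        for name, share in nominal.items()
    }


if __name__ == "__main__":
    # All projects active: everyone sits at the nominal level.
    print(effective_shares(NOMINAL, {"filmA", "filmB", "other"}))
    # filmB idle: filmA and "other" temporarily expand, keeping their 6:1 ratio.
    print(effective_shares(NOMINAL, {"filmA", "other"}))
```

When the idle project becomes active again, calling the function with the full set of names returns the shares to their nominal levels.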
Dispatching Tiers -- A simple way to organize broad sets of jobs into a descending set of site-defined priority groups. The default tiers are named: admin, rush, default, batch. Create your own!
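As a rough sketch of what "a descending set of priority groups" means for ordering, the snippet below sorts jobs by tier first and by numeric priority second. The job dictionaries, the `dispatch_order` helper, and the tie-breaking rule within a tier are assumptions for illustration; only the four default tier names come from the text above.

```python
# Illustrative sketch only: tier rank dominates, priority breaks ties (assumed).
DEFAULT_TIERS = ["admin", "rush", "default", "batch"]  # highest tier first


def dispatch_order(jobs, tiers=DEFAULT_TIERS):
    """Sort hypothetical job records by tier, then by descending priority."""
    rank = {name: index for index, name in enumerate(tiers)}
    lowest = len(tiers)  # unknown tiers sort after every named tier
    return sorted(
        jobs,
        key=lambda job: (rank.get(job["tier"], lowest), -job["priority"]),
    )


if __name__ == "__main__":
    jobs = [
        {"title": "nightly sim", "tier": "batch", "priority": 900},
        {"title": "director review", "tier": "rush", "priority": 10},
        {"title": "shot 42 lighting", "tier": "default", "priority": 500},
    ]
    for job in dispatch_order(jobs):
        print(job["tier"], job["title"])
```

Note that the "rush" job is considered first even though its numeric priority is the lowest of the three.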
Custom Menu Actions -- Add site-defined Dashboard menu items that can invoke your own centralized scripts, parameterized by the user's current list selection.
Job Authoring API -- A new tractor.api.author module allows your Python scripts to easily create Job, Task, and Command objects linked together according to your dependency requirements. The Job.spool() method then sends the resulting job to the tractor-engine job queue.
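A minimal sketch of the authoring flow might look like the following. The keyword arguments, the service key "PixarRender", the `prman` command line, and the RIB file name are placeholders based on typical usage and should be checked against the tractor.api.author reference for your installed version. The import is guarded so that the sketch still loads on a machine without Tractor installed.

```python
# Sketch under stated assumptions; verify names against your Tractor install.
try:
    import tractor.api.author as author
except ImportError:
    author = None  # Tractor's Python modules are not on this machine


def build_render_job(rib_path):
    """Build a one-task job that renders a single RIB file (placeholder command)."""
    if author is None:
        return None
    job = author.Job(title="one-frame render", priority=100, service="PixarRender")
    # newTask() creates a Task with one Command and attaches it to the job.
    job.newTask(title="render frame", argv=["prman", rib_path])
    return job


if __name__ == "__main__":
    job = build_render_job("frame.0001.rib")
    if job is None:
        print("tractor.api.author is not available here")
    else:
        print(job.asTcl())  # inspect the generated job script before sending it
        # job.spool()       # uncomment to send the job to the tractor-engine queue
```

Printing the job script before spooling is a convenient way to confirm that the task and command structure matches the intended dependencies.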
Simple Engine Discovery -- A simple "zero-config" announcement capability for small studios allows tractor-blades and other Tractor tools to find tractor-engine on the local network without requiring manual nameserver (DNS/LDAP) configuration changes. Tractor-engine will automatically disable this SSDP-style traffic at studios where the hostname alias "tractor-engine" has already been created by an administrator in the site nameserver database.
Checkpoint-Resume Support -- Extensions to job scripting, dispatching, and the Dashboard add interesting new capabilities related to incremental computation. Tractor also supports a general "retry from last checkpoint" scheme. Both features integrate with the new RenderMan 19 RIS checkpoint and incremental rendering features.
Blade Auto-Update -- A simple tractor-blade patch management system allows administrators to "pin" the farm to a particular blade patch version, and automatically push out a new version to the entire farm. Out of date blades restart themselves using the new module version.
Pluggable Authentication Module (PAM) support -- The engine's optional new built-in PAM support delegates password validation directly to the operating system on the engine host. This alternative makes it simple to enable password support at studios where the LAN already provides adequate credential transport security.
Privilege Insulation -- The EngineOwner setting in tractor.config specifies the login identity under which tractor-engine should operate. This setting is important because it allows the engine to drop root privileges immediately after it has acquired any protected network ports that it may need. The engine's normal day-after-day operations will then occur under the restrictions associated with the specified login name.
Dynamic Service Key Advertisement -- Several blade profile "Provides" modes have been added to support some advanced service key use cases. For example, blades can dynamically advertise a different set of service key capabilities depending on which keys have already been "consumed" by previously launched commands.
Resource Usage Tracking -- The operating system rusage process metrics CPU, RSS, and VSZ are now recorded into the job database for each launched command. Currently supported on Linux and OSX tractor-blades.
Command Retry History -- A unique tracking record is now created in the job database for every command launch attempt. As a result, the history of retries on a given task can be reviewed, for example with the tq tool.
Configuration File Loading -- A streamlined override system can help to reduce clutter and improve clarity about which files have been modified from their original "factory settings" at your studio.
Task Concurrency Throttle -- Each job can specify a "maxActive" attribute to constrain the number of concurrently running tasks from that job. This quick wrangling control over a job's "footprint" size on the farm can be useful when changing the full site-wide Limits settings is not appropriate.
Automatic Blade Error Throttle -- This blade profile setting will prevent blades from picking up new work if they encounter too many errors within a given time interval.
Job Spooling Improvements -- Job processing upgrades include faster processing, better error checking, and bundling of required subsystems. A parallelized job intake and database staging scheme can dramatically reduce backlogs when many jobs are spooled simultaneously, or when many "expand" tasks are running in parallel. A self-contained Tcl interpreter bundled with the spooler simplifies site install requirements and can perform client-side error checking prior to job delivery to the engine. A new JSON job spooling format is also supported (but not available prior to beta-1 pending changes).
Bug fixes and improvements
- Fixed a regression in the prior 2.4 release that meant service keys might be reported incorrectly
- The launchd plist used to run the macOS services contained several outdated keys, which caused errors to be reported and, under certain conditions, prevented the service from being restarted
- On Windows, AMD CPUs like the 3990x would report 64 instead of the expected 128 cores
Tractor 2.4 is an update focused on overall performance, and handling large scale farm size, job size, and number of concurrently connected user sessions. Changes include:
- A significant code refactoring effort addressed several internal thread contention bottlenecks, including those related to frequent password checks and identity management. The new logic results in improved throughput, especially on very large farms (5000+ blades) where many dashboard sessions and automated scripts (1000+) are accessing job status.
- A more robust job-global system for sorting newly ready commands produced by "expand" tasks. This change addresses the "Cmd not Ready?" error problem - which was due to sorting key collisions (precision) on large recursively expanded jobs.
- Optimized limit checking operations, especially around repeated tokenization, trace message construction, and efficient handling of limits at their maximum capacity that temporarily throttle new dispatching.
- Addition of a new "connecting" entry in the queue/backlog diagnostic (the "status&qlen=1" query). This count gives distinct visibility into the number of connected but incomplete inbound requests, meaning those still awaiting network I/O for their complete HTTP request body contents.
- The number of handler threads for running custom menu item backend scripts now scales up along with other thread pools based on tractor.config settings.
- Fix for a string handling issue that could result in intermittent loss of access to task log output.
- Fix for a tractor-engine crash during error message construction in response to unparseable (control) characters in "expand task" job graph extensions.
- Fixed Dashboard custom job filter matching on array types such as job Project names. Previously only exact matches were working for these entries, but not "starts with" or "contains" comparisons.
- Updated command-line options for the tractor-spool job submission utility. Several new formatting options such as zero padding are now supported on "range" values used to construct new jobs directly from command line parameters (as distinct from submitting previously constructed complete job files).
- Fixed a problem with "tractor-spool --ribs *.rib" wildcard expansion handling on Windows.
- Fixed mismatched version strings in some of the system service start up scripts.
- Updated environment handlers and paths to accommodate batch processing of RenderMan 22 scenes from Maya and Katana.
- Addressed a Dashboard selection and copy issue related to a change of Firefox webkit css settings.
- Updated Dashboard links to the newest Tractor documentation.
- Addressed a path separator issue on Windows in the Python Job Authoring module.
- Addressed a task state transition race condition in some "expand chunk" use cases.
- Fixed full job restart pruning of previously expanded tasks that were created by the "expand chunk" mechanism.
- Addressed potential security issues related to malicious interface use by on-site Tractor users.