Tractor 2.x Features

New features in the Tractor 2.x releases extend the core Tractor system established in the 1.x family of releases. There is a broad range of new additions and improvements, from productive new command-line and scripting interfaces for wranglers to simple user interface changes. Internal upgrades range from a new high-performance, high-capability job database to new studio-wide resource sharing and allocation controls. Please refer to the guidelines described in Upgrading.

Here are some highlights:

  • Tractor Product Layout -- Single Release Directory, Single Download per platform, Bundled Subsystem Updates -- The Tractor 2.x packaging and installation layout includes a matched set of Tractor components all in one download: engine, blade, spooler, user interfaces, and scripting tools. They are all installed together in one versioned area, along with only one copy of matched shared resources including pre-built versions of several third-party subsystems.

  • Tractor Query Tools -- Introducing tq, the Tractor query command-line tool and modules. Based on proven Pixar studio tools, tq is the best way to query live or historical Tractor data from your terminal shell, from your Python scripts, or from a new tab in the Dashboard.

  • Adaptive Farm Allocations -- A way to dynamically allocate abstract resources between people or projects using Tractor's flexible Limits system. If two films are in production, 60% of the farm can be allocated to one of them, 30% to the other, leaving the remaining 10% for other projects. If one show is idle the others can temporarily expand their shares, then shrink back to the nominal levels when all projects are active.

  • Dispatching Tiers -- A simple way to organize broad sets of jobs into a descending set of site-defined priority groups. The default tiers are named: admin, rush, default, batch. Create your own!

  • Custom Menu Actions -- Add site-defined Dashboard menu items that can invoke your own centralized scripts, parameterized by the user's current list selection.

  • Job Authoring API -- A new tractor.api.author module allows your Python scripts to easily create Job, Task, and Command objects linked together according to your dependency requirements. The Job.spool() method then sends the resulting job to the tractor-engine job queue.

  • Simple Engine Discovery -- A simple "zero-config" announcement capability for small studios allows tractor-blades and other Tractor tools to find tractor-engine on the local network without requiring manual nameserver (DNS/LDAP) configuration changes. Tractor-engine will automatically disable this SSDP-style traffic at studios where the hostname alias "tractor-engine" has already been created by an administrator in the site nameserver database.

  • Checkpoint-Resume Support -- Extensions to job scripting, dispatching, and the Dashboard add interesting new capabilities related to incremental computation. Tractor also supports a general "retry from last checkpoint" scheme. Both features integrate with the new RenderMan 19 RIS checkpoint and incremental rendering features.

  • Blade Auto-Update -- A simple tractor-blade patch management system allows administrators to "pin" the farm to a particular blade patch version, and automatically push out a new version to the entire farm. Out of date blades restart themselves using the new module version.

  • Pluggable Authentication Module (PAM) support -- The engine's optional new built-in PAM support delegates password validation directly to the operating system on the engine host. This alternative makes it simple to enable password support at studios where the LAN already provides adequate credential transport security.

  • Privilege Insulation -- The EngineOwner setting in tractor.config specifies the login identity under which tractor-engine should operate. This setting is important because it allows the engine to drop root privileges immediately after it has acquired any protected network ports that it may need. The engine's normal day-after-day operations will then occur under the restrictions associated with the specified login name.

  • Dynamic Service Key Advertisement -- Several blade profile "Provides" modes have been added to support some advanced service key use cases. For example, blades can dynamically advertise a different set of service key capabilities depending on which keys have already been "consumed" by previously launched commands.

  • Resource Usage Tracking -- The operating system rusage process metrics CPU, RSS, and VSZ are now recorded into the job database for each launched command. Currently supported on Linux and OSX tractor-blades.

  • Command Retry History -- A unique tracking record is now created in the job database for every command launch attempt, so the history of retries on a given task can be reviewed using the tq tool, for example.

  • Configuration File Loading -- A streamlined override system can help to reduce clutter and improve clarity about which files have been modified from their original "factory settings" at your studio.

  • Task Concurrency Throttle -- Each job can specify a "maxActive" attribute to constrain the number of concurrently running tasks from that job. This quick wrangling control over a job's "footprint" size on the farm can be useful when changing the full site-wide Limits settings is not appropriate.

  • Automatic Blade Error Throttle -- This blade profile setting will prevent blades from picking up new work if they encounter too many errors within a given time interval.

  • Job Spooling Improvements -- Job processing upgrades include faster processing, better error checking, and bundling of required subsystems. A parallelized job intake and database staging scheme can dramatically reduce backlogs when many jobs are spooled simultaneously, or when many "expand" tasks are running in parallel. A self-contained Tcl interpreter bundled with the spooler simplifies site install requirements and can perform client-side error checking prior to job delivery to the engine. A new JSON job spooling format is also supported (but not available prior to beta-1 pending changes).
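
To make the Adaptive Farm Allocations arithmetic in the list above concrete, here is a minimal sketch of how nominal shares might expand while one project is idle. It does not reproduce Tractor's allocation logic. The function name effective_shares, the project names, and the proportional redistribution rule are all assumptions chosen for illustration.

```python
def effective_shares(nominal, active):
    """Scale the nominal shares of active projects up to fill the whole farm.

    nominal: dict of project name -> nominal share, in percent
    active:  set of project names that currently have work to run
    Idle projects get 0; active projects grow in proportion to their
    nominal share (an assumed policy, chosen only for illustration).
    """
    active_total = sum(pct for name, pct in nominal.items() if name in active)
    if active_total == 0:
        return {name: 0.0 for name in nominal}
    return {
        name: (100.0 * pct / active_total if name in active else 0.0)
        for name, pct in nominal.items()
    }


# Hypothetical projects using the 60/30/10 split from the text.
nominal = {"filmA": 60, "filmB": 30, "other": 10}

# Every project is busy, so each one stays at its nominal level.
all_busy = effective_shares(nominal, {"filmA", "filmB", "other"})
print(all_busy)

# filmB is idle, so filmA and "other" temporarily expand into its 30%.
b_idle = effective_shares(nominal, {"filmA", "other"})
print({name: round(pct, 1) for name, pct in b_idle.items()})
```

In Tractor itself this behavior comes from the Limits system rather than from user code; the sketch only illustrates the share arithmetic.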

     

Changes in 2.4 2069290

Tractor 2.4 is an update focused on overall performance, and handling large scale farm size, job size, and number of concurrently connected user sessions. Changes include:

  • A significant code refactoring effort addressed several internal thread contention bottlenecks, including those related to frequent password checks and identity management. The new logic results in improved throughput, especially on very large farms (5000+ blades) where many dashboard sessions and automated scripts (1000+) are accessing job status.

  • A more robust job-global system for sorting newly ready commands produced by "expand" tasks. This change addresses the "Cmd not Ready?" error, which was caused by sorting key collisions (a precision issue) on large recursively expanded jobs.
  • Optimized limit checking operations, especially around repeated tokenization, trace message construction, and efficient handling of limits at their maximum capacity that temporarily throttle new dispatching.
  • Addition of a new "connecting" entry in the queue/backlog diagnostic (the "status&qlen=1" query). This count gives distinct visibility to the number of connected but incomplete inbound requests, those still awaiting network I/O for their complete HTTP request body contents.
  • The number of handler threads for running custom menu item backend scripts now scales up along with other thread pools based on tractor.config settings.

  • Fix for a string handling issue that could result in intermittent loss of access to task log output.

  • Fix for a tractor-engine crash during error message construction in response to unparseable (control) characters in "expand task" job graph extensions.
  • Fixed Dashboard custom job filter matching on array types such as job Project names. Previously only exact matches were working for these entries, but not "starts with" or "contains" comparisons.
  • Updated command-line options for the tractor-spool job submission utility. Several new formatting options such as zero padding are now supported on "range" values used to construct new jobs directly from command line parameters (as distinct from submitting previously constructed complete job files).
  • Fixed a problem with "tractor-spool --ribs *.rib" wildcard expansion handling on Windows.
  • Fixed mismatched version strings in some of the system service start up scripts.
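
As an illustration of what zero padding on "range" values means in the tractor-spool item above, the following sketch expands a frame range into file names with and without padding. It does not reproduce the tractor-spool option syntax, which is not shown here. The helper expand_range, its parameters, and the file name template are hypothetical.

```python
def expand_range(first, last, pad=0, template="frame.{n}.rib"):
    """Return one name per frame in [first, last], zero-padded to `pad` digits.

    Hypothetical helper for illustration only; not part of tractor-spool.
    """
    return [template.format(n=str(i).zfill(pad)) for i in range(first, last + 1)]


# Without padding, names sort poorly as text ("frame.10.rib" < "frame.8.rib").
print(expand_range(8, 11))

# With four-digit padding, text order matches frame order.
print(expand_range(8, 11, pad=4))
```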

Changes in 2.3 1923604

  • Updated environment handlers and paths to accommodate batch processing of RenderMan 22 scenes from Maya and Katana.
  • Addressed a Dashboard selection and copy issue related to a change of Firefox webkit css settings.
  • Updated Dashboard links to the newest Tractor documentation.
  • Addressed a path separator issue on Windows in the Python Job Authoring module.
  • Addressed a task state transition race condition in some "expand chunk" use cases.
  • Fixed full job restart pruning of previously expanded tasks that were created by the "expand chunk" mechanism.
  • Addressed potential security issues related to malicious interface use by on-site Tractor users.



Upgrading to 2.2
  • NOTE: Upgrading to Tractor 2.2 is "permanent" in the sense that you cannot revert to an older tractor-engine while also retaining your old jobs once the 2.2 job database upgrade has been performed. If you BACKUP your current job database before installing 2.2, then it is possible to revert to the older engine version along with jobs restored to their state at the time of the backup. Please refer to the guidelines described in Upgrading.
  • Upgrading to 2.2
  • Upgrading to 2.1
  • Upgrading from 1.x

Changes in 2.2 1715407

  •  RenderMan (prman) progress messages are now detected correctly by tractor-blade when Katana (renderboot) wraps them in additional logging text.
  • Jobs submitted to tractor-engine from clients using the Python job authoring API method EngineClient.spool() now correctly abide by the current tractor.config setting AllowJobOwnerOverride. Note that this fix requires clients to re-import the new tractor.api.author module.
  • Addressed a tractor-engine threading issue that could cause large "config" backlogs and slow dispatching in the unusual case where tq scripting clients make requests to the engine as usual, but site network routing issues prevent engine reply buffers from being delivered back to a few of those clients. Now those stalled deliveries time out without affecting other transactions.
  • Additional internal handling of the "assigner Cmd not Ready" state inconsistency condition that can arise in some cases of simultaneous task retry and job restart.
  • Fixed negative elapsed times that were sometimes displayed in the Dashboard as tasks finished, prior to a display refresh.

Changes in 2.2 1677499

  • Custom menu items will cause a new window to be opened if there is any script output, even if the menu item is configured to normally suppress a new window. This enables problematic scripts to be more easily detected and debugged.
  • Custom menu items support "login" as a special entry in the "values" list, which will cause the Dashboard user name to be a part of the payload sent to the menu item's script.
  • Custom menu items will now observe user names and "@owner" in the crews attribute for job and task custom menu items.
  • Fixed bug in which selecting next task by state in Dashboard task list was not automatically scrolling to the task.
  • Fixed the calculation of the elapsed time of a skipped task in the rollover of the Dashboard job graph.
  • Sorting of tasks in the Dashboard task list has been corrected.
  • Dashboard preview commands can now be displayed and run for archived jobs.
  • The kill operation is now supported in tq and the query API. It is used to kill a running command and leaves the task in an error state.
  • tq has a new reloadconfig command to enable the triggering of configuration file reloading from the command line.
  • The query API will only return non-registered blades when the archive flag is set to True.
  • tractor-dbctl --exec-sql now emits error messages.
  • The blade caches user database entries in order to be more resilient against transient LDAP server outages.
  • The systemd configuration directory defaults to the Tractor installation config/ directory, consistent with the sysvinit setting.
  • systemd now starts tractor-blade as root by default, consistent with the sysvinit setting.
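
The blade-side caching of user database entries mentioned in the list above can be pictured with a small "last known good" cache. This is a sketch of the general technique only, not tractor-blade's implementation. The class name CachedUserLookup and the simulated flaky_lookup function are hypothetical, and the default lookup via pwd.getpwnam assumes a Unix host.

```python
import pwd


class CachedUserLookup:
    """Remember the last successful result per user name and reuse it on failure."""

    def __init__(self, lookup=pwd.getpwnam):
        self._lookup = lookup   # callable: name -> user record; raises on failure
        self._cache = {}

    def get(self, name):
        try:
            entry = self._lookup(name)
        except Exception:
            # Lookup failed (for example, the directory server is unreachable):
            # fall back to the cached entry if one exists, otherwise re-raise.
            if name in self._cache:
                return self._cache[name]
            raise
        self._cache[name] = entry
        return entry


# Simulated directory that fails after the first call (stand-in for an outage).
calls = {"n": 0}

def flaky_lookup(name):
    calls["n"] += 1
    if calls["n"] > 1:
        raise OSError("directory server unreachable")
    return {"name": name, "uid": 1001}


users = CachedUserLookup(flaky_lookup)
print(users.get("alice"))   # fetched from the simulated directory
print(users.get("alice"))   # served from the cache during the simulated outage
```

One trade-off of this approach is that an account removed from the directory keeps resolving from the cache until the cache is cleared, so a real implementation would likely add an expiry.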

Changes in 2.2 1625934

  • Added job service key expression support for blade selection based on "total physical RAM". For example the expression

    RemoteCmd {prman my.rib} -service "PixarRender && @.totalmem > 24"
    

    selects blades that provide the "PixarRender" service (blade.config) and which have at least 24 gigabytes of RAM installed. The previously supported "@.mem" key for "available free RAM" is also still available.

  • Enabled access to BladeUse attributes in 'tq blades' queries: taskcount, slotsinuse, and owners.

  • Added a --user option to the logcleaner utility script so that jobs can be queried as a different user than the process owner that performs the file removal.

  • Fixed the --add and --remove operations in the tq jattr and cattr commands for making relative changes to job and command attributes that are lists.

  • Addressed a tractor-engine socket exception handling issue on Linux for cases where a tractor-blade host (operating system) has become unresponsive, such as in cases of GPU driver or OOM issues or a kernel panic. The tractor-engine process would sometimes exhibit high cpu load in these cases, spinning in the socket handler.

  • Fixed the access-denied advisory text in JSON responses to retry, skip, and job interrupt URL requests.

  • Suggested workaround for RHEL6 PAM-related file descriptor leak:

    On Linux RHEL 6.x era releases, the pam_fprintd.so module
    contains a bug causing it to leak file descriptors on every call from
    tractor-engine.  Since PAM modules are loaded into the tractor-engine
    process, and it performs many authentications over time, the unclosed
    "pipe" descriptors will accumulate, unknown to the main tractor-engine
    code and will eventually exhaust the available file descriptor limit
    for that engine process.  While many studios do not depend on
    fingerprint validation, especially for scripted API access to a system
    service, the "fprint" module is called indirectly from many common
    RHEL6 PAM policies, including "login" and "su".  It has been removed
    from the common policies in RHEL 7 era distributions.  A workaround
    for RHEL6 is to create your own "tractor" policy that doesn't include
    system-auth, or perhaps to specify a less general policy in crews.config,
    such as password-auth.
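
One way to check whether an engine host is affected by the descriptor leak described above is to watch a process's open descriptor count over time. The sketch below is not a Tractor tool. It assumes a Linux host with /proc mounted; the function name open_fd_count is hypothetical, and the demonstration counts descriptors for the current Python process rather than for tractor-engine.

```python
import os
import resource


def open_fd_count(pid="self"):
    """Count a process's open file descriptors by listing /proc/<pid>/fd.

    Linux only. Inspecting another process normally requires running as
    the same user as that process, or as root.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


before = open_fd_count()
r, w = os.pipe()                 # deliberately hold a pipe open, as a leak would
after = open_fd_count()
print(f"open descriptors: {before} -> {after}")

os.close(r)
os.close(w)

# The soft limit is the ceiling that a leaking process eventually reaches.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft descriptor limit for this process: {soft}")
```

To watch a running engine, pass its process id to open_fd_count at intervals; a count that keeps climbing toward the limit is consistent with the leak.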
    
