The blade.config file specifies server profiles, each of which may apply to one or more renderfarm hosts. The idea is to have a central configuration (policy) file that controls all machines, and to allow a single profile to apply to many machines as necessary. New servers can be brought online and will join the renderfarm in a simple "plug and play" fashion, using the settings for whichever profile they match.
The blade.config file is typically copied in its entirety to the site override directory and then customized there. It contains a dictionary with two top-level key entries: "ProfileDefaults" and "BladeProfiles".
Because of the "first match wins" policy, profile definitions that are specific to just a few machines should go near the top of the BladeProfiles list, while "catch-all" general profiles should go at the end, ensuring that specialized blades are matched to the appropriate profile.
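The "first match wins" selection can be sketched roughly as follows. This is a simplified, hypothetical illustration (the real blade applies many more Hosts tests than the "Name" pattern shown here); it only demonstrates why catch-all profiles must come last:

```python
from fnmatch import fnmatch

def select_profile(profiles, hostname):
    """Return the name of the first profile whose (simplified) Hosts
    tests pass for this host -- a sketch, not the real implementation."""
    for profile in profiles:
        name_pats = profile.get("Hosts", {}).get("Name", ["*"])
        if isinstance(name_pats, str):
            name_pats = [name_pats]
        # glob-style match against the host's name
        if any(fnmatch(hostname, pat) for pat in name_pats):
            return profile["ProfileName"]
    return None

profiles = [
    {"ProfileName": "BigLinux", "Hosts": {"Name": "render-big-*"}},
    {"ProfileName": "OtherLinux", "Hosts": {}},  # catch-all, listed last
]
```

A host named "render-big-01" stops at the first entry, while any other host falls through to the catch-all.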
Several "required" keys must be specified in each profile object (dictionary) in the BladeProfiles list. Most of the required keys are specified in the ProfileDefaults dictionary, and individual profiles then apply overrides. Here's an example for reference:
{
    "ProfileDefaults": {
        "ProfileName": "default",
        "Provides": ["pixarRender", "pixarNRM"],
        "Hosts": {"Platform": "*"},
        "Access": {
            "Crews": ["*"],
            "NimbyCrews": ["*"],
            "NimbyConnectPolicy": 1.5
        },
        "Capacity": {
            "MaxSlots": 1,    # 0 -> use number of system CPUs
            "MaxLoad": 1.5,   # CPU load avg, normalized by cpu count
            "MinRAM": 1.0,    # gigabytes
            "MinDisk": 20.0,  # gigabytes
        },
        "UDI": 1.0,
        "RecentErrorThrottle": [5, 30, 120],
        #"CmdOutputLogging": "logfile=/fileserver/tractor-logs/%u/J%j/T%t.log",
        "CmdOutputLogging": "logserver=${TRACTOR_ENGINE}:9180",
        #"VersionPin": "@env(TRACTOR_ENGINE_VERSION)",
        # "TaskBidTuning": "immediate",
        "TR_EXIT_STATUS_terminate": 0,
        "SiteModulesPath": "",
        "DirMapZone": "nfs",
        "EnvKeys": [
            {
                "keys": ["default"],
                "environment": {},
                "envhandler": "default"
            },
        ]
    },

    "BladeProfiles": [
        {
            "ProfileName": "BigLinux",
            "Hosts": {
                "Platform": "Linux-*",
                "MinNCPU": 8,      # must have at least 8 CPUs
                "MinPhysRAM": 16,  # must have at least 16 GB
            },
            "UDI": 11,
            "EnvKeys": [ "@merge('shared.linux.envkeys')" ]
        },
        {
            "ProfileName": "OtherLinux",
            "Hosts": {"Platform": "Linux-*"},
            "EnvKeys": [ "@merge('shared.linux.envkeys')" ]
        },
        {
            "ProfileName": "our_next_profile",
            ...
        }
    ]
}
As each server iterates through the profile entries, it tests its own host information against the Hosts dictionary of patterns to determine whether the entry applies.
Each entry in the Hosts dictionary represents a test, and they all must pass for the profile to apply to that blade.
Matching is done with glob-style string comparison.
The pattern specified by the dictionary item "Name" is compared to all of the server's hostnames, aliases, and "dotted quad" IP addresses. The "Name" entry can be either a single pattern or a list of patterns, and each may include glob-style wildcard characters. Note: the names and addresses entered here should be the ones seen by the engine, and secondarily the ones seen by the blade host itself; hosts using VPN tunnels or other routing may not present the same data to the engine as to the blade.
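The "Name" test can be illustrated with Python's fnmatch module, which performs the same style of glob matching. The host identities below are hypothetical:

```python
from fnmatch import fnmatch

# Hypothetical blade identities: every hostname, alias, and dotted-quad
# IP address known for the host is checked against the "Name" patterns.
identities = ["render-03.example.com", "render-03", "10.20.30.43"]
patterns = ["render-*", "10.20.30.*"]  # a "Name" value may be a list

# The test passes if any identity matches any pattern.
matched = any(fnmatch(ident, pat)
              for ident in identities
              for pat in patterns)
```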
The pattern given by "Platform" is matched against the string generated by the following Python expression on each server:
platform.platform() + '-' + platform.architecture()[0]
The value specified for "NCPU" is compared (exactly) against the number of CPUs reported by the host.
Use "MinNCPU" to test for hosts that have at least the given number of CPUs.
Use "MinPhysRAM" to test for hosts that have at least the given gigabytes of memory installed.
Use "PhysRAM" to test for hosts that have exactly the given gigabytes of memory installed (rounded to the nearest integer).
Use "MinNGPU" to test for hosts that have at least the given number of GPU cards. See "GPUExclusionPatterns" in blade.config for filtering.
Use "GPU.label" to test for hosts that contain a specific type of GPU. These are wildcard matching patterns, like ["*Quadro*K620*", "NVIDIA*"].
Use "PathExists" to test for hosts that have particular directories or files present. These could be installed applications, or possibly different fileserver access mount points. The value here should be a list of path strings, ALL of the paths must be present to match.
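Putting several of these tests together, a hypothetical Hosts clause might read as follows; remember that every test listed must pass for the profile to apply (the values shown are illustrative only):

```
"Hosts": {
    "Name": ["farm-*", "10.20.30.*"],
    "Platform": "Linux-*",
    "MinNCPU": 16,
    "MinPhysRAM": 32,
    "GPU.label": ["NVIDIA*"],
    "PathExists": ["/opt/pixar", "/mnt/projects"]
},
```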
"MaxSlots": (int) - maximum concurrent commands on the server. The special value '0' (zero) can be used to automatically constrain MaxSlots to the number of CPUs found on the host (this is the default).
"MaxLoad": (float) - keep asking for tasks while loadavg is below this threshold.
"MinRAM": (float GB) - keep asking for tasks while free RAM is greater than this minimum.
"MinDisk": (float GB) - keep asking for tasks while free disk space is greater than this minimum. This key accepts two value formats:
    "MinDisk": 3.75,

Or:

    "MinDisk": [3.75, "/scratch"],

The second form names the disk to check.
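Taken together, a hypothetical Capacity override in a profile might look like this (values are illustrative; any keys omitted here would come from ProfileDefaults):

```
"Capacity": {
    "MaxSlots": 0,                  # one slot per CPU on this host
    "MaxLoad": 2.0,
    "MinRAM": 2.0,
    "MinDisk": [3.75, "/scratch"]   # check free space on /scratch
},
```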
[
    {
        "keys": ["default"],
        "environment": {
            "REMOTEHOST": "$TR_SPOOLHOST"
        },
        "envhandler": "default"
    },
    {
        "keys": ["prman-*"],
        "environment": {
            "RMANTREE": "/opt/pixar/RenderManProServer-$TR_ENV_RMANVER",
            "PATH": "$RMANTREE/bin:$PATH"
        },
        "envhandler": "rmanhandler"
    },
    {
        "keys": ["JOB=*", "TASK=*", "SCENE=", "PROD=*"],
        "environment": {},
        "envhandler": "productionhandler"
    },
]
Each dictionary is a named environment and defines a list of environment keys which are matched against the incoming envkeys from the job to be launched.
The environment entry contains overrides to the default environment variables in which the command will run. The envhandler key gives the name of a Python environment handler that will operate on the command prior to its execution. The envhandlers can be extended by adding modules to the directory defined by the SiteModulesPath entry. See the envhandler documentation for more details.
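One plausible reading of the key-matching step can be sketched in Python. This is a simplified illustration of glob-matching a job's envkey against each block's "keys" patterns, not the blade's actual dispatch logic:

```python
from fnmatch import fnmatch

def pick_envhandler(envkey_blocks, job_envkey):
    """Return the envhandler name of the first EnvKeys block whose
    key patterns match the job's envkey (simplified sketch)."""
    for block in envkey_blocks:
        if any(fnmatch(job_envkey, pat) for pat in block["keys"]):
            return block["envhandler"]
    return None

envkey_blocks = [
    {"keys": ["prman-*"], "envhandler": "rmanhandler"},
    {"keys": ["default"], "envhandler": "default"},
]
```

With these blocks, an envkey such as "prman-26" would select "rmanhandler".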
SiteModulesPath - Defines a directory reachable by the blades which will contain additional python environment handlers. See the envhandler documentation for more details.
CmdOutputLogging - destination for launched command output logs. See the separate logging discussion for details on this setting.
VersionPin - Specifies a particular tractor-blade "module version" that should be running on blades that are using the given profile. Those hosts are "pinned" to a particular version of the blade software. This setting is typically given only in the ProfileDefaults section, as a way to push a blade update to the whole farm, pinning all machines to a uniform blade module version.
When an administrator reloads blade.config, blades will inspect this setting in their profile. If a blade discovers that it is not running the correct module version, it will download the indicated module from the engine and restart itself using the new module. The blade only fetches a prepackaged version of the core tractor-blade Python module itself; it does not update the entire Tractor install area, nor the bundled "rmanpy" Python interpreter or other Tractor tools. A full install on the blade host is necessary to upgrade those other components.
The named module patch package must already be present on the site tractor-engine host, typically placed there by an administrator. Each full Tractor product install contains the "factory" blade module package for that release; other blade module packages may be made available through customer support channels. These packages are self-contained zip-format modules, using the ".pyz" filename extension.
When a blade requests a pinned module download, the engine will look for it in two places. First it looks in the engine's "--configdir" directory, that is, the directory on the engine host containing other site-modified configuration overrides. Second, it looks in the installed product directory for the shipped package that matches the current release.
TaskBidTuning - When set to "immediate" the blade will ask for new work immediately upon the (successful) exit of a prior task, or when a new task has been launched and additional slot capacity is still available. Otherwise blades will wait more conservatively between requests (minsleep secs). The "immediate" mode will cause blades to cycle through any waiting fast-running commands very quickly.
TR_EXIT_STATUS_terminate - Controls whether scripts that emit "TR_EXIT_STATUS nnn" directives are left to eventually exit on their own with the given exit status code override, or are actively killed by tractor-blade if they don't exit promptly. Use 0 (zero) to wait, or 1 (one) to kill them actively.
RecentErrorThrottle - a setting like [5, 30, 120] specifies that 5 errors on the same blade within 30 seconds will cause a 120 second hiatus in new task assignments to that blade. An "error" in this context means that a command launched by the blade exited with a non-zero exit status code or an unexpected signal due to a crash. When -1 is given as the third parameter in the list (replacing "120" in the example above), blades that hit the error threshold will place themselves into a "nimby" state instead; in this state they will not receive new task assignments until the nimby state is unset manually by an administrator via the Dashboard or tq. Note that this setting can be applied to all blades by putting it in the "ProfileDefaults" dictionary, or different settings (or overrides to the default) can be placed separately in specific profiles.
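The threshold part of the policy can be sketched in a few lines of Python. This is an illustrative model of the [5, 30, 120] example only, not the blade's actual bookkeeping:

```python
def hit_error_threshold(error_times, now, max_errors=5, window=30.0):
    """Sketch of the RecentErrorThrottle trigger: True when at least
    `max_errors` command errors occurred within the last `window`
    seconds. The blade would then pause new task requests for the
    configured hiatus (120s in the example), or go nimby if -1 was
    given as the third parameter."""
    recent = [t for t in error_times if now - t <= window]
    return len(recent) >= max_errors
```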
DirMapZone - Defines the zone used in directory mapping for cross platform support. The blades initialize with a default DirMapZone (Windows="unc", Linux and Mac="nfs"). The value in the blade profile overrides the internal default.
UDI - the Universal Desirability Index
NIMBY - Adding NIMBY to a blade profile will define which jobs can be picked up by a blade. Settings include:
Note that the "NimbyCrews" setting in the "Access" dictionary (described above) controls which users, if any, are allowed to change this setting dynamically through the dashboard or another session-validated client. Note also that when providing any "Access" override in a particular profile definition, the entire set of desired Access key-value pairs must be specified; unspecified keys will not be inherited from the Access values in ProfileDefaults.
GPUExclusionPatterns -- exclude certain GPU types from matching and counting consideration. Note that this keyword is only valid in the ProfileDefaults dictionary; it is ignored inside individual profile definitions, since GPU filtering is performed prior to any per-profile match testing.

Background: a given profile can match specific hosts based on several criteria in the "Hosts" clause, including the count and type of GPU. Some hosts contain "uninteresting" virtual or underpowered GPUs that should always be excluded from consideration, prior to the profile matching pass. Use the "GPUExclusionPatterns" list to enumerate the makes/models of GPUs to be skipped in counts and matches. Each item in the list is a simple glob-style wildcard pattern; patterns without '*' or '?' will be treated as "*TEXT*". For example:

    "GPUExclusionPatterns": ["QXL", "Standard VGA"],
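The filtering rule, including the "*TEXT*" treatment of patterns without wildcards, can be illustrated with a short Python sketch (the GPU labels below are hypothetical):

```python
from fnmatch import fnmatch

def filter_gpus(gpu_labels, exclusion_patterns):
    """Drop GPUs whose labels match any exclusion pattern (sketch).
    Patterns without '*' or '?' are treated as "*TEXT*" matches."""
    pats = [p if ('*' in p or '?' in p) else '*' + p + '*'
            for p in exclusion_patterns]
    return [g for g in gpu_labels
            if not any(fnmatch(g, p) for p in pats)]

gpus = ["NVIDIA Quadro K620", "QXL paravirtual", "Standard VGA Adapter"]
remaining = filter_gpus(gpus, ["QXL", "Standard VGA"])
```

With the example setting above, only the NVIDIA card would remain in the counts used for "MinNGPU" and "GPU.label" matching.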