|Table of Contents|
|Carousel Image Slider|
Checkpoints are snapshots that save the state of the image as the renderer works on it. While viewable view-able as ordinary images, these are also slightly larger than usual because they embed extra state that the renderer needs in order to recover the render. If the render is interrupted or fails for some reason, the renderer can resume the render from the last checkpoint image. If instead, the render finishes then the extra state will be removed when writing the final version of the image. There are two main ways to produce checkpoints:
Alternatively, you could set the interval to 300s and the renderer will update the image approximately every five minutes after it starts work on the frame. This time includes the renderer startup time, such as parsing RIB, cracking procedurals, and building ray tracing acceleration structures. As a result, there may be fewer increments before the first checkpoint than between later checkpoints. For convenience, the time-based interval can also be specified with a suffix suffixes of s, m, h, or d for seconds, minutes, hours, and days respectively and these may be combined. For example, intervals of 360s, 6m, 5m60s, and 0.1h are all equivalent. Instead of a suffix, you can also just specify a positive number and time in seconds will be assumed, while a negative number will be interpreted as the number of increments.
Note that checkpointing is designed for batch rendering to images on disk. Renders to a live framebuffer such as "it" are already updated on-the-fly as the render proceeds. Of the built-in display drivers, currently, only the TIFF and OpenEXR display drivers support checkpointing. You may notice that the files produced include a ~w labeled channel. This is data stored to resume a render from this file.
How to Specify
In RIB, checkpoints can be specified via an Option or by passing the optional
-checkpoint argument to
prman -checkpoint t[,t]
The above command is seen as the interval given and an optional exit time (the square backets).
You can use several different types of intervals and may be combined:
- 60s - this is 60 seconds
- 60m - this is 60 minutes
- 60h - this is 60 hours
- 60d - this is 60 days
- 60i - this is 60 increments (incremental frame rendering, this is ignored if you're rendering to tiles/buckets)
You can combine these as needed, for example: -checkpoint 1h30m would create a checkpoint every hour and 30 minutes
The options for exitat are the same and can even be provided without needed a checkpoint written, in which case you've just told the renderer you want to stop a render and write out the result at a certain time. Below we write out a final result and exit at 30 minutes
prman -checkpoint ,30m
It is possible to maintain a checkpointed "final" image using either an .ini setting or a RIB option, with the later overriding the former. If neither is present it defaults to off. When set, this prevents removing the extra channels and the checkpoint tag when writing the final image for the render. The final image will essentially be just another checkpoint, rather than a slimmed down image. This means that once your image has reached the quality you've set and it completes, it can always be restarted by the user later:
/prman/checkpoint/asfinal [bool] Option "checkpoint" "int asfinal" [0|1]
Temporary Files While Writing
When writing checkpoint files, we now write them to temporary files with a .part extension added. E.g., foo.exr will be written to foo.exr.part and kittens.ddc will be written to kittens.ddc.part during the checkpointing process.
Only after all of the files to write for a checkpoint have been written will these files be renamed to strip the .part extension and overwrite the previous checkpoint. This step happens immediately before the postcheckpoint command is run, if any.
Though not strictly atomic, this renaming is done in as small of a window of time as the OS permits in order to avoid mixing new checkpoint files with old. If the renderer is killed or dies for some reason while checkpointing, there may be some .part files left over. Deleting these should safely leave the previous checkpoint intact. Note that looking for these .part files is one way of detecting whether the renderer was killed during a checkpoint. Note too, that this new behavior means that the peak disk space used by checkpointing is now doubled.
When using checkpointing a deep EXR while using adaptive sampling, all of the AOVs considered by the adaptive sampler (e.g., normally the beauty) must be written as well to a shallow EXR. Otherwise, the checkpoint will not be recoverable.
We support Deep Data Checkpointing (DDC files) when rendering to Deep EXR formats. When using this feature you will not only see a "shallow" EXR written to disk for each checkpoint but also the DDC, which uses lossless compression, file for recovering the deep data. Since checkpointing often leaves multiple files on disk and deep data can be expensive to store, there is an option to compress the resulting deep data.
Compression can be set via a rendermn.ini option:
The default level is 7. The typical range is 1 to 20 where higher numbers increase the compression at the cost of speed.
The default level of 7 is chosen as a balance between size and performance. Values above 7 will decrease the performance of writing the files with diminishing returns on compression as values go higher. For this reason we do not recommend going above 7 unless your storage capacity is severely limited. A value of 1 will result in large files but the best performance and is ideal when storage for DDC files is plentiful.
To reiterate, note that this is lossless compression, unlike the legacy deepshadowerror rendering option. The choice of the ddccomplvl has no effect on image quality after recovery and is purely a question of checkpoint time performance versus disk space.
Skipping Deep EXRs or DDC files
When recovering checkpoints involving deep data, any deep EXRs are ignored and only the DDC files are used. If you do not need to view the deep EXRs from a checkpoint, you can now save both disk disk space and the time spent on processing and I/O to generate these deep EXRs by disabling them. To do this, set the following in your rendermn.ini to anything “truthy” (e.g., true, yes, on, or 1):
Even with this option set, we still write the deep EXRs when the render finishes normally (i.e., either hitting max samples or stopped by the adaptive sampler) since you cannot directly view or composite with the DDC files.
Also note that this option only applies to deep EXR files. All shallow EXRs are still written out with checkpoints as normal, albeit with the usual checkpointing channels and metadata.
The counterpart to all this is that the renderer now also normally saves time by skipping the writing of DDC checkpoint files when the render finishes and it is writing the final deep EXR image. You can use the existing asfinal option (either Option ”checkpoint” ”int asfinal” in the RIB or /prman/checkpoint/asfinal 1 in the rendermn.ini) to prevent this and have it write the DDC files anyway.
Finally, note that we continue the prior behavior of not deleting the last DDC files, even when the renderer finishes normally and asfinal is not set. At that point, the DDC files correspond to the checkpoint just before finishing and become stale data. This is something we may change in the future, depending on feedback, but remains the behavior for now.
This doesn’t strictly pertain to deep checkpoints, but shallow EXR checkpoint images now contain a new EXR attribute, checkpointElapsed. This represents the elapsed time in seconds since the renderer started until checkpointing the current image begins. It does not include prior rendering time if this process was started from a recovered checkpoint.
Recovery of an interrupted render is enabled by passing the
-recover 1 option to
prman when starting a render. RenderMan will then load the scene as normal but rather than start from scratch and overwrite the existing images it will examine them to determine where it was interrupted. If successfulIfsuccessful, it will continue from close the point where it left off. If instead the images were finished, missing or don't match the current scene or each other for some reason, it will silently start from scratch.
Recovery can also be paired with the checkpointing options described above. In the case of a time limit, this basically serves as an extension to the original time limit. Incremental renders can be broken into an arbitrary number of time slices this way. Note that each recovery requires the scene and all of its assets to be reload, however, so care should be taken with this. Additionally, this feature is only supported for OpenEXR files.
Bucket/tile order must be set to horizontal or vertical when recovering from a non-incremental render.
RenderMan can call a system command after a checkpoint using
Option "checkpoint" "string command"; this can also be specified through the
rendermn.ini file with
/prman/checkpoint/command. If system calls are enabled, then after a checkpoint has been written, the specified command will be called. This is synchronous; the rendering threads are quiescent while this runs and will not resume again until the process returns, avoiding possible race conditions if the command takes a while.
A limited amount of substitution is available. The token
%i will be replaced with the current increment, zero-padded to 5 digits. The token
%e will be replaced with the elapsed time in seconds, zero-padded to 6 digits. The token
%r will be replaced with the reason for this update to the checkpoint files (either completely
exiting early due to exitat option, or a normal
checkpoint). Literal % characters may be inserted with
On Linux and OSX, prman will listen for the HUP and USR1 signals. If checkpointing was enabled for the render then when prman receives a HUP signal it will attempt to write a checkpoint and continue rendering. If it receives the USR1 it will write a checkpoint and then begin an orderly shutdown. This is similar to the behavior with the internal checkpoint interval and exitat parameters, but allows these to be externally triggered such as by a pipeline queuing system. The Windows operating system does not support these signals, so this feature is not available on that platform
While the most basic use for checkpointing and recovery is simply to be able to resume a render in case of a failure, checkpointing with incremental renders enables some powerful new workflows.