Conversation


@mpragnay mpragnay commented Jan 21, 2026

Async Render Support, to prevent busy-waiting on a render interval for too long (sometimes the 10-minute timeout is hit, which is a problem):

  • Async rendering mode, where there is no busy wait at a render step
  • The number of concurrent render processes is capped at num_workers
  • Especially helpful: average GPU_util stays good and SPS across intervals shows no real change, yet jobs complete faster. Example:

render_async=false, interval=50. https://wandb.ai/emerge_/async_render_support/runs/1sqs53pe/
render_async=true, interval=50. https://wandb.ai/emerge_/async_render_support/runs/pg03q1jo/

Both jobs run for 100M steps with the same configs; the second completes in 23 minutes, whereas the first completes in 41 minutes (roughly a 1.8x wall-clock speedup).

Conclusion for this mode:

  • Only use this mode if you have a small render interval and/or render many CARLA maps each time

@mpragnay mpragnay marked this pull request as ready for review January 28, 2026 22:16
Copilot AI review requested due to automatic review settings January 28, 2026 22:16

greptile-apps bot commented Jan 28, 2026

Greptile Overview

Greptile Summary

This PR adds async rendering support to prevent busy-waiting at render intervals, allowing render jobs to run in background processes limited by num_workers. The implementation creates a multiprocessing queue for async communication and spawns render processes that put results back into the queue for later logging.
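As a rough illustration of the pattern just described, here is a minimal sketch (not the PR's actual code; names such as _render_worker, spawn_render, and render_fn are hypothetical stand-ins):

```python
import multiprocessing as mp

def _render_worker(render_fn, bin_path_epoch, render_queue, global_step):
    # Hypothetical worker: render_fn stands in for the repo's actual render call.
    videos = render_fn(bin_path_epoch)
    render_queue.put({"videos": videos, "step": global_step})

def spawn_render(render_fn, render_processes, render_queue, config,
                 bin_path_epoch, global_step):
    # Drop handles of workers that have already finished.
    render_processes[:] = [p for p in render_processes if p.is_alive()]

    # Cap concurrent render processes at num_workers (a default guards a missing key).
    max_processes = config.get("num_workers", 1)
    while len(render_processes) >= max_processes:
        render_processes.pop(0).join()  # wait on the oldest worker instead of busy-waiting

    p = mp.Process(target=_render_worker,
                   args=(render_fn, bin_path_epoch, render_queue, global_step))
    p.start()
    render_processes.append(p)
```

Because the worker runs in a separate process, render_fn must be picklable (a top-level function), and its results come back through the queue rather than shared state.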

Key changes:

  • Added render_async config flag and multiprocessing infrastructure in pufferl.py
  • Modified render_videos() signature to support both sync and async modes with queue communication
  • Process management with proper cleanup in close() method
  • Queue checking in mean_and_log() to log completed async renders

Critical issues found:

  • Syntax error on line 660 - malformed comment causes entire code block to be commented out, breaking render process cleanup
  • File path bug - a double .mp4 extension is created for single-map renders (output_agent_{epoch}.mp4.mp4); see the sketch after this list
  • KeyError risk - config["render_async"] accessed without checking if key exists (though it's now in config file)
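The double-extension bug noted above typically comes from unconditionally appending ".mp4" to a name that may already carry the suffix; a small illustrative helper (hypothetical, not the repo's code) guards against it:

```python
import os

def with_mp4_suffix(path):
    # Append ".mp4" only when the path does not already end with it, so a name like
    # "output_agent_7.mp4" does not become "output_agent_7.mp4.mp4".
    _, ext = os.path.splitext(path)
    return path if ext == ".mp4" else path + ".mp4"

# with_mp4_suffix("output_agent_7.mp4") -> "output_agent_7.mp4"
# with_mp4_suffix("output_agent_7")     -> "output_agent_7.mp4"
```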

Additional concerns:

  • 10x geometry simplification threshold increase (25→250) should be verified for quality impact
  • Previous thread concerns about bin_path_epoch deletion timing remain valid - async processes may still need the file when it is deleted in the finally block (see the sketch below)
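One way to address the deletion-timing concern, sketched here as a hypothetical helper: in async mode, let each worker delete its own per-epoch weights copy once rendering finishes (as the sequence diagram below depicts), and only delete from the trainer when rendering synchronously:

```python
import os

def cleanup_epoch_weights(bin_path_epoch, render_async):
    # In async mode a background worker may still be reading this per-epoch copy,
    # so it should remove the file itself after rendering; only delete here
    # (e.g. in the trainer's finally block) when rendering synchronously.
    if not render_async and os.path.exists(bin_path_epoch):
        os.remove(bin_path_epoch)
```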

Confidence Score: 1/5

  • Critical syntax error will break render cleanup; should not merge until fixed
  • Line 660 has a malformed comment that turns the entire render process cleanup block into a comment, meaning async render processes will never be terminated properly. This is a critical bug that will cause resource leaks.
  • pufferlib/pufferl.py (line 660 syntax error), pufferlib/utils.py (double .mp4 extension bug)

Important Files Changed

| Filename | Overview |
| --- | --- |
| pufferlib/utils.py | Async rendering support added, with a file path bug causing a double .mp4 extension |
| pufferlib/pufferl.py | Async render process management added with queue communication |
| pufferlib/ocean/drive/drive.py | Fixed variable references and increased geometry simplification threshold |

Sequence Diagram

sequenceDiagram
    participant Main as Main Process
    participant PRL as PuffeRL
    participant RProc as Render Process
    participant Queue as Render Queue
    participant WandB as WandB Logger

    Main->>PRL: Initialize with config
    alt render_async enabled
        PRL->>Queue: Create multiprocessing.Queue()
        PRL->>PRL: Initialize render_processes list
    end

    loop Training Loop
        PRL->>PRL: Train and checkpoint
        alt At render_interval
            PRL->>PRL: Export model to bin_path
            PRL->>PRL: Copy bin_path → bin_path_epoch
            
            alt render_async mode
                PRL->>PRL: Clean up finished processes
                PRL->>PRL: Wait if >= num_workers processes
                PRL->>RProc: Start Process(render_videos, bin_path_epoch)
                RProc->>RProc: Generate videos (async)
                RProc->>Queue: Put {videos, step}
                RProc->>RProc: Delete bin_path_epoch
            else synchronous mode
                PRL->>PRL: Call render_videos(bin_path_epoch)
                PRL->>WandB: Log videos immediately
                PRL->>PRL: Delete bin_path_epoch
            end
        end
        
        PRL->>PRL: check_render_queue()
        alt render_async and queue not empty
            PRL->>Queue: Get result (non-blocking)
            PRL->>WandB: Log videos at render_step
        end
        
        PRL->>PRL: Continue training
    end

    Main->>PRL: close()
    alt render_async enabled
        PRL->>RProc: Terminate all processes
        PRL->>Queue: Close and join_thread()
    end
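As a rough sketch of the non-blocking queue check and the shutdown path shown in the diagram (hypothetical helper names; the PR's methods are check_render_queue() and close() according to the summary above):

```python
import queue  # queue.Empty is raised by Queue.get_nowait()

def drain_render_queue(render_queue, wandb_run):
    # Drain finished renders without blocking the training loop.
    while True:
        try:
            result = render_queue.get_nowait()
        except queue.Empty:
            break
        # Assumes result["videos"] is a dict of wandb.Video objects keyed by name.
        wandb_run.log(result["videos"], step=result["step"])

def close_renderers(render_processes, render_queue):
    # Stop any workers still running, then release queue resources.
    for p in render_processes:
        if p.is_alive():
            p.terminate()
        p.join()
    render_queue.close()
    render_queue.join_thread()
```

Note that wandb generally expects step values to be non-decreasing, so a late-arriving render logged at an older step may be warned about or dropped.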


@greptile-apps greptile-apps bot left a comment


3 files reviewed, 5 comments



Copilot AI left a comment


Pull request overview

This pull request adds asynchronous rendering support to prevent busy waiting during video generation. The main changes involve spawning render operations in separate processes using Python's multiprocessing module, allowing training to continue while videos are being rendered.

Changes:

  • Added async rendering mode with process-based parallelism to avoid blocking training during video generation
  • Refactored render_videos function signature to decouple from logger/vecenv dependencies
  • Fixed bug in drive.py where num_maps variable was used before being defined
  • Removed debug print statements from CARLA data generation utilities

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 15 comments.

Summary per file:

| File | Description |
| --- | --- |
| pufferlib/utils.py | Refactored render_videos function to support async mode with queue-based communication and updated video file handling |
| pufferlib/pufferl.py | Added multiprocessing infrastructure for async rendering, including queue management, process spawning, and result collection |
| pufferlib/ocean/drive/drive.py | Fixed bug using self.num_maps instead of the undefined num_maps variable; changed simplify_polyline tolerance |
| pufferlib/config/ocean/drive.ini | Updated configuration with async rendering settings and modified training parameters |
| data_utils/carla/generate_carla_agents.py | Removed debug print statements |
Comments suppressed due to low confidence (1)

pufferlib/utils.py:186

  • The docstring is outdated and still describes the old function signature with vecenv and logger parameters. It should be updated to reflect the new parameters: env_cfg, run_id, wandb_log, render_async, render_queue, and wandb_run. The current documentation will mislead users about how to call this function.
    """
    Generate and log training videos using C-based rendering.

    Args:
        config: Configuration dictionary containing data_dir, env, and render settings
        vecenv: Vectorized environment with driver_env attribute
        logger: Logger object with run_id and optional wandb attribute
        epoch: Current training epoch
        global_step: Current global training step
        bin_path: Path to the exported .bin model weights file

    Returns:
        None. Prints error messages if rendering fails.
    """


self.render_processes = [p for p in self.render_processes if p.is_alive()]

# Cap the number of processes to num_workers
max_processes = self.config.get("num_workers")

Copilot AI Jan 28, 2026


If num_workers is not set in the config, config.get("num_workers") will return None, and the comparison len(self.render_processes) >= max_processes will fail with a TypeError. Consider providing a default value such as config.get("num_workers", 1) or validating that num_workers is set when render_async is enabled.

Suggested change
max_processes = self.config.get("num_workers")
max_processes = self.config.get("num_workers", 1)

@mpragnay mpragnay marked this pull request as draft January 28, 2026 22:50
@mpragnay mpragnay marked this pull request as ready for review January 29, 2026 16:35

@greptile-apps greptile-apps bot left a comment


3 files reviewed, 5 comments


@mpragnay mpragnay force-pushed the Async_Rendering_Support branch from c16efbd to 710601a on January 29, 2026 18:20
@mpragnay mpragnay force-pushed the Async_Rendering_Support branch from 710601a to dd1abac on January 29, 2026 19:51