Summary
TorchRL could use a public, documented evaluator API for the common setup where:
- training runs on one device,
- collection runs on one or more other devices,
- evaluation runs on a dedicated device,
- evaluation should not block the training loop.
In the installed build I tested, torchrl.trainers.Evaluator is not importable, and I couldn't find a public MultiAsyncCollector symbol either. Users are left to write custom background threads or processes that handle:
- creating a separate eval env,
- copying policy weights over,
- running a deterministic rollout,
- handling logging/video manually,
- polling/joining results.
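For anyone who wants to reproduce the availability check on their own install, here is a minimal sketch using only the standard library. The helper name has_public_symbol is hypothetical, and torchrl.collectors as the place to look for MultiAsyncCollector is my assumption; only torchrl.trainers.Evaluator is the path I actually tried.

```python
import importlib


def has_public_symbol(module_name, symbol):
    """Return True if `module_name` imports cleanly and exposes `symbol`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Module (or one of its dependencies) is not installed.
        return False
    return hasattr(module, symbol)


if __name__ == "__main__":
    # The second module path is an assumption about where the symbol would live.
    candidates = [
        ("torchrl.trainers", "Evaluator"),
        ("torchrl.collectors", "MultiAsyncCollector"),
    ]
    for module_name, symbol in candidates:
        found = has_public_symbol(module_name, symbol)
        print(f"{module_name}.{symbol}: {found}")
```

The result depends on the installed TorchRL version, so a False here only describes the local build.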
Concrete use case
I am training PPO with:
- collectors on cuda:4, cuda:6,
- optimizer on cuda:5,
- evaluation on cuda:7.
The desired behavior is:
- trigger eval every N training iterations,
- keep the hot training loop running,
- poll the eval result later,
- log scalar metrics and optional video once the result is ready.
That pattern is useful enough that it would be better as a first-class TorchRL API than repeated custom code in downstream projects.
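To make the trigger-and-poll loop described above concrete, here is a rough sketch using only concurrent.futures. It is not TorchRL code: run_eval is a placeholder for a real rollout on the eval device, and the "skip the trigger if an eval is still running" policy is one possible design choice, not a requirement.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_eval(iteration):
    # Placeholder for a deterministic rollout on the dedicated eval device.
    time.sleep(0.01)
    return {"iteration": iteration, "eval_reward": float(iteration)}


def train(num_iters=10, eval_every=3):
    logged = []      # stands in for logger calls (scalars, optional video)
    pending = None   # future of the eval currently in flight, if any
    with ThreadPoolExecutor(max_workers=1) as pool:
        for it in range(1, num_iters + 1):
            # ... one optimizer step would run here, never waiting on eval ...

            # Poll: collect a finished eval without blocking.
            if pending is not None and pending.done():
                logged.append(pending.result())
                pending = None

            # Trigger: start a new eval every `eval_every` iterations,
            # unless the previous one is still running.
            if it % eval_every == 0 and pending is None:
                pending = pool.submit(run_eval, it)

        # Drain the last eval once training is over.
        if pending is not None:
            logged.append(pending.result())
    return logged


results = train()
```

Because the loop never blocks on the eval, how many evals complete depends on timing; only the first trigger (iteration 3 here) is guaranteed to run.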
What would help
Something along these lines:
- a public Evaluator (or similarly named) object that is part of the installed API,
- support for sync and async modes,
- explicit support for a dedicated eval device,
- simple trigger(...), poll(), and wait() semantics,
- a clear contract for how policy weights are transferred,
- integration with VideoRecorder / loggers, or at least a recommended pattern documented in TorchRL.
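As a strawman for the interface listed above, something like the following would cover my use case. Everything here is hypothetical: AsyncEvaluator, its constructor arguments, and the snapshot-at-trigger weight contract are illustrations, not an existing TorchRL API. The device is only stored, and a plain dict stands in for the policy state.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


class AsyncEvaluator:
    """Hypothetical interface sketch; not an existing TorchRL class."""

    def __init__(self, eval_fn, device="cuda:7"):
        self._eval_fn = eval_fn
        self.device = device  # a real version would place env and policy here
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._future = None

    def trigger(self, weights, step):
        """Start an eval unless one is in flight. Returns True if started."""
        if self._future is not None and not self._future.done():
            return False
        # Weight contract: snapshot at trigger time, so later training
        # updates cannot leak into a running evaluation.
        snapshot = copy.deepcopy(weights)
        self._future = self._pool.submit(self._eval_fn, snapshot, step)
        return True

    def poll(self):
        """Return the result if ready, else None. Never blocks."""
        if self._future is None or not self._future.done():
            return None
        result, self._future = self._future.result(), None
        return result

    def wait(self, timeout=None):
        """Block until the in-flight eval finishes and return its result."""
        if self._future is None:
            return None
        result = self._future.result(timeout=timeout)
        self._future = None
        return result

    def shutdown(self):
        self._pool.shutdown(wait=True)


def fake_eval(weights, step):
    # Stand-in for a deterministic rollout using the snapshotted weights.
    return {"step": step, "score": sum(weights["w"])}


evaluator = AsyncEvaluator(fake_eval)
weights = {"w": [1.0, 2.0]}
started = evaluator.trigger(weights, step=100)
weights["w"][0] = 50.0  # training keeps updating; the snapshot is unaffected
result = evaluator.wait()
evaluator.shutdown()
```

A sync mode could simply call trigger followed by wait. Whether the snapshot is a deep copy, a state_dict transfer, or shared memory is exactly the part of the contract I would like TorchRL to define.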
Why this matters
Without this, async eval tends to become a pile of downstream boilerplate that is easy to get subtly wrong:
- stale weights,
- blocking behavior,
- duplicate env setup,
- awkward video handling,
- ad hoc thread/process lifecycle management.
If there is already a recommended API for this that is just not exported/documented, exposing it would already help a lot.