A compact causal-safety reinforcement learning benchmark for hidden-rule inference, sparse rewards, proxy traps, and alignment-gap measurement.
Topics: benchmark, reinforcement-learning, deep-reinforcement-learning, q-learning, dqn, alignment, gridworld, gymnasium, ai-safety, pomdp, double-dqn, dueling-dqn, intrinsic-motivation, prioritized-experience-replay, noisy-networks, safe-rl, reward-hacking, causal-rl, dqfd, spie-q
Updated Jun 17, 2026 - Python