Semester project for CSC4444 (AI): a comparison of a reinforcement learning agent (DDPG) with a hand-tuned classical controller (PID) on the continuous-action inverted pendulum (CartPole) task.
The goal is to keep a pole upright on a mobile cart by applying continuous horizontal force. Two controllers are implemented and evaluated head-to-head:
| Controller | Approach | Action Space |
|---|---|---|
| DDPG | Actor-critic deep RL, learns from interaction | Continuous force in [−1, 1] |
| PID | Two-level cascade control, hand-tuned gains | Continuous force in [−1, 1] |
The environment is a custom continuous-action variant of CartPole built on Gymnasium. An episode ends when the cart leaves the track bounds (|x| > 2.4), when the pole falls (|θ| > 12°), or when the 500-step limit is reached.
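The episode-end rule can be sketched as a small predicate. This is only an illustration, not the code in `envs/cartpole_continuous.py`: the function name `check_episode_end` is hypothetical, the angle is assumed to be in radians, and the step limit is treated as a truncation, following Gymnasium's split between `terminated` and `truncated`.

```python
import math

X_LIMIT = 2.4                     # cart position bound from the text
THETA_LIMIT = math.radians(12.0)  # 12 degrees, roughly 0.2094 rad
MAX_STEPS = 500                   # episode step limit from the text


def check_episode_end(x, theta, step):
    """Return (terminated, truncated) for the current state.

    Hypothetical helper: `step` is the number of steps taken so far.
    """
    # Failure: cart out of bounds or pole past the angle limit.
    terminated = abs(x) > X_LIMIT or abs(theta) > THETA_LIMIT
    # Time limit only counts if the episode has not already failed.
    truncated = (not terminated) and step >= MAX_STEPS
    return terminated, truncated
```

Keeping the two flags separate lets an evaluation script count a 500-step run as a success while still recording which failure condition ended a shorter one.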
git clone https://github.com/<your-username>/inverted-pendulum.git
cd inverted-pendulum
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -e .

inverted-pendulum/
├── src/inverted_pendulum/
│   ├── agents/ddpg.py                 # DDPG actor-critic agent
│   ├── controllers/pid.py             # Cascade PID controller
│   ├── envs/cartpole_continuous.py    # Custom continuous CartPole environment
│   └── training/__init__.py           # Training loop
├── scripts/
│   ├── train_ddpg.py                  # Train the DDPG agent
│   ├── evaluate_ddpg.py               # Evaluate DDPG (no exploration noise)
│   ├── watch_ddpg.py                  # Watch DDPG agent live
│   ├── evaluate_pid.py                # Evaluate PID controller
│   └── watch_pid.py                   # Watch PID controller live
└── results/                           # Saved models and plots
    ├── ddpg_cartpole.pt
    ├── ddpg_cartpole_best.pt
    └── ddpg_training_curve.png
All scripts must be run from the project root.
python scripts/train_ddpg.py

Trains for 500 episodes. Saves the final model to results/ddpg_cartpole.pt and the best checkpoint (by 20-episode moving average) to results/ddpg_cartpole_best.pt. A training curve is saved to results/ddpg_training_curve.png.
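One way to pick the "best" checkpoint by a 20-episode moving average is sketched below. It is not taken from `train_ddpg.py`: the generator name `track_best` is hypothetical, and it assumes that no checkpoint is considered until a full 20-episode window is available.

```python
from collections import deque

WINDOW = 20  # moving-average window from the text


def track_best(episode_rewards, window=WINDOW):
    """Yield (episode_index, moving_avg, is_new_best) for each episode.

    Hypothetical helper; a real training loop would save the model
    whenever is_new_best is True.
    """
    recent = deque(maxlen=window)  # keeps only the last `window` rewards
    best_avg = float("-inf")
    for i, reward in enumerate(episode_rewards):
        recent.append(reward)
        if len(recent) < window:
            # Assumption: wait for a full window before comparing averages.
            yield i, None, False
            continue
        avg = sum(recent) / window
        is_best = avg > best_avg
        if is_best:
            best_avg = avg
        yield i, avg, is_best
```

Selecting on a moving average rather than a single episode makes the saved checkpoint less sensitive to one lucky rollout.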
# DDPG agent (10 episodes, no exploration noise)
python scripts/evaluate_ddpg.py
# PID controller (10 episodes)
python scripts/evaluate_pid.py

Each script prints per-episode reward, steps survived, success/failure reason, mean absolute pole angle, mean absolute cart position, total squared action, and action smoothness. The summary reports averages plus the success rate and failure-reason counts.
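The per-episode metrics could be computed along the following lines. The exact definitions used by the evaluation scripts are not given here, so this sketch assumes that total squared action is the sum of a_t² over the episode and that action smoothness is the mean absolute change between consecutive actions. The function name `episode_metrics` and the dictionary keys are hypothetical.

```python
def episode_metrics(angles, positions, actions):
    """Summarize one episode from per-step logs (lists of floats).

    Assumed definitions, not confirmed by the evaluation scripts:
      total_squared_action = sum(a_t ** 2)
      action_smoothness    = mean(|a_t - a_{t-1}|), lower is smoother
    """
    n = len(actions)
    if n > 1:
        # Pair each action with its successor and average the jumps.
        jumps = [abs(b - a) for a, b in zip(actions, actions[1:])]
        smoothness = sum(jumps) / (n - 1)
    else:
        smoothness = 0.0
    return {
        "steps": n,
        "mean_abs_angle": sum(abs(t) for t in angles) / len(angles),
        "mean_abs_position": sum(abs(x) for x in positions) / len(positions),
        "total_squared_action": sum(a * a for a in actions),
        "action_smoothness": smoothness,
    }
```

Under these definitions, a controller that chatters between −1 and 1 scores poorly on both action metrics even if it keeps the pole upright.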
# DDPG
python scripts/watch_ddpg.py
# PID
python scripts/watch_pid.py

Renders the environment in real time using pygame.
- Actor: 4 → 256 → 256 → 1 (tanh output for action in [−1, 1])
- Critic: (state + action) → 256 → 256 → scalar Q-value
- Exploration: Ornstein-Uhlenbeck noise, σ decayed linearly from 0.2 to 0.01
- Replay buffer: 100,000 transitions, batch size 256
- Soft target update: τ = 0.005
- Hyperparameters: γ = 0.99, actor/critic LR = 1e-3, 2 updates per env step
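Two of the pieces listed above, the exploration noise and the soft target update, can be illustrated in plain Python without a deep learning framework. This is a sketch under stated assumptions rather than the code in `agents/ddpg.py`: the decay is assumed to run per episode over the 500 training episodes, the OU parameters `theta`, `mu`, and `dt` are assumed values (only σ is given above), and parameters are represented as flat lists of floats instead of tensors.

```python
import random

TAU = 0.005                          # soft-update rate from the list above
SIGMA_START, SIGMA_END = 0.2, 0.01   # noise scale range from the list above
NUM_EPISODES = 500                   # training length stated for train_ddpg.py


def sigma_for_episode(episode, num_episodes=NUM_EPISODES):
    """Linear decay of sigma; a per-episode schedule is an assumption."""
    frac = min(episode / max(num_episodes - 1, 1), 1.0)
    return SIGMA_START + frac * (SIGMA_END - SIGMA_START)


class OUNoise:
    """Discrete-time Ornstein-Uhlenbeck process for a scalar action."""

    def __init__(self, theta=0.15, mu=0.0, dt=1.0, seed=None):
        # theta, mu and dt are assumed values, not taken from the project.
        self.theta, self.mu, self.dt = theta, mu, dt
        self.rng = random.Random(seed)
        self.state = mu

    def reset(self):
        self.state = self.mu

    def sample(self, sigma):
        # Mean-reverting drift plus Gaussian diffusion scaled by sigma.
        drift = self.theta * (self.mu - self.state) * self.dt
        diffusion = sigma * (self.dt ** 0.5) * self.rng.gauss(0.0, 1.0)
        self.state += drift + diffusion
        return self.state


def soft_update(target, online, tau=TAU):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]


# Usage: add noise to a (hypothetical) actor output and clip to [-1, 1].
noise = OUNoise(seed=0)
raw_action = 0.3
noisy = raw_action + noise.sample(sigma_for_episode(0))
action = max(-1.0, min(1.0, noisy))
```

With τ = 0.005 the target parameters move only 0.5% of the way toward the online parameters on each update, which keeps the critic's bootstrapped targets slowly varying.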
A two-level cascade controller:
- Position loop (Kp=0.08, Ki=0.0, Kd=0.4): cart position error → angle setpoint (clipped to ±0.15 rad)
- Angle loop (Kp=25.0, Ki=0.5, Kd=10.0): angle error → force command (clipped to ±1.0)
To limit integral windup, the integral term in the angle loop is clamped to ±0.5.
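The cascade structure can be sketched as two PID instances chained together, using the gains and limits listed above. This is an illustration, not the code in `controllers/pid.py`: the class and function names are hypothetical, the time step `dt=0.02` and the sign conventions (positive θ and positive force both point in the +x direction) are assumptions, and the derivative is taken on the error with a zero value on the first call.

```python
def clamp(value, limit):
    """Clip value to the symmetric range [-limit, limit]."""
    return max(-limit, min(limit, value))


class PID:
    """Single PID loop with output clipping and optional integral clamp."""

    def __init__(self, kp, ki, kd, out_limit, integral_limit=None):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_limit = out_limit
        self.integral_limit = integral_limit
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        self.integral += error * dt
        if self.integral_limit is not None:
            # Anti-windup: keep the accumulated error bounded.
            self.integral = clamp(self.integral, self.integral_limit)
        # No derivative kick on the first call (no previous error yet).
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        out = self.kp * error + self.ki * self.integral + self.kd * derivative
        return clamp(out, self.out_limit)


def make_cascade():
    """Build both loops with the gains and limits from the text."""
    position_loop = PID(kp=0.08, ki=0.0, kd=0.4, out_limit=0.15)
    angle_loop = PID(kp=25.0, ki=0.5, kd=10.0, out_limit=1.0, integral_limit=0.5)
    return position_loop, angle_loop


def cascade_step(position_loop, angle_loop, x, theta, dt=0.02, x_target=0.0):
    """One control step: position error -> angle setpoint -> force."""
    # Outer loop: a cart right of target asks the pole to lean left.
    angle_setpoint = position_loop.update(x_target - x, dt)
    # Inner loop: push the cart toward the side the pole leans to
    # (sign convention assumed).
    return angle_loop.update(theta - angle_setpoint, dt)
```

Clipping the angle setpoint to ±0.15 rad keeps the outer loop from requesting a lean close to the 12° (about 0.21 rad) failure threshold.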
