Inverted Pendulum: DDPG vs. PID

Semester project for CSC4444 (AI). Compares a reinforcement learning agent (DDPG) against a hand-tuned classical controller (PID) on the continuous inverted pendulum (CartPole) task.

Overview

The goal is to keep a pole upright on a mobile cart by applying continuous horizontal force. Two controllers are implemented and evaluated head-to-head:

Controller	Approach	Action Space
DDPG	Actor-critic deep RL, learns from interaction	Continuous force in [−1, 1]
PID	Two-level cascade control, hand-tuned gains	Continuous force in [−1, 1]

The environment is a custom continuous-action variant of CartPole built on Gymnasium. An episode ends when the cart leaves bounds (|x| > 2.4), the pole falls (|θ| > 12°), or the 500-step limit is reached.
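
The episode-end rule can be sketched as a small check. This is an illustration only, not the code in envs/cartpole_continuous.py: the function name `episode_status` is hypothetical, the angle is assumed to be in radians, and splitting the result into "terminated" (failure) and "truncated" (step limit) follows the Gymnasium step convention, which the project may or may not mirror exactly.

```python
import math

X_LIMIT = 2.4                      # cart position bound from the README
THETA_LIMIT = math.radians(12.0)   # 12 degrees, converted to radians
MAX_STEPS = 500                    # episode step limit


def episode_status(x: float, theta: float, step: int) -> tuple[bool, bool]:
    """Return (terminated, truncated) for the current state.

    Hypothetical helper: `step` is assumed to be the number of steps
    already taken in the episode.
    """
    terminated = abs(x) > X_LIMIT or abs(theta) > THETA_LIMIT
    truncated = (not terminated) and step >= MAX_STEPS
    return terminated, truncated
```

Keeping failure and time-limit endings separate matters for DDPG, since a time-limit ending is not a true terminal state when bootstrapping the critic target.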

Getting Started

Installation

git clone https://github.com/<your-username>/inverted-pendulum.git
cd inverted-pendulum
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -e .

Project Structure

inverted-pendulum/
├── src/inverted_pendulum/
│   ├── agents/ddpg.py              # DDPG actor-critic agent
│   ├── controllers/pid.py          # Cascade PID controller
│   ├── envs/cartpole_continuous.py # Custom continuous CartPole environment
│   └── training/__init__.py        # Training loop
├── scripts/
│   ├── train_ddpg.py               # Train the DDPG agent
│   ├── evaluate_ddpg.py            # Evaluate DDPG (no exploration noise)
│   ├── watch_ddpg.py               # Watch DDPG agent live
│   ├── evaluate_pid.py             # Evaluate PID controller
│   └── watch_pid.py                # Watch PID controller live
└── results/                        # Saved models and plots
    ├── ddpg_cartpole.pt
    ├── ddpg_cartpole_best.pt
    └── ddpg_training_curve.png

Usage

All scripts must be run from the project root.

Train the DDPG agent

python scripts/train_ddpg.py

Trains for 500 episodes. Saves the final model to results/ddpg_cartpole.pt and the best checkpoint (by 20-episode moving average) to results/ddpg_cartpole_best.pt. A training curve is saved to results/ddpg_training_curve.png.
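
A minimal sketch of how a best checkpoint can be chosen by a 20-episode moving average. The function name `best_checkpoint_episodes` is hypothetical, and two details are assumptions rather than facts about train_ddpg.py: the average is only considered once a full window is available, and a checkpoint is saved only on a strict improvement.

```python
from collections import deque


def best_checkpoint_episodes(rewards, window=20):
    """Return the episode indices at which a 'best' checkpoint would be saved,
    plus the best moving average seen."""
    recent = deque(maxlen=window)   # rewards of the last `window` episodes
    best_avg = float("-inf")
    saves = []
    for episode, reward in enumerate(rewards):
        recent.append(reward)
        if len(recent) < window:
            continue                # assumption: wait for a full window
        avg = sum(recent) / window
        if avg > best_avg:          # assumption: strict improvement only
            best_avg = avg
            saves.append(episode)   # the training script would save the model here
    return saves, best_avg
```

Selecting by a moving average rather than a single episode's reward avoids keeping a checkpoint that scored well once by luck.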

Evaluate

# DDPG agent (10 episodes, no exploration noise)
python scripts/evaluate_ddpg.py

# PID controller (10 episodes)
python scripts/evaluate_pid.py

Each script prints per-episode reward, steps survived, success/failure reason, mean absolute pole angle, mean absolute cart position, total squared action, and action smoothness. The summary reports averages plus the success rate and failure-reason counts.
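
For reference, the per-episode metrics could be computed along these lines. This is a sketch, not the evaluation scripts' code: `episode_metrics` and its dictionary keys are hypothetical names, and "action smoothness" is assumed here to mean the mean absolute change between consecutive actions (lower is smoother); the scripts may define it differently.

```python
def episode_metrics(angles, positions, actions):
    """Summarize one episode from per-step pole angles, cart positions, and actions."""
    # Absolute change between each pair of consecutive actions.
    deltas = [abs(b - a) for a, b in zip(actions, actions[1:])]
    return {
        "mean_abs_angle": sum(abs(a) for a in angles) / len(angles),
        "mean_abs_position": sum(abs(p) for p in positions) / len(positions),
        "total_squared_action": sum(a * a for a in actions),
        # Assumed definition of smoothness; 0.0 when there is only one action.
        "action_smoothness": sum(deltas) / len(deltas) if deltas else 0.0,
    }
```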

Watch live

# DDPG
python scripts/watch_ddpg.py

# PID
python scripts/watch_pid.py

Renders the environment in real time using pygame.

Algorithm Details

DDPG (Deep Deterministic Policy Gradient)

Actor: 4 → 256 → 256 → 1 (tanh output for action in [−1, 1])
Critic: (state + action) → 256 → 256 → scalar Q-value
Exploration: Ornstein-Uhlenbeck noise, σ decayed linearly from 0.2 → 0.01
Replay buffer: 100,000 transitions, batch size 256
Soft target update: τ = 0.005
Hyperparameters: γ = 0.99, actor/critic LR = 1e-3, 2 updates per env step
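
Two of the pieces above, the decaying Ornstein-Uhlenbeck noise and the soft target update, can be sketched in plain Python. This is illustrative rather than the code in agents/ddpg.py: the names `linear_sigma`, `OUNoise`, and `soft_update` are hypothetical, the OU parameters `theta=0.15` and `dt=1.0` are assumed values not stated in this README, σ is assumed to decay per episode over the 500 training episodes, and parameters are shown as flat lists of floats instead of PyTorch tensors.

```python
import random


def linear_sigma(episode, total_episodes=500, start=0.2, end=0.01):
    """Linearly interpolate sigma from `start` to `end` (assumed per-episode decay)."""
    frac = min(max(episode / max(total_episodes - 1, 1), 0.0), 1.0)
    return start + frac * (end - start)


class OUNoise:
    """Ornstein-Uhlenbeck process with zero mean; theta and dt are assumed values."""

    def __init__(self, theta=0.15, dt=1.0, seed=None):
        self.theta = theta
        self.dt = dt
        self.state = 0.0
        self.rng = random.Random(seed)

    def reset(self):
        self.state = 0.0

    def sample(self, sigma):
        # Mean-reverting drift toward 0 plus Gaussian diffusion scaled by sigma.
        drift = -self.theta * self.state * self.dt
        diffusion = sigma * (self.dt ** 0.5) * self.rng.gauss(0.0, 1.0)
        self.state += drift + diffusion
        return self.state


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: target <- (1 - tau) * target + tau * online."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]
```

During training, the noise sample would be added to the actor's output and the sum clipped to [−1, 1] before being sent to the environment.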

PID (Cascade)

A two-level cascade controller:

Position loop (Kp=0.08, Ki=0.0, Kd=0.4): cart position error → angle setpoint (clipped to ±0.15 rad)
Angle loop (Kp=25.0, Ki=0.5, Kd=10.0): angle error → force command (clipped to ±1.0)

The integral in the angle loop is clamped to ±0.5 to limit windup.
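
A minimal sketch of the cascade using the gains listed above. It is not the code in controllers/pid.py, and several details are assumptions: the class and function names are hypothetical, the time step `dt=0.02` is taken from the standard Gymnasium CartPole, the ±0.5 limit is applied to the accumulated integral (not to the Ki-scaled term), the derivative acts on the error, and the sign conventions (positive angle and positive force both pointing right) may differ from the project's.

```python
from dataclasses import dataclass
from typing import Optional


def clip(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


@dataclass
class PID:
    kp: float
    ki: float
    kd: float
    out_limit: float                       # output is clipped to +/- out_limit
    integral_limit: float = float("inf")   # clamp on the accumulated integral
    integral: float = 0.0
    prev_error: Optional[float] = None

    def step(self, error: float, dt: float) -> float:
        self.integral = clip(self.integral + error * dt,
                             -self.integral_limit, self.integral_limit)
        # No derivative kick on the first call, when there is no previous error.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        output = self.kp * error + self.ki * self.integral + self.kd * derivative
        return clip(output, -self.out_limit, self.out_limit)


def make_cascade():
    """Build fresh position and angle loops with the gains from this README."""
    position = PID(kp=0.08, ki=0.0, kd=0.4, out_limit=0.15)
    angle = PID(kp=25.0, ki=0.5, kd=10.0, out_limit=1.0, integral_limit=0.5)
    return position, angle


def cascade_force(x, theta, position, angle, dt=0.02, x_target=0.0):
    # Outer loop: position error -> angle setpoint (assumed sign convention).
    theta_ref = position.step(x_target - x, dt)
    # Inner loop: angle error -> force command in [-1, 1].
    return angle.step(theta - theta_ref, dt)
```

Both loops hold state (integral and previous error), so a fresh pair from `make_cascade()` would be created at the start of each episode.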

Dependencies

Python ≥ 3.10
PyTorch ≥ 2.2
Gymnasium ≥ 1.0 (with the classic-control extra)
NumPy ≥ 1.26
Matplotlib ≥ 3.8
