Jroge33/inverted-pendulum

Inverted Pendulum: DDPG vs. PID

Semester project for CSC4444 (AI). It compares a reinforcement learning agent (DDPG) with a hand-tuned classical controller (PID) on the continuous inverted pendulum (CartPole) task.

Training curve

Overview

The goal is to keep a pole upright on a mobile cart by applying continuous horizontal force. Two controllers are implemented and evaluated head-to-head:

| Controller | Approach | Action Space |
|------------|----------|--------------|
| DDPG | Actor-critic deep RL, learns from interaction | Continuous force in [−1, 1] |
| PID | Two-level cascade control, hand-tuned gains | Continuous force in [−1, 1] |

The environment is a custom continuous-action variant of CartPole built on Gymnasium. An episode ends when the cart leaves the track bounds (|x| > 2.4), when the pole tilts more than 12° from vertical (|θ| > 12°), or when the 500-step limit is reached.
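The episode-ending rules above can be sketched as a small check. This is a minimal illustration, not the code in `envs/cartpole_continuous.py`: the helper name `episode_status` and the reason strings are hypothetical, and treating the 500-step cap as a truncation (rather than a termination, in Gymnasium's terms) is an assumption.

```python
import math

X_LIMIT = 2.4                     # cart position bound, from the text
THETA_LIMIT = math.radians(12.0)  # pole angle bound, 12 degrees in radians
MAX_STEPS = 500                   # step cap per episode, from the text


def episode_status(x: float, theta: float, step: int) -> tuple[bool, bool, str]:
    """Return (terminated, truncated, reason) for the current step.

    Hypothetical helper: the reason strings are illustrative only.
    """
    if abs(x) > X_LIMIT:
        return True, False, "cart_out_of_bounds"
    if abs(theta) > THETA_LIMIT:
        return True, False, "pole_fell"
    if step >= MAX_STEPS:
        # Assumption: hitting the step cap counts as a truncation, not a failure.
        return False, True, "time_limit"
    return False, False, "running"
```

For example, `episode_status(0.0, 0.25, 10)` reports a fallen pole, because 0.25 rad exceeds the 12° limit (about 0.209 rad).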

Getting Started

Installation

git clone https://github.com/<your-username>/inverted-pendulum.git
cd inverted-pendulum
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -e .

Project Structure

inverted-pendulum/
├── src/inverted_pendulum/
│   ├── agents/ddpg.py              # DDPG actor-critic agent
│   ├── controllers/pid.py          # Cascade PID controller
│   ├── envs/cartpole_continuous.py # Custom continuous CartPole environment
│   └── training/__init__.py        # Training loop
├── scripts/
│   ├── train_ddpg.py               # Train the DDPG agent
│   ├── evaluate_ddpg.py            # Evaluate DDPG (no exploration noise)
│   ├── watch_ddpg.py               # Watch DDPG agent live
│   ├── evaluate_pid.py             # Evaluate PID controller
│   └── watch_pid.py                # Watch PID controller live
└── results/                        # Saved models and plots
    ├── ddpg_cartpole.pt
    ├── ddpg_cartpole_best.pt
    └── ddpg_training_curve.png

Usage

All scripts must be run from the project root.

Train the DDPG agent

python scripts/train_ddpg.py

Trains for 500 episodes. Saves the final model to results/ddpg_cartpole.pt and the best checkpoint (by 20-episode moving average) to results/ddpg_cartpole_best.pt. A training curve is saved to results/ddpg_training_curve.png.
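Selecting a best checkpoint by 20-episode moving average could work roughly as follows. This is a sketch under stated assumptions, not the logic in `train_ddpg.py`: the function name `best_checkpoint_episodes` is hypothetical, and waiting for a full 20-episode window before the first comparison is an assumption.

```python
from collections import deque

WINDOW = 20  # moving-average window, from the text


def best_checkpoint_episodes(rewards, window=WINDOW):
    """Return (episode indices where a new best average occurs, best average)."""
    recent = deque(maxlen=window)  # holds only the last `window` rewards
    best_avg = float("-inf")
    improved_at = []
    for episode, reward in enumerate(rewards):
        recent.append(reward)
        if len(recent) < window:
            continue  # assumption: no checkpoint until the window is full
        avg = sum(recent) / window
        if avg > best_avg:
            best_avg = avg
            improved_at.append(episode)  # a training loop would save the model here
    return improved_at, best_avg
```

Using a moving average rather than a single episode's reward keeps one lucky episode from overwriting a checkpoint that performs better on average.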

Evaluate

# DDPG agent (10 episodes, no exploration noise)
python scripts/evaluate_ddpg.py

# PID controller (10 episodes)
python scripts/evaluate_pid.py

Each script prints per-episode reward, steps survived, success/failure reason, mean absolute pole angle, mean absolute cart position, total squared action, and action smoothness. The summary reports averages plus the success rate and failure-reason counts.
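The per-episode metrics listed above could be computed from logged trajectories along these lines. The function name `episode_metrics` is hypothetical, and the definition of action smoothness used here (mean absolute change between consecutive actions) is an assumption; the evaluation scripts may define it differently.

```python
def episode_metrics(angles, positions, actions):
    """Summarize one episode from per-step pole angles, cart positions, and actions."""
    mean_abs_angle = sum(abs(a) for a in angles) / len(angles)
    mean_abs_position = sum(abs(p) for p in positions) / len(positions)
    total_squared_action = sum(u * u for u in actions)  # proxy for control effort

    # Assumption: smoothness = mean |u[t+1] - u[t]|; lower means smoother control.
    diffs = [abs(b - a) for a, b in zip(actions, actions[1:])]
    action_smoothness = sum(diffs) / len(diffs) if diffs else 0.0

    return {
        "mean_abs_angle": mean_abs_angle,
        "mean_abs_position": mean_abs_position,
        "total_squared_action": total_squared_action,
        "action_smoothness": action_smoothness,
    }
```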

Watch live

# DDPG
python scripts/watch_ddpg.py

# PID
python scripts/watch_pid.py

Renders the environment in real time using pygame.

Algorithm Details

DDPG (Deep Deterministic Policy Gradient)

  • Actor: 4 → 256 → 256 → 1 (tanh output for action in [−1, 1])
  • Critic: (state + action) → 256 → 256 → scalar Q-value
  • Exploration: Ornstein-Uhlenbeck noise, σ decayed linearly from 0.2 → 0.01
  • Replay buffer: 100,000 transitions, batch size 256
  • Soft target update: τ = 0.005
  • Hyperparameters: γ = 0.99, actor/critic LR = 1e-3, 2 updates per env step
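Two of the pieces listed above, the decaying Ornstein-Uhlenbeck noise and the soft target update, can be sketched in plain Python without PyTorch. This is an illustration rather than the code in `agents/ddpg.py`: the names `sigma_at`, `OUNoise`, and `soft_update` are hypothetical, the OU parameters `theta = 0.15` and `dt = 1.0` are common defaults assumed here (the source does not state them), and decaying σ per episode rather than per step is also an assumption.

```python
import random

SIGMA_START, SIGMA_END = 0.2, 0.01  # from the text
TAU = 0.005                         # from the text
OU_THETA, OU_DT = 0.15, 1.0         # assumption: not given in the source


def sigma_at(episode, total_episodes=500):
    """Linearly interpolate sigma from SIGMA_START to SIGMA_END over training."""
    frac = episode / max(total_episodes - 1, 1)
    frac = min(max(frac, 0.0), 1.0)
    return SIGMA_START + frac * (SIGMA_END - SIGMA_START)


class OUNoise:
    """Temporally correlated noise for a scalar action."""

    def __init__(self, mu=0.0, theta=OU_THETA, dt=OU_DT, seed=None):
        self.mu, self.theta, self.dt = mu, theta, dt
        self.rng = random.Random(seed)
        self.state = mu

    def reset(self):
        self.state = self.mu

    def sample(self, sigma):
        # dx = theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)
        drift = self.theta * (self.mu - self.state) * self.dt
        diffusion = sigma * (self.dt ** 0.5) * self.rng.gauss(0.0, 1.0)
        self.state += drift + diffusion
        return self.state


def soft_update(target, online, tau=TAU):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]
```

With τ = 0.005, each update moves the target weights only 0.5% of the way toward the online weights, which keeps the critic's bootstrap targets slowly varying.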

PID (Cascade)

A two-level cascade controller:

  1. Position loop (Kp=0.08, Ki=0.0, Kd=0.4): cart position error → angle setpoint (clipped to ±0.15 rad)
  2. Angle loop (Kp=25.0, Ki=0.5, Kd=10.0): angle error → force command (clipped to ±1.0)

To limit integral windup, the angle loop's integral term is clamped to ±0.5.
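The cascade can be sketched as two PID loops in series, using the gains and limits given above. This is not the code in `controllers/pid.py`: the class names `PID` and `CascadePID`, the time step `dt = 0.02` s, the sign conventions (positive angle and positive force both point toward positive x), and clamping the integral state rather than the integral term's output are all assumptions.

```python
def clamp(value, low, high):
    return max(low, min(high, value))


class PID:
    """Basic PID loop with output clipping and an optional integral clamp."""

    def __init__(self, kp, ki, kd, out_limit, i_limit=None):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_limit, self.i_limit = out_limit, i_limit
        self.integral = 0.0
        self.prev_error = None

    def step(self, error, dt):
        self.integral += error * dt
        if self.i_limit is not None:
            # Anti-windup: keep the accumulated error within +/- i_limit.
            self.integral = clamp(self.integral, -self.i_limit, self.i_limit)
        # No derivative on the first call, since there is no previous error yet.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        out = self.kp * error + self.ki * self.integral + self.kd * derivative
        return clamp(out, -self.out_limit, self.out_limit)


class CascadePID:
    """Outer position loop sets the angle setpoint for the inner angle loop."""

    def __init__(self):
        self.position_loop = PID(kp=0.08, ki=0.0, kd=0.4, out_limit=0.15)
        self.angle_loop = PID(kp=25.0, ki=0.5, kd=10.0, out_limit=1.0, i_limit=0.5)

    def step(self, x, theta, dt=0.02, x_target=0.0):
        # Cart right of target -> negative setpoint -> lean the pole back toward target.
        theta_ref = self.position_loop.step(x_target - x, dt)
        # Pole leaning right of its setpoint -> push the cart right to catch it.
        return self.angle_loop.step(theta - theta_ref, dt)
```

Calling `CascadePID().step(x=0.0, theta=0.1)` saturates at the +1.0 force limit, since the proportional term alone (25.0 × 0.1) already exceeds it.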

Dependencies

  • Python ≥ 3.10
  • PyTorch ≥ 2.2
  • Gymnasium ≥ 1.0 (classic-control)
  • NumPy ≥ 1.26
  • Matplotlib ≥ 3.8
