Quadruped Locomotion — CPG & Deep Reinforcement Learning

EPFL Legged Robots course project (Nov 2022 – Jan 2023), completed by a two-person team in the BioRob lab (Prof. Auke Ijspeert). The project implements and compares two fundamentally different control strategies for a simulated 12.45 kg quadruped (Unitree A1 in PyBullet):

Central Pattern Generator (CPG) — a bio-inspired oscillator network with analytical PD control
Deep Reinforcement Learning (DRL) — a CPG-augmented policy trained end-to-end with PPO and SAC

Results: the CPG approach achieves interpretable, gait-specific locomotion (best CoT = 0.459 with Pace); the DRL policy surpasses it (CoT = 0.300) after ~400 k training steps, at the cost of a black-box policy and long training time.

Repository contents

File	Description
`hopf_network.py`	CPG oscillator network — Hopf polar equations, phase coupling, gait matrices, RL modulation interface
`run_cpg.py`	Runs the CPG controller open-loop in PyBullet; records CPG states, foot trajectories, and base velocity
`quadruped_gym_env.py`	Custom OpenAI Gym environment wrapping the PyBullet simulation; defines observation/action spaces and reward functions
`run_sb3.py`	Trains a PPO or SAC policy with Stable-Baselines3
`load_sb3.py`	Loads a trained policy and runs evaluation rollouts

Simulation environment not included. This code depends on course-provided infrastructure: the quadruped module, configs_a1, robot URDF files, and utility libraries (utils/). These are not redistributable. The files here are the components authored during the project.

Part 1 — Central Pattern Generator

Hopf oscillator network

The CPG is modelled as four coupled non-linear oscillators (one per leg) operating in polar coordinates. Each oscillator i has amplitude rᵢ and phase θᵢ, governed by:

Amplitude (converges to √μ via a limit cycle):

ṙᵢ = α(μ − rᵢ²)rᵢ

Phase (with inter-leg coupling):

θ̇ᵢ = ωᵢ + Σⱼ rⱼ w sin(θⱼ − θᵢ − φᵢⱼ)

Parameters:

α = 50 — amplitude convergence rate
μ = 1 (non-RL) — intrinsic amplitude setpoint; oscillator converges to r = √μ = 1
w — coupling strength (default 1)
φᵢⱼ — desired inter-leg phase offsets (gait-specific, see below)

The natural frequency ωᵢ switches between two regimes based on the current phase:

Swing (0 ≤ θ ≤ π): foot lifting and advancing — ω_swing = 12π rad/s (default in run_cpg.py)
Stance (π < θ ≤ 2π): foot grounded, pushing — ω_stance = 4π rad/s

Integration uses first-order Euler in the open-loop CPG and trapezoidal (second-order) in the RL variant.
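Below is a minimal NumPy sketch of the network above. It assumes the equations exactly as written and reads φᵢⱼ as the desired θⱼ − θᵢ, so each coupling term vanishes once the target offsets are reached. `hopf_derivatives`, `euler_step` and `trapezoid_step` are hypothetical names, not the functions in `hopf_network.py`. The trapezoidal update averages the previous and current derivatives. That is a guess at how the RL variant uses `X_dot_prev`, not the confirmed implementation.

```python
import numpy as np

ALPHA = 50.0               # amplitude convergence rate
MU = 1.0                   # amplitude setpoint; r converges to sqrt(MU) = 1
COUPLING = 1.0             # coupling strength w
OMEGA_SWING = 12 * np.pi   # rad/s while 0 <= theta <= pi
OMEGA_STANCE = 4 * np.pi   # rad/s while pi < theta < 2*pi


def hopf_derivatives(r, theta, phi):
    """Return (r_dot, theta_dot) for the four coupled oscillators."""
    r_dot = ALPHA * (MU - r**2) * r
    omega = np.where(theta <= np.pi, OMEGA_SWING, OMEGA_STANCE)
    diff = theta[None, :] - theta[:, None] - phi   # diff[i, j] = theta_j - theta_i - phi_ij
    theta_dot = omega + COUPLING * np.sum(r[None, :] * np.sin(diff), axis=1)
    return r_dot, theta_dot


def euler_step(r, theta, phi, dt):
    """First-order explicit Euler update (open-loop CPG)."""
    r_dot, theta_dot = hopf_derivatives(r, theta, phi)
    return r + r_dot * dt, (theta + theta_dot * dt) % (2 * np.pi)


def trapezoid_step(r, theta, phi, dt, prev_dot):
    """Average the previous and current derivatives (assumed RL-variant form)."""
    r_dot, theta_dot = hopf_derivatives(r, theta, phi)
    r_new = r + 0.5 * (prev_dot[0] + r_dot) * dt
    theta_new = (theta + 0.5 * (prev_dot[1] + theta_dot) * dt) % (2 * np.pi)
    return r_new, theta_new, (r_dot, theta_dot)


# Trot with leg order FR, FL, RR, RL: FR/RL in phase, FL/RR shifted by pi
offsets = np.array([0.0, np.pi, np.pi, 0.0])
phi = offsets[None, :] - offsets[:, None]          # phi[i, j] = offset_j - offset_i

dt = 0.001
r, theta = np.full(4, 0.1), np.array([0.1, 0.2, 0.3, 0.4])
for _ in range(2000):                              # 2 s with Euler
    r, theta = euler_step(r, theta, phi, dt)

r2, theta2 = np.full(4, 0.1), np.array([0.1, 0.2, 0.3, 0.4])
prev = hopf_derivatives(r2, theta2, phi)           # seed the first trapezoidal step
for _ in range(2000):                              # 2 s with the trapezoidal rule
    r2, theta2, prev = trapezoid_step(r2, theta2, phi, dt, prev)
```

The amplitude equation does not depend on the phases, so both integrators drive every rᵢ to √μ = 1 well within the 2 s simulated here.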

CPG → foot position mapping

CPG states are projected to Cartesian foot targets in the leg xz-plane:

x_foot = −d_step · r · cos(θ)
z_foot = −h + g_c · sin(θ)   if sin(θ) > 0   (swing: foot rises)
z_foot = −h + g_p · sin(θ)   otherwise         (stance: foot presses)

Parameter	Value	Meaning
`d_step`	0.05 m	Desired step length
`h`	0.30 m	Nominal robot height
`g_c`	0.07 m	Ground clearance during swing
`g_p`	0.01 m	Ground penetration depth during stance
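The mapping follows directly from these formulas. `cpg_to_foot` is a hypothetical helper. It returns only the x and z targets and does not cover the lateral (y) coordinate or the inverse kinematics that follow.

```python
import numpy as np

D_STEP = 0.05   # desired step length [m]
H = 0.30        # nominal robot height [m]
G_C = 0.07      # ground clearance during swing [m]
G_P = 0.01      # ground penetration during stance [m]


def cpg_to_foot(r, theta):
    """Map CPG amplitude/phase to (x, z) foot targets in the leg xz-plane."""
    r = np.asarray(r, dtype=float)
    theta = np.asarray(theta, dtype=float)
    x = -D_STEP * r * np.cos(theta)
    s = np.sin(theta)
    # swing (sin > 0) lifts the foot by up to G_C; stance presses it down by up to G_P
    z = np.where(s > 0, -H + G_C * s, -H + G_P * s)
    return x, z


x, z = cpg_to_foot([1.0, 1.0, 1.0], [0.0, np.pi / 2, 3 * np.pi / 2])
```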

Gait coupling matrices

Four gaits are implemented by choosing the inter-leg phase offset matrix φ (leg order: FR, FL, RR, RL):

Gait	Coordination	v_avg (fast)	CoT (fast)
Trot	Diagonal pairs in phase, the two pairs offset by half a cycle (φ = 0.5·2π)	0.612 m/s	0.543
Lateral walk	Sequential lateral (phase offsets 0, 0.5, 0.25, 0.75 cycles for FR, FL, RR, RL)	1.434 m/s	0.756
Bound	Front/rear pairs in phase	0.626 m/s	0.842
Pace	Ipsilateral pairs in phase	1.376 m/s	0.459

Pace achieves the lowest Cost of Transport (0.459). Bound is the least efficient (0.842).
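Here is a sketch of one way to build the four φ matrices from per-leg phase offsets, given in cycles with leg order FR, FL, RR, RL. Only the walk offsets appear in the table. The trot, bound and pace offsets are inferred from the pairings described there. The sign convention, φᵢⱼ equal to the desired θⱼ − θᵢ, is an assumption chosen to match the coupling term sin(θⱼ − θᵢ − φᵢⱼ). `GAIT_OFFSETS` and `coupling_matrix` are illustrative names.

```python
import numpy as np

# Desired phase of each leg in cycles, leg order FR, FL, RR, RL
GAIT_OFFSETS = {
    "trot": [0.0, 0.5, 0.5, 0.0],    # diagonal pairs FR/RL and FL/RR
    "walk": [0.0, 0.5, 0.25, 0.75],  # lateral sequence (offsets from the table)
    "bound": [0.0, 0.0, 0.5, 0.5],   # front pair vs rear pair
    "pace": [0.0, 0.5, 0.0, 0.5],    # right pair FR/RR vs left pair FL/RL
}


def coupling_matrix(gait):
    """phi[i, j] = desired theta_j - theta_i, so sin(theta_j - theta_i - phi_ij) = 0 at the target."""
    offsets = 2 * np.pi * np.asarray(GAIT_OFFSETS[gait], dtype=float)
    return offsets[None, :] - offsets[:, None]


PHI = {gait: coupling_matrix(gait) for gait in GAIT_OFFSETS}
```

Any matrix built this way is antisymmetric with a zero diagonal, so the coupling is consistent in both directions.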

Combined PD control law

Foot position targets from the CPG are tracked by a combined Joint + Cartesian PD controller:

τ_total = τ_joint + τ_cartesian

τ_joint     = Kp_j · (q_des − q) + Kd_j · (0 − q̇)
τ_cartesian = Jᵀ · [Kp_c · (p_des − p) − Kd_c · v_foot]

where J is the leg Jacobian, p is the current foot position, and v_foot = J q̇.

Key finding: Joint PD alone and Cartesian PD alone both produce unstable locomotion. Only the combined controller with tuned gains produces stable gaits.

Tuned gains (trot):

Mode	Kp_joint	Kd_joint	Kp_cartesian	Kd_cartesian
Joint + Cartesian PD	100	2	500	20
Joint PD only	200	2	—	—
Cartesian PD only	—	—	5000	80
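The control law fits in a few lines of NumPy for one 3-DoF leg. This sketch assumes scalar gains, i.e. diagonal gain matrices with equal entries, and uses the combined-mode values from the table as defaults. The Jacobian is passed in as an argument. `combined_pd_torque` is a hypothetical name, and the identity Jacobian in the example is a placeholder, not the A1 leg Jacobian.

```python
import numpy as np


def combined_pd_torque(q, dq, q_des, p, p_des, J,
                       kp_j=100.0, kd_j=2.0, kp_c=500.0, kd_c=20.0):
    """Joint PD plus Cartesian PD torque for one 3-DoF leg (scalar gains assumed)."""
    q, dq, q_des, p, p_des, J = (np.asarray(a, dtype=float)
                                 for a in (q, dq, q_des, p, p_des, J))
    tau_joint = kp_j * (q_des - q) + kd_j * (0.0 - dq)   # desired joint velocity is zero
    v_foot = J @ dq                                       # foot velocity v = J q_dot
    force = kp_c * (p_des - p) - kd_c * v_foot            # virtual force at the foot
    tau_cartesian = J.T @ force                           # map the force to joint torques
    return tau_joint + tau_cartesian


# Placeholder state: identity Jacobian, small joint and foot-position errors
tau = combined_pd_torque(q=[0.0, 0.8, -1.6], dq=np.zeros(3), q_des=[0.1, 0.8, -1.6],
                         p=[0.0, 0.0, -0.30], p_des=[0.0, 0.01, -0.30], J=np.eye(3))
```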

Part 2 — Deep Reinforcement Learning

MDP formulation

Observation space (42-dimensional, LR_COURSE_OBS + CPG mode):

Component	Dim	Source
Motor angles	12	Joint encoders
Motor velocities	12	Joint encoders
Base orientation (quaternion)	4	IMU
Base linear velocity	3	IMU
Base angular velocity	3	IMU
CPG amplitudes r	4	CPG state
CPG phases θ	4	CPG state
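This sketch assembles the 42-dimensional observation in the table's order. The ordering and the helper name `build_observation` are assumptions; the actual layout is defined in `quadruped_gym_env.py`.

```python
import numpy as np

# Component sizes from the table, in the (assumed) concatenation order
OBS_DIMS = {
    "motor_angles": 12,
    "motor_velocities": 12,
    "base_orientation_quat": 4,
    "base_linear_velocity": 3,
    "base_angular_velocity": 3,
    "cpg_amplitudes": 4,
    "cpg_phases": 4,
}


def build_observation(parts):
    """Concatenate observation components in a fixed order, checking each size."""
    chunks = []
    for name, dim in OBS_DIMS.items():
        value = np.asarray(parts[name], dtype=np.float32).ravel()
        if value.size != dim:
            raise ValueError(f"{name}: expected {dim} values, got {value.size}")
        chunks.append(value)
    return np.concatenate(chunks)


obs = build_observation({name: np.zeros(dim) for name, dim in OBS_DIMS.items()})
```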

Action space (CPG mode, 8-dimensional): the policy outputs modulation signals for CPG amplitude setpoints (μ) and oscillator frequencies (ω) for each of the 4 legs, scaled from [−1, 1]:

ω ∈ [5, 4.5·2π] rad/s
μ ∈ [MU_LOW², MU_UPP²] = [1, 4]

These setpoints modulate the CPG. The resulting oscillator states are mapped to foot positions, converted to joint targets via inverse kinematics, and tracked by Joint PD, so the policy inherits the structured motion priors of the oscillator network.
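The scaling from [−1, 1] can be written as an affine map. This sketch makes several assumptions: the first four action entries drive μ and the last four drive ω, out-of-range actions are clipped, and `MU_LOW = 1` and `MU_UPP = 2`, which is consistent with the quoted range [1, 4]. The function names are hypothetical.

```python
import numpy as np

MU_LOW, MU_UPP = 1.0, 2.0                    # mu = r_target^2 then lies in [1, 4]
OMEGA_LOW, OMEGA_UPP = 5.0, 4.5 * 2 * np.pi  # rad/s


def scale(action, low, high):
    """Affine map from [-1, 1] to [low, high], clipping out-of-range actions."""
    a = np.clip(np.asarray(action, dtype=float), -1.0, 1.0)
    return low + 0.5 * (a + 1.0) * (high - low)


def action_to_cpg_setpoints(action):
    """Split an 8-D policy action into per-leg (mu, omega); the ordering is assumed."""
    action = np.asarray(action, dtype=float)
    mu = scale(action[0:4], MU_LOW**2, MU_UPP**2)
    omega = scale(action[4:8], OMEGA_LOW, OMEGA_UPP)
    return mu, omega


mu_lo, omega_lo = action_to_cpg_setpoints(-np.ones(8))
mu_hi, omega_hi = action_to_cpg_setpoints(np.ones(8))
```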

Episode length: 10 s (10 000 simulation steps at dt = 0.001 s; with action_repeat = 10 this is 1 000 policy steps).

Reward function

The reward combines forward locomotion tracking with additional shaping terms:

Term	Weight	Purpose
vel_tracking	+0.05	Gaussian reward for matching 1 m/s forward velocity
yaw penalty	−0.2	Penalise heading drift
drift penalty	−0.01	Penalise lateral position drift
energy penalty	−0.01	Minimise motor power (τ · q̇)
orientation penalty	−0.1	Keep base level (quaternion deviation)
vertical velocity penalty	−0.1/5	Penalise vertical oscillation
yaw rate penalty	−0.05/100	Penalise rearing/rolling
joint velocity penalty	−0.001/150	Smooth limb movement
roll penalty	−0.2/10	Limit roll oscillations
torque penalty	−0.00002/20	Prevent unrealistic torques

All rewards are clipped to ≥ 0.
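This partial sketch shows how the first few rows of the table could combine into a clipped scalar reward. Only the weights come from the table. The Gaussian width `SIGMA`, the use of absolute values, and the energy term as Σ|τ·q̇|·dt are assumptions, and the remaining penalty rows are omitted. `partial_reward` is a hypothetical name, not the reward function in `quadruped_gym_env.py`.

```python
import numpy as np

W_VEL, W_YAW, W_DRIFT, W_ENERGY = 0.05, 0.2, 0.01, 0.01   # weights from the table
V_TARGET = 1.0   # m/s forward velocity target
SIGMA = 0.25     # hypothetical width of the Gaussian tracking kernel [m/s]


def partial_reward(v_x, yaw, y_pos, tau, dq, dt=0.001):
    """Velocity tracking minus three penalties, clipped at zero (forms assumed)."""
    vel_tracking = W_VEL * np.exp(-((v_x - V_TARGET) ** 2) / SIGMA**2)
    energy = np.sum(np.abs(np.asarray(tau) * np.asarray(dq))) * dt  # sum |tau * q_dot| * dt
    reward = (vel_tracking
              - W_YAW * abs(yaw)         # heading drift
              - W_DRIFT * abs(y_pos)     # lateral drift
              - W_ENERGY * energy)       # motor energy
    return max(float(reward), 0.0)
```

With perfect tracking and no penalties the sketch returns the full tracking weight of 0.05. A large heading error drives the sum negative, and clipping then returns 0.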

Training setup

```python
# Key PPO hyperparameters in run_sb3.py
policy_kwargs = dict(net_arch=[256, 256])   # MLP with two hidden layers of 256 units
n_steps       = 4096
learning_rate = 1e-4
gamma         = 0.99
gae_lambda    = 0.95
batch_size    = 128
n_epochs      = 10
clip_range    = 0.2
total_timesteps = 1_000_000
```

Observations are normalised online via VecNormalize. Checkpoints are saved every 30 000 steps.
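This is one way to wire these hyperparameters into Stable-Baselines3 with observation normalisation and periodic checkpoints. `make_env` is a hypothetical zero-argument factory for the course environment, which is not included here. `norm_reward=False`, `clip_obs=100.0` and the save file names are assumptions. SB3 is imported inside `train`, so the configuration dictionary can be inspected without SB3 installed.

```python
import os

# Hyperparameters from run_sb3.py, gathered as keyword arguments for stable_baselines3.PPO
PPO_KWARGS = dict(
    policy_kwargs=dict(net_arch=[256, 256]),  # two hidden layers of 256 units
    n_steps=4096,
    learning_rate=1e-4,
    gamma=0.99,
    gae_lambda=0.95,
    batch_size=128,
    n_epochs=10,
    clip_range=0.2,
)


def train(make_env, log_dir, total_timesteps=1_000_000):
    """Train PPO on a normalised vectorised env, checkpointing every 30 000 steps."""
    # Imported lazily so this module loads even without SB3 installed
    from stable_baselines3 import PPO
    from stable_baselines3.common.callbacks import CheckpointCallback
    from stable_baselines3.common.env_util import make_vec_env
    from stable_baselines3.common.vec_env import VecNormalize

    env = make_vec_env(make_env, n_envs=1)
    # Running mean/std normalisation of observations (assumed: rewards left raw)
    env = VecNormalize(env, norm_obs=True, norm_reward=False, clip_obs=100.0)
    checkpoint = CheckpointCallback(save_freq=30_000, save_path=log_dir)

    model = PPO("MlpPolicy", env, **PPO_KWARGS)
    model.learn(total_timesteps=total_timesteps, callback=checkpoint)
    model.save(os.path.join(log_dir, "rl_model"))
    env.save(os.path.join(log_dir, "vec_normalize.pkl"))  # keep stats for evaluation
    return model
```

The normalisation statistics are saved separately from the policy because evaluation must normalise observations with the same running mean and variance that the policy saw during training.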

Results

Policy	v_avg	CoT
CPG-RL PPO (fast)	0.944 m/s	0.300
CPG-RL PPO (slow)	0.491 m/s	0.447

Action-space ranking: CPG-RL >> Cartesian PD >> Joint PD. Pure joint or Cartesian PD action spaces fail to converge to stable locomotion. With PPO, episode length plateaus after ~200 k steps and reward after ~400 k steps. SAC converged more slowly and was less stable.

The fast PPO policy (CoT = 0.300) beats every handcrafted CPG gait including Pace (CoT = 0.459).
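The README does not state the CoT formula. A common dimensionless definition is CoT = P̄ / (m·g·v̄), which is equivalent to E / (m·g·d). This sketch assumes P̄ is the mean of the summed absolute per-joint mechanical power |τ·q̇| and uses the 12.45 kg mass quoted above. The reported figures may have been computed differently.

```python
import numpy as np

MASS = 12.45   # kg, robot mass quoted at the top of this README
G = 9.81       # m/s^2


def cost_of_transport(tau, dq, v_avg, mass=MASS, g=G):
    """Dimensionless CoT = mean mechanical power / (m * g * v_avg).

    tau, dq: arrays of shape (T, 12) holding joint torques and velocities per step.
    Power per step is the sum of absolute per-joint power (an assumption).
    """
    power = np.sum(np.abs(np.asarray(tau) * np.asarray(dq)), axis=1)
    return np.mean(power) / (mass * g * v_avg)


# Constant 100 W total mechanical power at 1 m/s
tau = np.full((1000, 12), 1.0)
dq = np.full((1000, 12), 100.0 / 12)
cot = cost_of_transport(tau, dq, v_avg=1.0)
```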

Dependencies

python >= 3.8
numpy
pybullet
gym
stable-baselines3
matplotlib

The course infrastructure (quadruped, configs_a1, URDF assets, utils/) must be available on the Python path. These are not included in this repository.

Running

CPG controller:

```shell
# Edit gait, omega_swing, omega_stance in run_cpg.py, then:
python run_cpg.py
```

Train RL policy:

```shell
# Edit LEARNING_ALG, env_configs in run_sb3.py, then:
python run_sb3.py
# Checkpoints are saved to ./logs/intermediate_models/<timestamp>/
```

Evaluate trained policy:

```shell
# Set log_dir in load_sb3.py to the checkpoint directory, then:
python load_sb3.py
```
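Here is a sketch of an evaluation rollout with Stable-Baselines3. It assumes the policy and its `VecNormalize` statistics were saved as `rl_model.zip` and `vec_normalize.pkl` in the checkpoint directory. Both file names are hypothetical. `make_env` is a hypothetical zero-argument factory for the course environment. Setting `training = False` freezes the normalisation statistics so the rollout does not update them.

```python
import os


def evaluate(make_env, log_dir, n_steps=1000):
    """Roll out a saved PPO policy with frozen observation normalisation."""
    from stable_baselines3 import PPO
    from stable_baselines3.common.env_util import make_vec_env
    from stable_baselines3.common.vec_env import VecNormalize

    env = make_vec_env(make_env, n_envs=1)
    env = VecNormalize.load(os.path.join(log_dir, "vec_normalize.pkl"), env)
    env.training = False       # do not update running statistics
    env.norm_reward = False    # report raw rewards

    model = PPO.load(os.path.join(log_dir, "rl_model"), env=env)
    obs = env.reset()
    total_reward = 0.0
    for _ in range(n_steps):   # 1000 policy steps = one 10 s episode at action_repeat = 10
        action, _ = model.predict(obs, deterministic=True)
        obs, rewards, dones, infos = env.step(action)  # VecEnv auto-resets on done
        total_reward += float(rewards[0])
    return total_reward
```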

Portfolio note — changes from original submission

The five Python files are functionally identical to the course submission. The following cosmetic and presentation changes were made for this portfolio repo:

hopf_network.py
- Removed an orphaned dead-code expression at the end of `_integrate_hopf_equations_rl`: a bare conditional whose result was silently discarded (a copy-paste artifact from the non-RL method).
- Removed the unused `X_dot_prev` variable from `_integrate_hopf_equations`. The Euler method does not use it; only the trapezoidal RL variant correctly uses it.
- Replaced the `_set_gait` docstring (`[TODO] update all coupling matrices`) with an accurate description, since all four gait matrices were already implemented.
- Removed inline `# [TODO]` course-scaffold markers on completed code lines throughout (`update()`, `_integrate_hopf_equations`, `_integrate_hopf_equations_rl`), replacing them with descriptive comments.
run_cpg.py
- Fixed a wrong inline comment: `TEST_STEPS = int(5 / TIME_STEP) #10 seconds`. Since 5 / 0.001 = 5000 steps = 5 seconds, the comment now reads `# 5 seconds`.
- Removed `# [TODO]` markers on all completed code lines in the simulation loop, replacing them with descriptive comments.
- The plotting section (lines ~165–260) is preserved as commented-out code. These blocks are the analysis scripts used to generate the CPG state, foot-position tracking, joint-angle tracking, base-velocity, duty-cycle, and Cost of Transport figures reported in the paper. They are kept for reference and can be uncommented to reproduce those figures.
quadruped_gym_env.py
- Removed `#SEE LECTURE 7 SLIDE 95`, an internal course reference.
- Rewrote the observation-space mode comment block from assignment-prompt style (`[TODO: what should you include?]`) to a factual description of what was implemented.
- Replaced the `_reward_lr_course` docstring (the original assignment prompt) with a description of what the function actually does.
- Replaced the `ScaleActionToCartesianPos` docstring (written as an editing instruction) with a description of the method's behaviour.
- Removed inline `# [TODO]` markers from `ScaleActionToCartesianPos` and `ScaleActionToCPGStateModulations` where code was already in place.
- Removed a stray `#Anything else to add ?` scaffold comment from the non-CPG observation-space upper bound.
run_sb3.py
- Replaced `# normalize observations to stabilize learning (why?)`, a rhetorical course-assignment question, with a factual description of what `VecNormalize` does.
load_sb3.py
- Replaced a hardcoded training-run timestamp (`'010423183034'`) with an empty-string placeholder and a clarifying comment.
- Removed two internal bookkeeping comments naming specific checkpoint directories by timestamp.
- Removed `# [TODO]` scaffold markers from the evaluation loop (array initialisation, per-step data saving, plot section header).
- Removed a stale inline comment on `model.predict()` (`# sample at test time? ([TODO]: test)`).
- The data-collection arrays and plotting blocks (lines ~99–240) are preserved as commented-out code. These were the scripts used to produce the base-position, leg-contact, CPG amplitude/phase, base-velocity, duty-cycle, and Cost of Transport figures in the paper. They are kept for reference and can be uncommented to reproduce those plots.

No algorithmic logic was altered.

Reference

CPG architecture based on:

Bellegarda, G. & Ijspeert, A. (2022). CPG-RL: Learning Central Pattern Generators for Quadruped Locomotion. IEEE Robotics and Automation Letters. https://ieeexplore.ieee.org/abstract/document/9932888

Course skeleton and robot environment by Guillaume Bellegarda, EPFL BioRob (BSD-3-Clause).
