EPFL Legged Robots course project (Nov 2022 – Jan 2023), completed as part of a two-person team in the BioRob lab (Prof. Auke Ijspeert). The project implements and compares two fundamentally different control strategies for a simulated 12.45 kg quadruped (Unitree A1, PyBullet):
- Central Pattern Generator (CPG) — a bio-inspired oscillator network with analytical PD control
- Deep Reinforcement Learning (DRL) — a CPG-augmented policy trained end-to-end with PPO and SAC
Results: the CPG approach achieves interpretable, gait-specific locomotion (best CoT = 0.459 with Pace); the DRL policy surpasses it (CoT = 0.300) after ~400 k training steps, at the cost of a black-box policy and long training time.
| File | Description |
|---|---|
| `hopf_network.py` | CPG oscillator network: Hopf polar equations, phase coupling, gait matrices, RL modulation interface |
| `run_cpg.py` | Runs the CPG controller open-loop in PyBullet; records CPG states, foot trajectories, and base velocity |
| `quadruped_gym_env.py` | Custom OpenAI Gym environment wrapping the PyBullet simulation; defines observation/action spaces and reward functions |
| `run_sb3.py` | Trains a PPO or SAC policy with Stable-Baselines3 |
| `load_sb3.py` | Loads a trained policy and runs evaluation rollouts |
Simulation environment not included. This code depends on course-provided infrastructure: the `quadruped` module, `configs_a1`, robot URDF files, and utility libraries (`utils/`). These are not redistributable. The files here are the components authored during the project.
The CPG is modelled as four coupled non-linear oscillators (one per leg) operating in polar coordinates. Each oscillator i has amplitude rᵢ and phase θᵢ, governed by:
Amplitude (converges to √μ via a limit cycle):
ṙᵢ = α(μ − rᵢ²)rᵢ
Phase (with inter-leg coupling):
θ̇ᵢ = ωᵢ + Σⱼ rⱼ w sin(θⱼ − θᵢ − φᵢⱼ)
Parameters:
- `α = 50`: amplitude convergence rate
- `μ = 1` (non-RL): intrinsic amplitude setpoint; the oscillator converges to `r = √μ = 1`
- `w`: coupling strength (default 1)
- `φᵢⱼ`: desired inter-leg phase offsets (gait-specific, see below)
The natural frequency ωᵢ switches between two regimes based on the current phase:
- Swing (0 ≤ θ ≤ π): foot lifting and advancing, `ω_swing = 12π` rad/s (default in `run_cpg.py`)
- Stance (π < θ ≤ 2π): foot grounded and pushing, `ω_stance = 4π` rad/s
Integration uses first-order Euler in the open-loop CPG and trapezoidal (second-order) in the RL variant.
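A minimal sketch of one first-order Euler step of the equations above, written in plain Python for illustration. It is not the code in `hopf_network.py`: the function name `hopf_euler_step`, the list-based state, and the convention that `phi[i][j]` is the desired value of θⱼ − θᵢ are assumptions made here.

```python
import math

ALPHA = 50.0                 # amplitude convergence rate (from the text)
MU = 1.0                     # amplitude setpoint, so r converges to sqrt(MU)
OMEGA_SWING = 12 * math.pi   # rad/s, used while 0 <= theta <= pi
OMEGA_STANCE = 4 * math.pi   # rad/s, used while pi < theta <= 2*pi
COUPLING_W = 1.0             # coupling strength w


def hopf_euler_step(r, theta, phi, dt=0.001):
    """Advance all oscillators by one explicit Euler step.

    r, theta: lists of amplitudes and phases, one entry per leg.
    phi: nested list, phi[i][j] is the desired offset theta_j - theta_i
         (sign convention assumed for this sketch).
    """
    n = len(r)
    r_next, theta_next = [], []
    for i in range(n):
        # The natural frequency switches with the current phase of leg i.
        omega = OMEGA_SWING if theta[i] <= math.pi else OMEGA_STANCE
        r_dot = ALPHA * (MU - r[i] ** 2) * r[i]
        theta_dot = omega + sum(
            r[j] * COUPLING_W * math.sin(theta[j] - theta[i] - phi[i][j])
            for j in range(n)
        )
        r_next.append(r[i] + r_dot * dt)
        # Keep the phase wrapped into [0, 2*pi).
        theta_next.append((theta[i] + theta_dot * dt) % (2 * math.pi))
    return r_next, theta_next
```

Starting from a small amplitude such as `r = 0.1`, repeated calls drive every `r` towards `√μ = 1`, which is the limit-cycle behaviour described above.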
CPG states are projected to Cartesian foot targets in the leg xz-plane:
x_foot = −d_step · r · cos(θ)
z_foot = −h + g_c · sin(θ) if sin(θ) > 0 (swing: foot rises)
z_foot = −h + g_p · sin(θ) otherwise (stance: foot presses)
| Parameter | Value | Meaning |
|---|---|---|
| `d_step` | 0.05 m | Desired step length |
| `h` | 0.30 m | Nominal robot height |
| `g_c` | 0.07 m | Ground clearance during swing |
| `g_p` | 0.01 m | Ground penetration depth during stance |
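The mapping from oscillator state to foot target can be sketched as follows, using the parameter values from the table. The function name `foot_target` is illustrative and not taken from the repository.

```python
import math

D_STEP = 0.05   # m, desired step length
H = 0.30        # m, nominal robot height
G_C = 0.07      # m, ground clearance during swing
G_P = 0.01      # m, ground penetration depth during stance


def foot_target(r, theta):
    """Map one oscillator state (r, theta) to a foot target (x, z) in the leg xz-plane."""
    x = -D_STEP * r * math.cos(theta)
    s = math.sin(theta)
    # Swing (sin > 0) lifts the foot by up to G_C; stance pushes it down by up to G_P.
    z = -H + (G_C if s > 0 else G_P) * s
    return x, z
```

With `r = 1`, mid-swing (`theta = π/2`) gives `z = −0.23` m and mid-stance (`theta = 3π/2`) gives `z = −0.31` m.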
Four gaits are implemented by choosing the inter-leg phase offset matrix φ (leg order: FR, FL, RR, RL):
| Gait | Coordination | v_avg (fast) | CoT (fast) |
|---|---|---|---|
| Trot | Diagonal pairs in phase, the two pairs half a cycle apart (φ = 0.5·2π) | 0.612 m/s | 0.543 |
| Lateral walk | Sequential lateral (φ offsets: 0, 0.5, 0.25, 0.75) | 1.434 m/s | 0.756 |
| Bound | Front/rear pairs in phase | 0.626 m/s | 0.842 |
| Pace | Ipsilateral pairs in phase | 1.376 m/s | 0.459 |
Pace achieves the lowest Cost of Transport (0.459). Bound is the least efficient (0.842).
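One way to picture the gait matrices is to start from a per-leg phase offset (as a fraction of a cycle) and build φ from pairwise differences. The sketch below assumes that construction and the sign convention φᵢⱼ = 2π(oⱼ − oᵢ), chosen so that the coupling term sin(θⱼ − θᵢ − φᵢⱼ) vanishes at the desired offsets. Only the lateral-walk offsets are quoted in the table; the trot, bound, and pace offsets are inferred from the coordination column, and the matrices in `hopf_network.py` may be written differently.

```python
import math

# Per-leg phase offsets as fractions of a cycle, leg order FR, FL, RR, RL.
# "walk" is quoted in the table; the others are inferred (assumptions).
GAIT_OFFSETS = {
    "trot": (0.0, 0.5, 0.5, 0.0),    # diagonal pairs FR+RL and FL+RR in phase
    "walk": (0.0, 0.5, 0.25, 0.75),  # lateral walk
    "bound": (0.0, 0.0, 0.5, 0.5),   # front pair and rear pair in phase
    "pace": (0.0, 0.5, 0.0, 0.5),    # ipsilateral pairs FR+RR and FL+RL in phase
}


def phase_offset_matrix(offsets):
    """Return phi as a nested list with phi[i][j] = 2*pi*(offsets[j] - offsets[i])."""
    return [[2 * math.pi * (oj - oi) for oj in offsets] for oi in offsets]
```

For example, `phase_offset_matrix(GAIT_OFFSETS["trot"])` has a zero entry between FR and RL and an entry of π between FR and FL.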
Foot position targets from the CPG are tracked by a combined Joint + Cartesian PD controller:
τ_total = τ_joint + τ_cartesian
τ_joint = Kp_j · (q_des − q) + Kd_j · (0 − q̇)
τ_cartesian = Jᵀ · [Kp_c · (p_des − p) − Kd_c · v_foot]
where J is the leg Jacobian, p is the current foot position, and v_foot = J q̇.
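The torque law above can be sketched for a single leg in plain Python. Scalar gains are assumed here (the project code may use gain matrices), the defaults are the combined-controller gains from the table of tuned gains, and the helper names are illustrative.

```python
def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def combined_pd_torque(q, dq, q_des, p, p_des, jac,
                       kp_j=100.0, kd_j=2.0, kp_c=500.0, kd_c=20.0):
    """Joint PD plus Cartesian PD torques for one leg.

    q, dq, q_des: joint angles, joint velocities, desired joint angles.
    p, p_des: current and desired foot positions in the leg frame.
    jac: leg Jacobian J, so that v_foot = J @ dq.
    """
    v_foot = mat_vec(jac, dq)
    # tau_joint = Kp_j (q_des - q) + Kd_j (0 - dq)
    tau_joint = [kp_j * (qd - qi) + kd_j * (0.0 - dqi)
                 for qi, dqi, qd in zip(q, dq, q_des)]
    # tau_cartesian = J^T [Kp_c (p_des - p) - Kd_c v_foot]
    force = [kp_c * (pd - pi) - kd_c * vi
             for pi, pd, vi in zip(p, p_des, v_foot)]
    tau_cartesian = mat_vec(transpose(jac), force)
    return [a + b for a, b in zip(tau_joint, tau_cartesian)]
```

The Cartesian term acts like a virtual spring-damper at the foot, and Jᵀ converts the resulting foot force into joint torques.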
Key finding: Joint PD alone and Cartesian PD alone both produce unstable locomotion. Only the combined controller with tuned gains produces stable gaits.
Tuned gains (trot):
| Mode | Kp_joint | Kd_joint | Kp_cartesian | Kd_cartesian |
|---|---|---|---|---|
| Joint + Cartesian PD | 100 | 2 | 500 | 20 |
| Joint PD only | 200 | 2 | — | — |
| Cartesian PD only | — | — | 5000 | 80 |
Observation space (42-dimensional, LR_COURSE_OBS + CPG mode):
| Component | Dim | Source |
|---|---|---|
| Motor angles | 12 | Joint encoders |
| Motor velocities | 12 | Joint encoders |
| Base orientation (quaternion) | 4 | IMU |
| Base linear velocity | 3 | IMU |
| Base angular velocity | 3 | IMU |
| CPG amplitudes r | 4 | CPG state |
| CPG phases θ | 4 | CPG state |
Action space (CPG mode, 8-dimensional): the policy outputs modulation signals for CPG amplitude setpoints (μ) and oscillator frequencies (ω) for each of the 4 legs, scaled from [−1, 1]:
- `ω ∈ [5, 4.5·2π]` rad/s
- `μ ∈ [MU_LOW², MU_UPP²] = [1, 4]`
These signals drive the CPG, whose states are mapped to foot positions; inverse kinematics converts those positions to joint targets, which are tracked by joint PD. The policy therefore inherits the structured motion priors of the oscillator network.
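The scaling from [−1, 1] to the ranges above might look like the sketch below. A linear map with input clipping is assumed, as is the action ordering (first four entries for μ, last four for ω); `ScaleActionToCPGStateModulations` in `quadruped_gym_env.py` may differ on both points. `MU_LOW = 1` and `MU_UPP = 2` follow from the stated range [1, 4].

```python
import math

OMEGA_LOW, OMEGA_UPP = 5.0, 4.5 * 2 * math.pi   # rad/s
MU_LOW, MU_UPP = 1.0, 2.0                        # mu spans [MU_LOW**2, MU_UPP**2]


def scale(a, low, high):
    """Linearly map a in [-1, 1] to [low, high], clipping out-of-range inputs."""
    a = max(-1.0, min(1.0, a))
    return low + 0.5 * (a + 1.0) * (high - low)


def action_to_cpg_params(action):
    """Split an 8-dim action into per-leg (mu, omega) lists (ordering assumed)."""
    mu = [scale(a, MU_LOW ** 2, MU_UPP ** 2) for a in action[:4]]
    omega = [scale(a, OMEGA_LOW, OMEGA_UPP) for a in action[4:8]]
    return mu, omega
```

An all-zero action lands at the midpoint of each range, for example `μ = 2.5` for every leg.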
Episode length: 10 s (10 000 simulation steps at dt = 0.001 s; with action_repeat = 10 each policy action is held for 10 simulation steps, i.e. 1 000 policy steps per episode).
The reward combines forward locomotion tracking with additional shaping terms:
| Term | Weight | Purpose |
|---|---|---|
| vel_tracking | +0.05 | Gaussian reward for matching 1 m/s forward velocity |
| yaw penalty | −0.2 | Penalise heading drift |
| drift penalty | −0.01 | Penalise lateral position drift |
| energy penalty | −0.01 | Minimise motor power (τ · q̇) |
| orientation penalty | −0.1 | Keep base level (quaternion deviation) |
| vertical velocity penalty | −0.1/5 | Penalise vertical oscillation |
| yaw rate penalty | −0.05/100 | Penalise rearing/rolling |
| joint velocity penalty | −0.001/150 | Smooth limb movement |
| roll penalty | −0.2/10 | Limit roll oscillations |
| torque penalty | −0.00002/20 | Prevent unrealistic torques |
All rewards are clipped to ≥ 0.
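To make the structure concrete, here is a heavily simplified sketch that combines four of the terms and applies the final clipping. Only the weights and the 1 m/s target come from the table; the functional form of each term, the Gaussian width `sigma`, the `dt` scaling of the energy term, and the function name are assumptions, so this is not `_reward_lr_course`.

```python
import math


def sketch_reward(vx, yaw, y, power, dt=0.001, v_des=1.0, sigma=0.25):
    """Illustrative shaped reward; term shapes are assumed, weights are from the table.

    vx: forward base velocity (m/s), yaw: heading error (rad),
    y: lateral position drift (m), power: motor power, sum of tau * dq (W).
    """
    vel_tracking = 0.05 * math.exp(-((vx - v_des) ** 2) / sigma)  # Gaussian around v_des
    yaw_penalty = -0.2 * abs(yaw)
    drift_penalty = -0.01 * abs(y)
    energy_penalty = -0.01 * abs(power) * dt
    total = vel_tracking + yaw_penalty + drift_penalty + energy_penalty
    # Clip so that the returned reward is never negative.
    return max(0.0, total)
```

Because of the clipping, a large heading error simply yields zero reward rather than a negative one.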
```python
# run_sb3.py key hyperparameters (PPO)
policy_kwargs = dict(net_arch=[256, 256])  # two-layer MLP
n_steps = 4096
learning_rate = 1e-4
gamma = 0.99
gae_lambda = 0.95
batch_size = 128
n_epochs = 10
clip_range = 0.2
total_timesteps = 1_000_000
```

Observations are normalised online via `VecNormalize`. Checkpoints are saved every 30 000 steps.
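The idea behind online observation normalisation can be illustrated with a small running-statistics class. This is a conceptual sketch using Welford's algorithm, not the Stable-Baselines3 implementation; the class name and the `eps` and `clip` defaults are assumptions chosen for the example.

```python
import math


class RunningNormalizer:
    """Tracks a running mean and variance per observation dimension."""

    def __init__(self, dim, eps=1e-8, clip=10.0):
        self.count = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim   # running sum of squared deviations from the mean
        self.eps = eps
        self.clip = clip

    def update(self, obs):
        """Fold one observation into the running statistics (Welford update)."""
        self.count += 1
        for k, x in enumerate(obs):
            delta = x - self.mean[k]
            self.mean[k] += delta / self.count
            self.m2[k] += delta * (x - self.mean[k])

    def normalize(self, obs):
        """Return the observation shifted, scaled, and clipped to [-clip, clip]."""
        out = []
        for k, x in enumerate(obs):
            var = self.m2[k] / self.count if self.count > 0 else 1.0
            z = (x - self.mean[k]) / math.sqrt(var + self.eps)
            out.append(max(-self.clip, min(self.clip, z)))
        return out
```

Normalising this way keeps inputs with very different scales, such as joint velocities and quaternion components, in a comparable range for the policy network.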
| Policy | v_avg | CoT |
|---|---|---|
| CPG-RL PPO (fast) | 0.944 m/s | 0.300 |
| CPG-RL PPO (slow) | 0.491 m/s | 0.447 |
Action space ranking: CPG-RL >> Cartesian PD >> Joint PD. Pure joint or Cartesian PD action spaces fail to converge to stable locomotion. With PPO, episode length converges after ~200 k steps and reward after ~400 k steps. SAC was slower to converge and less stable.
The fast PPO policy (CoT = 0.300) beats every handcrafted CPG gait including Pace (CoT = 0.459).
```
python >= 3.8
numpy
pybullet
gym
stable-baselines3
matplotlib
```
The course infrastructure (`quadruped`, `configs_a1`, URDF assets, `utils/`) must be available on the Python path. These are not included in this repository.
CPG controller:

```bash
# Edit gait, omega_swing, omega_stance in run_cpg.py, then:
python run_cpg.py
```

Train RL policy:

```bash
# Edit LEARNING_ALG, env_configs in run_sb3.py, then:
python run_sb3.py
# Checkpoints saved to ./logs/intermediate_models/<timestamp>/
```

Evaluate trained policy:

```bash
# Set log_dir in load_sb3.py to the checkpoint directory, then:
python load_sb3.py
```

The five Python files are functionally identical to the course submission. The following cosmetic/presentability changes were made for this portfolio repo:
- `hopf_network.py`
  - Removed an orphaned dead-code expression at the end of `_integrate_hopf_equations_rl`: a bare conditional whose result was silently discarded (copy-paste artifact from the non-RL method).
  - Removed the unused `X_dot_prev` variable from `_integrate_hopf_equations`; the Euler method does not use it, only the trapezoidal RL variant does.
  - Replaced the `_set_gait` docstring `[TODO] update all coupling matrices` with an accurate description (all four gait matrices were already implemented).
  - Removed inline `# [TODO]` course-scaffold markers on completed code lines throughout (`update()`, `_integrate_hopf_equations`, `_integrate_hopf_equations_rl`), replacing them with descriptive comments.
- `run_cpg.py`
  - Fixed a wrong inline comment: `TEST_STEPS = int(5 / TIME_STEP) #10 seconds`. Since 5/0.001 = 5000 steps = 5 seconds, it was corrected to `# 5 seconds`.
  - Removed `# [TODO]` markers on all completed code lines in the simulation loop, replacing them with descriptive comments.
  - The plotting section (lines ~165–260) is preserved as commented-out code. These blocks were the analysis scripts used to generate the CPG state plots, foot-position tracking, joint angle tracking, base velocity, duty cycle, and Cost of Transport figures reported in the paper. They are kept as reference and can be uncommented to reproduce those figures.
- `quadruped_gym_env.py`
  - Removed `#SEE LECTURE 7 SLIDE 95`, an internal course reference.
  - Rewrote the observation-space mode comment block from assignment-prompt style (`[TODO: what should you include?]`) to a factual description of what was implemented.
  - Replaced the `_reward_lr_course` docstring (original assignment prompt) with a description of what the function actually does.
  - Replaced the `ScaleActionToCartesianPos` docstring (written as an editing instruction) with a description of the method's behaviour.
  - Removed inline `# [TODO]` markers from `ScaleActionToCartesianPos` and `ScaleActionToCPGStateModulations` where code was already in place.
  - Removed a stray `#Anything else to add ?` scaffold comment from the non-CPG observation space upper bound.
- `run_sb3.py`
  - Replaced `# normalize observations to stabilize learning (why?)`, a rhetorical course-assignment question, with a factual description of what `VecNormalize` does.
- `load_sb3.py`
  - Replaced a hardcoded training-run timestamp (`'010423183034'`) with an empty string placeholder and a clarifying comment.
  - Removed two internal bookkeeping comments naming specific checkpoint directories by timestamp.
  - Removed `# [TODO]` scaffold markers from the evaluation loop (array initialisation, per-step data saving, plot section header).
  - Removed a stale inline comment on `model.predict()` (`# sample at test time? ([TODO]: test)`).
  - The data-collection arrays and plotting blocks (lines ~99–240) are preserved as commented-out code. These were the scripts used to produce the base-position, leg-contact, CPG amplitude/phase, base-velocity, duty-cycle, and Cost of Transport figures in the paper. They are kept as reference and can be uncommented to reproduce those plots.
No algorithmic logic was altered.
CPG architecture based on:
Bellegarda, G. & Ijspeert, A. (2022). CPG-RL: Learning Central Pattern Generators for Quadruped Locomotion. IEEE Robotics and Automation Letters. https://ieeexplore.ieee.org/abstract/document/9932888
Course skeleton and robot environment by Guillaume Bellegarda, EPFL BioRob (BSD-3-Clause).