
AlbianSalihu/LeggedRobotics


Quadruped Locomotion — CPG & Deep Reinforcement Learning

EPFL Legged Robots course project (Nov 2022 – Jan 2023), completed as part of a two-person team in the BioRob lab (Prof. Auke Ijspeert). The project implements and compares two fundamentally different control strategies for a simulated 12.45 kg quadruped (Unitree A1, PyBullet):

  1. Central Pattern Generator (CPG) — a bio-inspired oscillator network with analytical PD control
  2. Deep Reinforcement Learning (DRL) — a CPG-augmented policy trained end-to-end with PPO and SAC

Results: the CPG approach achieves interpretable, gait-specific locomotion (best CoT = 0.459 with Pace); the DRL policy surpasses it (CoT = 0.300) after ~400 k training steps, at the cost of a black-box policy and long training time.


Repository contents

| File | Description |
|---|---|
| hopf_network.py | CPG oscillator network: Hopf polar equations, phase coupling, gait matrices, RL modulation interface |
| run_cpg.py | Runs the CPG controller open-loop in PyBullet; records CPG states, foot trajectories, and base velocity |
| quadruped_gym_env.py | Custom OpenAI Gym environment wrapping the PyBullet simulation; defines observation/action spaces and reward functions |
| run_sb3.py | Trains a PPO or SAC policy with Stable-Baselines3 |
| load_sb3.py | Loads a trained policy and runs evaluation rollouts |

Simulation environment not included. This code depends on course-provided infrastructure: the quadruped module, configs_a1, robot URDF files, and utility libraries (utils/). These are not redistributable. The files here are the components authored during the project.


Part 1 — Central Pattern Generator

Hopf oscillator network

The CPG is modelled as four coupled non-linear oscillators (one per leg) operating in polar coordinates. Each oscillator i has amplitude rᵢ and phase θᵢ, governed by:

Amplitude (converges to √μ via a limit cycle):

ṙᵢ = α(μ − rᵢ²)rᵢ

Phase (with inter-leg coupling):

θ̇ᵢ = ωᵢ + Σⱼ rⱼ w sin(θⱼ − θᵢ − φᵢⱼ)

Parameters:

  • α = 50 — amplitude convergence rate
  • μ = 1 (non-RL) — intrinsic amplitude setpoint; oscillator converges to r = √μ = 1
  • w — coupling strength (default 1)
  • φᵢⱼ — desired inter-leg phase offsets (gait-specific, see below)

The natural frequency ωᵢ switches between two regimes based on the current phase:

  • Swing (0 ≤ θ ≤ π): foot lifting and advancing — ω_swing = 12π rad/s (default in run_cpg.py)
  • Stance (π < θ ≤ 2π): foot grounded, pushing — ω_stance = 4π rad/s

Integration uses first-order Euler in the open-loop CPG and trapezoidal (second-order) in the RL variant.
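The open-loop update can be sketched as a single forward-Euler step of the two equations above. This is a minimal illustration, not the code in hopf_network.py: the function name hopf_euler_step is hypothetical, the constants are the values quoted above, and only the Euler variant is shown (the trapezoidal RL variant is omitted).

```python
import numpy as np

# Values quoted in the text above; dt matches the 0.001 s simulation step.
ALPHA = 50.0              # amplitude convergence rate
MU = 1.0                  # amplitude setpoint, r converges to sqrt(MU)
W = 1.0                   # coupling strength
OMEGA_SWING = 12 * np.pi  # rad/s
OMEGA_STANCE = 4 * np.pi  # rad/s
DT = 0.001                # s


def hopf_euler_step(r, theta, phi, dt=DT):
    """One forward-Euler step of the coupled Hopf network (hypothetical helper).

    r, theta: arrays of shape (4,), amplitude and phase per leg.
    phi: (4, 4) array of desired phase offsets phi[i, j].
    """
    r_dot = ALPHA * (MU - r**2) * r
    # Swing frequency while theta is in [0, pi], stance frequency otherwise.
    omega = np.where(theta <= np.pi, OMEGA_SWING, OMEGA_STANCE)
    # diff[i, j] = theta_j - theta_i - phi_ij
    diff = theta[None, :] - theta[:, None] - phi
    # Coupling term: sum over j of r_j * w * sin(diff[i, j])
    theta_dot = omega + W * np.sum(r[None, :] * np.sin(diff), axis=1)
    r_new = r + dt * r_dot
    theta_new = (theta + dt * theta_dot) % (2 * np.pi)  # keep phases in [0, 2*pi)
    return r_new, theta_new


if __name__ == "__main__":
    r = np.full(4, 0.1)
    theta = np.zeros(4)
    phi = np.zeros((4, 4))  # all legs in phase, for illustration only
    for _ in range(2000):   # 2 s of simulated time
        r, theta = hopf_euler_step(r, theta, phi)
    print(r)  # amplitudes approach sqrt(MU) = 1
```

With α = 50 and dt = 0.001 s, the amplitude dynamics stay well inside the stable step size for explicit Euler, which is why a first-order scheme suffices for the open-loop controller.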

CPG → foot position mapping

CPG states are projected to Cartesian foot targets in the leg xz-plane:

x_foot = −d_step · r · cos(θ)
z_foot = −h + g_c · sin(θ)   if sin(θ) > 0   (swing: foot rises)
z_foot = −h + g_p · sin(θ)   otherwise         (stance: foot presses)

| Parameter | Value | Meaning |
|---|---|---|
| d_step | 0.05 m | Desired step length |
| h | 0.30 m | Nominal robot height |
| g_c | 0.07 m | Ground clearance during swing |
| g_p | 0.01 m | Ground penetration depth during stance |
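The mapping is a direct transcription of the three equations above. The sketch below vectorises it over the four legs; the function name cpg_to_foot is hypothetical and the constants are the values from the table.

```python
import numpy as np

D_STEP = 0.05  # m, desired step length
H = 0.30       # m, nominal robot height
G_C = 0.07     # m, ground clearance during swing
G_P = 0.01     # m, ground penetration during stance


def cpg_to_foot(r, theta):
    """Map CPG amplitude/phase to foot targets in the leg xz-plane (hypothetical helper)."""
    r = np.asarray(r, dtype=float)
    theta = np.asarray(theta, dtype=float)
    x = -D_STEP * r * np.cos(theta)
    s = np.sin(theta)
    # Swing (sin > 0): foot rises by up to G_C; stance: foot presses down by up to G_P.
    z = np.where(s > 0, -H + G_C * s, -H + G_P * s)
    return x, z
```

At θ = π/2 the foot sits at its highest point (z = −0.23 m); at θ = 3π/2 it reaches maximum penetration (z = −0.31 m).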

Gait coupling matrices

Four gaits are implemented by choosing the inter-leg phase offset matrix φ (leg order: FR, FL, RR, RL):

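| Gait | Coordination | v_avg (fast) | CoT (fast) |
|---|---|---|---|
| Trot | Diagonal pairs (FR+RL, FL+RR) in phase; the two pairs offset by 0.5·2π | 0.612 m/s | 0.543 |
| Lateral walk | Sequential lateral; phase offsets 0, 0.5, 0.25, 0.75 (× 2π) | 1.434 m/s | 0.756 |
| Bound | Front pair (FR+FL) and rear pair (RR+RL) each in phase | 0.626 m/s | 0.842 |
| Pace | Ipsilateral pairs (FR+RR, FL+RL) in phase | 1.376 m/s | 0.459 |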

Pace achieves the lowest Cost of Transport (0.459). Bound is the least efficient (0.842).
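One way to build φ is from a per-leg phase offset vector ψ. Because the coupling term is sin(θⱼ − θᵢ − φᵢⱼ), it vanishes when θⱼ − θᵢ = φᵢⱼ, so setting φᵢⱼ = ψⱼ − ψᵢ makes the desired gait a phase-locked state. This is a sketch: the helper coupling_matrix is hypothetical, and the lateral-walk entry assumes the offsets listed above are given in FR, FL, RR, RL order.

```python
import numpy as np

PI = np.pi


def coupling_matrix(offsets):
    """phi[i, j] = offsets[j] - offsets[i], wrapped to [0, 2*pi) (hypothetical helper)."""
    psi = np.asarray(offsets, dtype=float)
    return (psi[None, :] - psi[:, None]) % (2 * PI)


# Per-leg phase offsets, leg order FR, FL, RR, RL.
GAIT_OFFSETS = {
    "trot":  [0.0, PI, PI, 0.0],  # diagonal pairs FR+RL and FL+RR in phase
    "pace":  [0.0, PI, 0.0, PI],  # ipsilateral pairs FR+RR and FL+RL in phase
    "bound": [0.0, 0.0, PI, PI],  # front pair and rear pair in phase
    # Assumption: the lateral-walk offsets above are listed in FR, FL, RR, RL order.
    "walk":  [0.0, 0.5 * 2 * PI, 0.25 * 2 * PI, 0.75 * 2 * PI],
}

PHI = {gait: coupling_matrix(psi) for gait, psi in GAIT_OFFSETS.items()}
```

For trot this yields the symmetric matrix with π between the two diagonal pairs and 0 within each pair.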

Combined PD control law

Foot position targets from the CPG are tracked by a combined Joint + Cartesian PD controller:

τ_total = τ_joint + τ_cartesian

τ_joint     = Kp_j · (q_des − q) + Kd_j · (0 − q̇)
τ_cartesian = Jᵀ · [Kp_c · (p_des − p) − Kd_c · v_foot]

where J is the leg Jacobian, p is the current foot position, and v_foot = J q̇.
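For a single 3-DoF leg, the combined law can be sketched in a few lines of NumPy. The function name leg_torque is hypothetical, and the gains are treated as scalars applied uniformly to every axis (equivalent to diagonal gain matrices), which is an assumption.

```python
import numpy as np


def leg_torque(q, dq, q_des, p, p_des, J, kp_j, kd_j, kp_c, kd_c):
    """Combined joint + Cartesian PD torque for one leg (hypothetical helper).

    q, dq, q_des: joint angles, joint velocities, desired joint angles (3,)
    p, p_des: current and desired foot position (3,)
    J: (3, 3) leg Jacobian
    """
    # Joint-space PD with zero desired joint velocity.
    tau_joint = kp_j * (q_des - q) + kd_j * (0.0 - dq)
    # Foot velocity from joint velocities.
    v_foot = J @ dq
    # Cartesian PD force mapped to joint torques through the Jacobian transpose.
    tau_cart = J.T @ (kp_c * (p_des - p) - kd_c * v_foot)
    return tau_joint + tau_cart
```

Usage with the tuned trot gains: leg_torque(q, dq, q_des, p, p_des, J, 100, 2, 500, 20).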

Key finding: Joint PD alone and Cartesian PD alone both produce unstable locomotion. Only the combined controller with tuned gains produces stable gaits.

Tuned gains (trot):

| Mode | Kp_joint | Kd_joint | Kp_cartesian | Kd_cartesian |
|---|---|---|---|---|
| Joint + Cartesian PD | 100 | 2 | 500 | 20 |
| Joint PD only | 200 | 2 | n/a | n/a |
| Cartesian PD only | n/a | n/a | 5000 | 80 |

Part 2 — Deep Reinforcement Learning

MDP formulation

Observation space (42-dimensional, LR_COURSE_OBS + CPG mode):

| Component | Dim | Source |
|---|---|---|
| Motor angles | 12 | Joint encoders |
| Motor velocities | 12 | Joint encoders |
| Base orientation (quaternion) | 4 | IMU |
| Base linear velocity | 3 | IMU |
| Base angular velocity | 3 | IMU |
| CPG amplitudes r | 4 | CPG state |
| CPG phases θ | 4 | CPG state |
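The 42 dimensions are simply the sum of the component sizes. A sketch of assembling such a vector, with a size check per component, is shown below; the concatenation order and the names OBS_LAYOUT and build_observation are assumptions for illustration, not the order used in quadruped_gym_env.py.

```python
import numpy as np

# Hypothetical layout: component names and order are assumptions; sizes follow the table.
OBS_LAYOUT = [
    ("motor_angles", 12),
    ("motor_velocities", 12),
    ("base_quaternion", 4),
    ("base_lin_vel", 3),
    ("base_ang_vel", 3),
    ("cpg_r", 4),
    ("cpg_theta", 4),
]
OBS_DIM = sum(dim for _, dim in OBS_LAYOUT)  # 42


def build_observation(parts):
    """Concatenate observation components, checking each size (hypothetical helper)."""
    chunks = []
    for name, dim in OBS_LAYOUT:
        x = np.asarray(parts[name], dtype=float).ravel()
        if x.size != dim:
            raise ValueError(f"{name}: expected {dim} values, got {x.size}")
        chunks.append(x)
    return np.concatenate(chunks)
```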

Action space (CPG mode, 8-dimensional): the policy outputs modulation signals for CPG amplitude setpoints (μ) and oscillator frequencies (ω) for each of the 4 legs, scaled from [−1, 1]:

  • ω ∈ [5, 4.5·2π] rad/s
  • μ ∈ [MU_LOW², MU_UPP²] = [1, 4]
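The scaling from [−1, 1] to these ranges can be sketched as a linear map. In this sketch the first four action entries are taken as μ and the last four as ω; that ordering, the clipping step, and the helper names are assumptions, not necessarily what ScaleActionToCPGStateModulations does.

```python
import numpy as np

OMEGA_LOW, OMEGA_HIGH = 5.0, 4.5 * 2 * np.pi  # rad/s
MU_LOW, MU_UPP = 1.0, 2.0                     # so mu lies in [1, 4]


def scale_to_range(a, low, high):
    """Linearly map values in [-1, 1] to [low, high]."""
    a = np.clip(a, -1.0, 1.0)
    return low + 0.5 * (a + 1.0) * (high - low)


def action_to_cpg_modulations(action):
    """Split an 8-D action into per-leg mu and omega (ordering is an assumption)."""
    action = np.asarray(action, dtype=float)
    mu = scale_to_range(action[:4], MU_LOW**2, MU_UPP**2)
    omega = scale_to_range(action[4:8], OMEGA_LOW, OMEGA_HIGH)
    return mu, omega
```

An action of all zeros therefore selects the midpoint of each range: μ = 2.5 and ω = (5 + 9π)/2 ≈ 16.6 rad/s.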

These modulations drive the CPG, whose states are mapped to foot positions, converted to joint targets by inverse kinematics, and tracked by joint PD control. The policy thereby inherits the structured motion priors of the oscillator network.

Episode length: 10 s (10 000 simulation steps at dt = 0.001 s; with action_repeat = 10, this is 1 000 policy steps per episode).

Reward function

The reward combines forward locomotion tracking with additional shaping terms:

| Term | Weight | Purpose |
|---|---|---|
| vel_tracking | +0.05 | Gaussian reward for matching 1 m/s forward velocity |
| yaw penalty | −0.2 | Penalise heading drift |
| drift penalty | −0.01 | Penalise lateral position drift |
| energy penalty | −0.01 | Minimise motor power (τ · q̇) |
| orientation penalty | −0.1 | Keep base level (quaternion deviation) |
| vertical velocity penalty | −0.1/5 | Penalise vertical oscillation |
| yaw rate penalty | −0.05/100 | Penalise rearing/rolling |
| joint velocity penalty | −0.001/150 | Smooth limb movement |
| roll penalty | −0.2/10 | Limit roll oscillations |
| torque penalty | −0.00002/20 | Prevent unrealistic torques |

All rewards are clipped to ≥ 0.
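The structure of the reward (a positive tracking term minus weighted penalties, then clipped) can be sketched with two of the terms from the table. This is a toy version: the Gaussian width sigma, the form Σ|τᵢ q̇ᵢ| for motor power, the helper name reward_sketch, and applying the clip to the total are all assumptions.

```python
import numpy as np


def reward_sketch(v_x, tau, dq, target_v=1.0, sigma=0.25):
    """Toy reward with two of the table's terms (hypothetical helper).

    sigma is an assumed Gaussian width; the remaining penalties are omitted.
    """
    # Gaussian-shaped forward-velocity tracking, weight +0.05.
    vel_tracking = 0.05 * np.exp(-((v_x - target_v) ** 2) / sigma)
    # Energy penalty, weight -0.01, using sum of |tau_i * dq_i| as motor power.
    energy = 0.01 * np.sum(np.abs(np.asarray(tau) * np.asarray(dq)))
    # Clip the total at zero.
    return float(max(vel_tracking - energy, 0.0))
```

Clipping at zero keeps the per-step reward non-negative, so large penalties during a fall cannot drive the return negative.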

Training setup

```python
# run_sb3.py key hyperparameters (PPO)
policy_kwargs   = dict(net_arch=[256, 256])  # MLP with two hidden layers of 256 units
n_steps         = 4096
learning_rate   = 1e-4
gamma           = 0.99
gae_lambda      = 0.95
batch_size      = 128
n_epochs        = 10
clip_range      = 0.2
total_timesteps = 1_000_000
```

Observations are normalised online via VecNormalize. Checkpoints are saved every 30 000 steps.
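Online normalisation keeps a running mean and variance of each of the 42 observation entries and feeds the policy clipped z-scores. The class below is an illustrative sketch of that idea, not the Stable-Baselines3 implementation; its name and defaults (clip = 10, eps = 1e-8, a small prior count) are assumptions.

```python
import numpy as np


class RunningObsNormalizer:
    """Online observation normalizer (hypothetical sketch).

    Tracks a running mean/variance with a batched (parallel) update and
    returns clipped z-scores.
    """

    def __init__(self, dim, clip=10.0, eps=1e-8):
        self.mean = np.zeros(dim)
        self.var = np.ones(dim)
        self.count = 1e-4  # small prior count avoids division by zero
        self.clip = clip
        self.eps = eps

    def update(self, batch):
        batch = np.atleast_2d(np.asarray(batch, dtype=float))
        b_mean = batch.mean(axis=0)
        b_var = batch.var(axis=0)
        b_count = batch.shape[0]
        delta = b_mean - self.mean
        total = self.count + b_count
        # Combine old and batch moments (parallel variance formula).
        m2 = (self.var * self.count + b_var * b_count
              + delta**2 * self.count * b_count / total)
        self.mean = self.mean + delta * b_count / total
        self.var = m2 / total
        self.count = total

    def normalize(self, obs):
        z = (np.asarray(obs, dtype=float) - self.mean) / np.sqrt(self.var + self.eps)
        return np.clip(z, -self.clip, self.clip)
```

Normalising matters here because the observation mixes quantities on very different scales (joint velocities versus quaternion components), which otherwise skews the network's early gradients.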

Results

| Policy | v_avg | CoT |
|---|---|---|
| CPG-RL PPO (fast) | 0.944 m/s | 0.300 |
| CPG-RL PPO (slow) | 0.491 m/s | 0.447 |

Action space ranking: CPG-RL >> Cartesian PD >> Joint PD. Policies acting directly in joint or Cartesian PD space fail to converge to stable locomotion. With PPO, episode length converges after ~200k steps and reward after ~400k steps. SAC was slower to converge and less stable.

The fast PPO policy (CoT = 0.300) beats every handcrafted CPG gait including Pace (CoT = 0.459).


Dependencies

python >= 3.8
numpy
pybullet
gym
stable-baselines3
matplotlib

The course infrastructure (quadruped, configs_a1, URDF assets, utils/) must be available on the Python path. These are not included in this repository.


Running

CPG controller:

```bash
# Edit gait, omega_swing, omega_stance in run_cpg.py, then:
python run_cpg.py
```

Train RL policy:

```bash
# Edit LEARNING_ALG, env_configs in run_sb3.py, then:
python run_sb3.py
# Checkpoints saved to ./logs/intermediate_models/<timestamp>/
```

Evaluate trained policy:

```bash
# Set log_dir in load_sb3.py to the checkpoint directory, then:
python load_sb3.py
```

Portfolio note — changes from original submission

The five Python files are functionally identical to the course submission. The following cosmetic changes were made to improve presentation for this portfolio repo:

  1. hopf_network.py

    • Removed an orphaned dead-code expression at the end of _integrate_hopf_equations_rl — a bare conditional whose result was silently discarded (copy-paste artifact from the non-RL method).
    • Removed the unused X_dot_prev variable from _integrate_hopf_equations — the Euler method does not use it; only the trapezoidal RL variant correctly uses it.
    • Replaced the _set_gait docstring [TODO] update all coupling matrices with an accurate description (all four gait matrices were already implemented).
    • Removed inline # [TODO] course-scaffold markers on completed code lines throughout (update(), _integrate_hopf_equations, _integrate_hopf_equations_rl), replacing them with descriptive comments.
  2. run_cpg.py

    • Fixed a wrong inline comment: TEST_STEPS = int(5 / TIME_STEP) #10 seconds — 5/0.001 = 5000 steps = 5 seconds. Corrected to # 5 seconds.
    • Removed # [TODO] markers on all completed code lines in the simulation loop, replacing them with descriptive comments.
    • The plotting section (lines ~165–260) is preserved as commented-out code. These blocks were the analysis scripts used to generate the CPG state plots, foot-position tracking, joint angle tracking, base velocity, duty cycle, and Cost of Transport figures reported in the paper. They are kept as reference; they can be uncommented to reproduce those figures.
  3. quadruped_gym_env.py

    • Removed #SEE LECTURE 7 SLIDE 95 — an internal course reference.
    • Rewrote the observation-space mode comment block from assignment-prompt style ([TODO: what should you include?]) to a factual description of what was implemented.
    • Replaced the _reward_lr_course docstring (original assignment prompt) with a description of what the function actually does.
    • Replaced the ScaleActionToCartesianPos docstring (written as an editing instruction) with a description of the method's behaviour.
    • Removed inline # [TODO] markers from ScaleActionToCartesianPos and ScaleActionToCPGStateModulations where code was already in place.
    • Removed a stray #Anything else to add ? scaffold comment from the non-CPG observation space upper bound.
  4. run_sb3.py

    • Replaced # normalize observations to stabilize learning (why?) — a rhetorical course-assignment question — with a factual description of what VecNormalize does.
  5. load_sb3.py

    • Replaced a hardcoded training-run timestamp ('010423183034') with an empty string placeholder and a clarifying comment.
    • Removed two internal bookkeeping comments naming specific checkpoint directories by timestamp.
    • Removed # [TODO] scaffold markers from the evaluation loop (array initialisation, per-step data saving, plot section header).
    • Removed a stale inline comment on model.predict() (# sample at test time? ([TODO]: test)).
    • The data-collection arrays and plotting blocks (lines ~99–240) are preserved as commented-out code. These were the scripts used to produce the base-position, leg-contact, CPG amplitude/phase, base-velocity, duty-cycle, and Cost of Transport figures in the paper. They are kept as reference; they can be uncommented to reproduce those plots.

No algorithmic logic was altered.


Reference

CPG architecture based on:

Bellegarda, G. & Ijspeert, A. (2022). CPG-RL: Learning Central Pattern Generators for Quadruped Locomotion. IEEE Robotics and Automation Letters. https://ieeexplore.ieee.org/abstract/document/9932888

Course skeleton and robot environment by Guillaume Bellegarda, EPFL BioRob (BSD-3-Clause).

About

Bio-inspired CPG controller and CPG-augmented PPO/SAC deep RL for quadruped locomotion on a simulated Unitree A1 (PyBullet). EPFL Legged Robots, BioRob lab.
