Supplementary code for:
Neural reservoir control of a bio-hybrid soft arm
Naughton N, Tekinalp A, Shivam K, Kim SH, Khairnar A, Kindratenko V, and Gazzola M, Proceedings of the National Academy of Sciences, 2026.
DOI: 10.1073/pnas.2522094123
This repository contains the simulation and reinforcement learning code used to train and evaluate two reservoir computing neural network architectures for 3D soft-arm target tracking tasks.
```
3D_tracking/
├── core/                          # Shared simulation framework
│   ├── elastica/                  # PyElastica soft-body physics package
│   ├── ArmMuscleModel/            # Soft arm model with pressure-driven muscles
│   ├── MuscleTorquesWithBspline/  # Muscle torque model
│   ├── core_functions.py          # Shared RL environment logic
│   └── set_environment_muscles.py # Elastica environment setup
│
├── rc_ann/                        # Reservoir Computing (ANN readout) policy
│   ├── policy_train.py
│   ├── policy_evaluate.py
│   ├── input_parameters.json
│   ├── weights.zip                # Reservoir weight matrices (W_reservoir.npy, W_in.npy)
│   └── POLICY.zip                 # Pre-trained policy weights
│
├── rc_spiking/                    # Reservoir Computing (Spiking NN readout) policy
│   ├── policy_train.py
│   ├── policy_evaluate.py
│   ├── run_eval.py
│   ├── core_functions.py          # Extends shared core with Nengo/spiking logic
│   ├── input_parameters.json
│   ├── nengo_loihi/               # Nengo-loihi library
│   ├── weights.zip                # Spiking reservoir weight matrices (pkl files)
│   └── POLICY.zip
│
├── requirements.txt
├── LICENSE
└── README.md
```
Both versions use the same arm stiffness (Young's modulus scaling E = 0.125) and differ only in the neural network architecture used for the RL policy readout.
- Python 3.8 (other versions not tested)
- Create a Python 3.8 environment (conda recommended):

  ```bash
  conda create -n 3d_tracking python=3.8
  conda activate 3d_tracking
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
Weight files: Each directory contains a `weights.zip` with the reservoir weight matrices (which are pre-generated, not trained). They are automatically extracted the first time you run `policy_evaluate.py` or `policy_train.py`; no manual step is needed.
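The shipped matrices are used as-is, but for reference, here is a minimal sketch of how fixed reservoir weights of this kind are typically generated from the `reservoir_density` and `reservoir_spectral_radius` parameters described below. The values and filenames are illustrative; this is not the repo's actual generation script:

```python
# Illustrative generation of a fixed random reservoir (not the repo's script).
# The shipped W_reservoir.npy / W_in.npy in weights.zip are used as-is.
import numpy as np

rng = np.random.default_rng(seed=0)
n = 512                  # RC_network_dim (illustrative value)
density = 0.1            # reservoir_density (illustrative value)
spectral_radius = 0.9    # reservoir_spectral_radius (illustrative value)

# Sparse random recurrent matrix: keep each entry with probability `density`
W = rng.uniform(-1.0, 1.0, (n, n)) * (rng.random((n, n)) < density)

# Rescale so the largest eigenvalue magnitude matches the target radius
W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))

# W_in.npy would be built analogously, using input_density
np.save("W_reservoir.npy", W)
```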
All scripts must be run from inside the respective directory so that relative paths (to `input_parameters.json`, weight files, etc.) resolve correctly.
Each directory contains a `POLICY.zip` file with pre-trained PPO weights (as reported in the paper). These are loaded automatically by `policy_evaluate.py`. To evaluate these pre-trained policies, run:

```bash
cd rc_ann  # or rc_spiking
python policy_evaluate.py
```

To train your own policy from scratch, run:

```bash
cd rc_ann  # or rc_spiking
python policy_train.py
```

Note that the current training setup has `"num_procs": 120` in `input_parameters.json`, meaning 120 elastica environments will launch during training. This will swamp most desktop CPUs, so only use this many processes on a server node with sufficient CPU cores. As a rule of thumb, set `num_procs` to the number of CPU cores on your machine, as in the snippet below.
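For example, a minimal helper (not part of the repo) that rewrites `num_procs` to match the local core count before training, run from inside `rc_ann/` or `rc_spiking/`:

```python
# Set num_procs in input_parameters.json to the local CPU core count
# before launching policy_train.py.
import json
import multiprocessing

with open("input_parameters.json") as f:
    params = json.load(f)

params["num_procs"] = multiprocessing.cpu_count()

with open("input_parameters.json", "w") as f:
    json.dump(params, f, indent=4)
```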
Each directory contains an `input_parameters.json` file with hyperparameters:

| Key | Description |
|---|---|
| `network_arch` | Architecture type (`"RC"`) |
| `seed` | Random seed |
| `RC_network_dim` | Number of reservoir neurons |
| `arm_dim` | Spatial dimension of the arm (3 for 3D) |
| `num_ctrl_pts` | Number of muscle control points |
| `target_traj_type` | Target trajectory type (`"non_random"`) |
| `speed_coefficient` | Target movement speed coefficient |
| `E` | Young's modulus scaling (arm stiffness) |
| `num_procs` | Number of parallel simulation instances |
| `episode_length` | Length of each training episode (seconds) |
| `policy_updates` | Total simulated training time (seconds) |
| `RL_update_time` | RL policy update interval (seconds) |
| `reservoir_density` | Reservoir connectivity density |
| `reservoir_spectral_radius` | Reservoir spectral radius |
| `input_density` | Input weight matrix density |
| `n_epochs` | PPO training epochs per update |
| `learning_rate` | PPO learning rate |
| `minibatch_size` | PPO minibatch size |
| `gae_lambda` | GAE lambda for advantage estimation |
| `gamma` | Discount factor |
| `clip_range` | PPO clip range |
| `ent_coef` | Entropy coefficient |
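For orientation, a skeleton of the file with the keys above. Only the `network_arch`, `arm_dim`, `target_traj_type`, `E`, and `num_procs` values are stated in this README; every other value below is a placeholder, not the shipped default:

```json
{
    "network_arch": "RC",
    "seed": 0,
    "RC_network_dim": 512,
    "arm_dim": 3,
    "num_ctrl_pts": 6,
    "target_traj_type": "non_random",
    "speed_coefficient": 1.0,
    "E": 0.125,
    "num_procs": 120,
    "episode_length": 10.0,
    "policy_updates": 1000000,
    "RL_update_time": 1.0,
    "reservoir_density": 0.1,
    "reservoir_spectral_radius": 0.9,
    "input_density": 0.1,
    "n_epochs": 10,
    "learning_rate": 0.0003,
    "minibatch_size": 64,
    "gae_lambda": 0.95,
    "gamma": 0.99,
    "clip_range": 0.2,
    "ent_coef": 0.0
}
```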
An older version of stable-baselines3 was used during development. Key package versions needed to run this code (see `requirements.txt` for the full pinned list):
| Package | Version | Purpose |
|---|---|---|
| `stable-baselines3` | 1.5.0 | PPO reinforcement learning |
| `torch` | 2.1.2 | Deep learning backend |
| `nengo` | 3.1.0 | Spiking neural network framework |
| `snntorch` | 0.9.1 | Spiking neuron models |
| `gym` | 0.21.0 | RL environment interface |
| `numpy` | 1.22.3 | Numerical computing |
| `numba` | 0.55.2 | JIT compilation for elastica |
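A quick way to confirm the pinned versions resolved in your environment (each of these packages exposes a `__version__` attribute):

```python
# Print installed versions to compare against the pinned requirements
import stable_baselines3, torch, nengo, snntorch, gym, numpy, numba

for pkg in (stable_baselines3, torch, nengo, snntorch, gym, numpy, numba):
    print(f"{pkg.__name__}: {pkg.__version__}")
```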
Local versions of pyelastica and nengo-loihi are included in this repo because minor custom changes were made to each to ensure compatibility.
If you use this code, please cite:
```bibtex
@article{Naughton.2026,
  author  = {Noel Naughton and Arman Tekinalp and Keshav Shivam and Seung Hyun Kim and Apoorva Khairnar and Volodymyr Kindratenko and Mattia Gazzola},
  title   = {Neural reservoir control of a bio-hybrid soft arm},
  journal = {Proceedings of the National Academy of Sciences},
  volume  = {123},
  number  = {17},
  pages   = {e2522094123},
  year    = {2026},
  doi     = {10.1073/pnas.2522094123},
  URL     = {https://www.pnas.org/doi/abs/10.1073/pnas.2522094123},
  eprint  = {https://www.pnas.org/doi/pdf/10.1073/pnas.2522094123},
}
```

MIT License; see LICENSE for details.
Code refactored for publication using the GitHub Copilot CLI tool.