Slipstream: RL for Optimal Trade Execution

A minimal implementation exploring reinforcement learning for algorithmic execution.

The Problem

When executing a large order, you face a tradeoff:

Execute fast → high market impact: your own trading moves the price against you
Execute slow → price risk: the market may move against you while you wait

This is the classic optimal execution problem from market microstructure. This project frames it as an RL problem and trains agents to learn execution strategies.

MDP Formulation

Component	Definition
State	`(remaining_qty, time_left, last_return, realised_vol, impact_coeff)`
Action	Fraction of remaining order to execute now ∈ [0, 1]
Reward	`-execution_cost - λ·risk_penalty`
Transition	Stochastic price (random walk) + Almgren-Chriss impact model
Horizon	Fixed T steps, or early termination when fully executed
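
To make the formulation above concrete, here is a minimal sketch of one transition for a sell order, written in plain Python rather than against the Gymnasium API. It is not the code in `env/execution_env.py`: the `State` class, the `step` signature, the risk proxy, the forced liquidation on the last step, and all default values (`lam`, `gamma`, `beta`, `step_volume`) are assumptions chosen for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class State:
    remaining_qty: float  # shares still to sell
    time_left: int        # steps left before the deadline
    last_return: float    # most recent one-step price return
    realised_vol: float   # per-step volatility (held constant in this sketch)
    impact_coeff: float   # temporary impact coefficient eta


def step(state, price, action, rng, lam=1e-3, gamma=1e-6, beta=1.0, step_volume=100_000.0):
    """Apply one action to a sell order; return (next_state, next_price, reward, done)."""
    action = min(max(action, 0.0), 1.0)  # keep the fraction inside [0, 1]
    if state.time_left == 1:
        action = 1.0  # assumption: whatever is left is liquidated on the final step
    qty = action * state.remaining_qty

    # Temporary impact lowers the fill price for this slice only.
    participation_rate = qty / step_volume
    temp_impact = state.impact_coeff * participation_rate**beta
    execution_cost = qty * temp_impact  # shortfall versus the pre-trade price

    # Assumed risk proxy: value of unexecuted inventory scaled by volatility.
    remaining = state.remaining_qty - qty
    risk_penalty = state.realised_vol * price * remaining
    reward = -execution_cost - lam * risk_penalty

    # Random-walk return, then a permanent downward shift from selling.
    ret = rng.gauss(0.0, state.realised_vol)
    next_price = price * (1.0 + ret) - gamma * qty
    next_state = State(remaining, state.time_left - 1, ret,
                       state.realised_vol, state.impact_coeff)
    done = remaining <= 0.0 or next_state.time_left == 0
    return next_state, next_price, reward, done


# Roll out one episode with an even-slicing action (1 / time_left each step).
rng = random.Random(0)
state, price, total_reward, done = State(10_000.0, 5, 0.0, 0.02, 0.1), 100.0, 0.0, False
while not done:
    state, price, reward, done = step(state, price, 1.0 / state.time_left, rng)
    total_reward += reward
```

The relative scale of `execution_cost` and `risk_penalty` here is arbitrary; in practice λ would need tuning so that neither term dominates the reward.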

Market Impact Model

Uses the Almgren-Chriss framework:

Temporary impact: η · (participation_rate)^β, a transient price concession that affects only the current trade
Permanent impact: γ · quantity, a lasting price shift caused by information leakage
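
The two impact terms above can be sketched as small functions. This is an illustrative sketch, not the repository's implementation: the function names, the `market_volume` argument (volume traded in the market per interval, assumed constant), and the default values of `eta`, `beta`, and `gamma` are assumptions.

```python
def temporary_impact(qty, market_volume, eta=0.1, beta=1.0):
    """Price concession on the current slice: eta * participation_rate ** beta."""
    participation_rate = qty / market_volume
    return eta * participation_rate**beta


def permanent_impact(qty, gamma=1e-6):
    """Lasting price shift that carries over to all later slices: gamma * qty."""
    return gamma * qty


def temporary_cost(slices, market_volume, eta=0.1, beta=1.0):
    """Total temporary-impact cost of a schedule (price concession times shares)."""
    return sum(q * temporary_impact(q, market_volume, eta, beta) for q in slices)


# Same 10,000-share order, executed at once versus in ten equal slices.
one_shot = temporary_cost([10_000.0], market_volume=100_000.0)
sliced = temporary_cost([1_000.0] * 10, market_volume=100_000.0)
total_shift = permanent_impact(10_000.0)  # same for both schedules: linear in quantity
```

With β = 1 the temporary cost grows with the square of each slice, so splitting the order into ten slices cuts it by about a factor of ten under these assumed parameters. The permanent shift is unchanged, and the slower schedule carries more price risk, which is the tradeoff the agent has to learn.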

Quick Start

# Clone and set up (using uv for speed)
git clone https://github.com/joshfinney/slipstream.git
cd slipstream
uv sync

# Run tests
uv run pytest

# Train an agent (~5 min on a laptop)
uv run python experiments/train.py --timesteps 50000

# Evaluate against baselines
uv run python experiments/eval.py --model models/ppo_execution_final.zip

# View training curves
tensorboard --logdir logs/

Project Structure

├── env/
│   └── execution_env.py    # Custom Gymnasium environment
├── agents/
│   └── baselines.py        # TWAP, Random, Panic policies
├── experiments/
│   ├── train.py            # PPO training script
│   └── eval.py             # Evaluation harness
├── tests/
│   └── test_env.py         # Environment invariants
└── reports/
    └── results.md          # Generated evaluation report

Baselines

Policy	Strategy
TWAP	Execute equal slices over time (standard industry benchmark)
Random	Execute a random fraction at each step (naive floor for comparison)
Panic	Hold back, then rush at the deadline (common mistake)
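
As a rough sketch of how the three baselines map onto the action space (fraction of the remaining order to execute now), consider the functions below. They are not the code in `agents/baselines.py`: the function names, the `panic_window` parameter, and the `schedule` helper are hypothetical.

```python
import random


def twap_action(time_left):
    """Equal slices: trading 1/time_left of what remains gives a uniform schedule."""
    return 1.0 / time_left


def random_action(rng):
    """A uniformly random fraction of the remaining order."""
    return rng.random()


def panic_action(time_left, panic_window=2):
    """Do nothing until the last `panic_window` steps, then split the order over them."""
    if time_left > panic_window:
        return 0.0
    return 1.0 / time_left


def schedule(policy, total_qty, horizon):
    """Turn a policy (time_left -> fraction) into per-step trade sizes."""
    remaining = total_qty
    trades = []
    for t in range(horizon):
        qty = policy(horizon - t) * remaining
        trades.append(qty)
        remaining -= qty
    return trades


twap_trades = schedule(twap_action, 1_000.0, 4)    # four roughly equal slices
panic_trades = schedule(panic_action, 1_000.0, 5)  # nothing for three steps, then two large slices
rng = random.Random(0)
random_trades = schedule(lambda time_left: random_action(rng), 1_000.0, 5)
```

Note that the random policy usually leaves part of the order unexecuted at the deadline; this sketch assumes the environment handles that case, for example by forcing liquidation on the final step.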

Evaluation

The evaluation harness runs across three market regimes (σ is price volatility, η is the temporary impact coefficient):

Normal: σ=2%, η=0.1
High Volatility: σ=5%, η=0.1
High Impact: σ=2%, η=0.3

Metrics: mean, standard deviation, median, and 95% VaR of execution cost
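
One way to compute these metrics from a list of per-episode costs is sketched below. The regime values are taken from the list above; the `REGIMES` dictionary, the `summarise_costs` helper, the use of the sample standard deviation, and the nearest-rank definition of VaR are assumptions, and `experiments/eval.py` may differ.

```python
import statistics

# Regime parameters as listed above.
REGIMES = {
    "normal": {"sigma": 0.02, "eta": 0.1},
    "high_volatility": {"sigma": 0.05, "eta": 0.1},
    "high_impact": {"sigma": 0.02, "eta": 0.3},
}


def summarise_costs(costs, var_pct=95):
    """Summarise per-episode execution costs (higher cost is worse).

    VaR is the nearest-rank empirical quantile: the cost level that
    `var_pct` percent of episodes do not exceed.
    """
    ordered = sorted(costs)
    rank = (var_pct * len(ordered) + 99) // 100  # integer ceiling of pct * n / 100
    return {
        "mean": statistics.mean(ordered),
        "std": statistics.stdev(ordered),  # sample standard deviation, needs >= 2 episodes
        "median": statistics.median(ordered),
        f"var_{var_pct}": ordered[max(rank, 1) - 1],
    }


# Example with placeholder costs 1.0 .. 20.0 (not real results).
example = summarise_costs([float(i) for i in range(1, 21)])
```

Running the same summary once per entry in `REGIMES` and per policy gives a table that can be compared directly across the learned agent and the baselines.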

Stack

Gymnasium — Environment API standard
Stable-Baselines3 — PPO implementation
uv — Fast Python package manager
Ruff — Linting & formatting

References

Almgren, R., & Chriss, N. (2001). Optimal execution of portfolio transactions. The Journal of Risk, 3(2), 5–39. DOI: 10.21314/JOR.2001.041.
Cartea, Á., Jaimungal, S., & Penalva, J. (2015). Algorithmic and High-Frequency Trading. Cambridge University Press. ISBN: 9781107091146.
