
RLQuant: Reinforcement Learning for Options Hedging

A reinforcement learning framework that uses actor-critic reinforcement learning to learn an optimal hedging strategy and a valuation model simultaneously for European vanilla options. It extends the replication framework of Black-Scholes-Merton by recasting it as a reinforcement learning problem.

Overview

RLQuant uses a modified actor-critic reinforcement learning algorithm to solve the options hedging problem:

  • Actor: Learns the optimal delta (hedging position) as a function of market state
  • Critic: Learns the option value as a function of market state

The key innovation is a pretraining scheme that uses financial domain knowledge to speed up convergence. Both pretraining steps use simple supervised learning:

  1. Actor Pretraining: Since there is no good critic yet, the actor is trained without a critic (valuation) model, using the fact that the cumulative delta-hedge P/L should be close to the final payoff whenever the initial option value is close to 0 (which is arranged by choosing a short-term, out-of-the-money option).
  2. Critic Pretraining: Once a decent actor (delta) is available, a proxy target for each period's valuation is obtained by subtracting the delta P/L from the next period's valuation, working backward from the final payoff.

Project Structure

RLQuant/
├── Envs.py              # Trading environment (Black-Scholes process, VanillaEnv)
├── model.py             # Neural network architectures (actor, critic)
├── pretrain.py          # Pretraining pipeline using domain knowledge
├── main.py              # Main training loop with actor-critic learning
├── blackscholes.py      # Black-Scholes pricing and Greeks calculations
├── replay_buffer.py     # Experience replay buffer for training
├── without_pretrain.py  # Baseline: training without pretraining
└── demo.ipynb           # Demonstration notebook

Key Components

Environment (Envs.py)

  • BlackProcess: Generates stock price paths using geometric Brownian motion
    • Parameters: initial price (S0), drift (r), volatility (sigma), tenor (days)
  • VanillaEnv: Options trading environment
    • Observation: (moneyness, moneyness², time_to_maturity, time_to_maturity², moneyness × time_to_maturity)
    • Action: Delta hedging position (-1 to 1)
    • Reward: P/L from hedging
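
A minimal sketch of these two pieces (the moneyness definition is one plausible choice; the repo's exact conventions may differ):

import numpy as np

def gbm_paths(S0, r, sigma, days, n_paths, dt=1/365, seed=0):
    """Geometric Brownian motion: S_{t+1} = S_t * exp((r - sigma^2/2)*dt + sigma*sqrt(dt)*Z)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_paths, days))
    log_ret = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    S = S0 * np.exp(np.cumsum(log_ret, axis=1))
    return np.concatenate([np.full((n_paths, 1), S0), S], axis=1)  # prepend t = 0

def observation(S_t, t, strike, days):
    """5-D state: (moneyness, moneyness^2, ttm, ttm^2, moneyness * ttm)."""
    m = S_t / strike - 1.0    # assumed moneyness definition
    ttm = (days - t) / 365.0  # time to maturity in years
    return np.array([m, m**2, ttm, ttm**2, m * ttm])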

Models (model.py)

  • Actor: Dense(64) → Dense(64) → Dense(1, tanh)
    • Input: Market observation (5D)
    • Output: Delta position (-1 to 1)
  • Critic: Dense(64) → Dense(64) → Dense(1, sigmoid)
    • Input: Market observation (5D)
    • Output: Option value (0 to 1)
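
A sketch of these architectures in TensorFlow 2.x / Keras (the ReLU hidden activations are an assumption; the output activations follow the description above):

import tensorflow as tf

def build_actor(n_hidden=(64, 64)):
    """Maps the 5-D observation to a delta in (-1, 1)."""
    layers = [tf.keras.layers.Dense(n, activation="relu") for n in n_hidden]
    layers.append(tf.keras.layers.Dense(1, activation="tanh"))
    return tf.keras.Sequential(layers)

def build_critic(n_hidden=(64, 64)):
    """Maps the 5-D observation to an option value in (0, 1)."""
    layers = [tf.keras.layers.Dense(n, activation="relu") for n in n_hidden]
    layers.append(tf.keras.layers.Dense(1, activation="sigmoid"))
    return tf.keras.Sequential(layers)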

Pretraining (pretrain.py)

Actor Pretraining:

  • Pre-trains the actor network against the final payoff of a short-term, out-of-the-money option, as sketched below.
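
A minimal sketch of this step, reusing the path/observation helpers from the environment sketch above (the function name and training loop are illustrative, not the repo's exact code):

import numpy as np
import tensorflow as tf

def pretrain_actor(actor, paths, obs, strike, epochs=200, lr=1e-3):
    """Fit cumulative hedge P/L to the final payoff (valid when the initial value is ~0)."""
    dS = (paths[:, 1:] - paths[:, :-1]).astype(np.float32)  # per-step price moves
    payoff = np.maximum(paths[:, -1] - strike, 0).astype(np.float32)
    obs = obs.astype(np.float32)                            # shape (n_paths, days, 5)
    n_paths, days, n_feat = obs.shape
    opt = tf.keras.optimizers.Adam(lr)
    for _ in range(epochs):
        with tf.GradientTape() as tape:
            deltas = tf.reshape(actor(tf.reshape(obs, (-1, n_feat))), (n_paths, days))
            pnl = tf.reduce_sum(deltas * dS, axis=1)        # sum_t delta_t * dS_t
            loss = tf.reduce_mean(tf.square(pnl - payoff))  # MSE against the payoff
        grads = tape.gradient(loss, actor.trainable_variables)
        opt.apply_gradients(zip(grads, actor.trainable_variables))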

Critic Pretraining:

  • Uses the pretrained actor to generate delta P/L
  • Uses the delta P/L to build a proxy target for each period's valuation (see the sketch below)
  • Loss: MSE between predicted and target values
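
A minimal sketch of the target construction by backward induction (whether the discount applies before or after subtracting the delta P/L is an assumption):

import numpy as np

def critic_targets(deltas, dS, payoff, r, dt=1/365):
    """Proxy targets for V(S_t): discounted next-period value minus the delta P/L."""
    n_paths, days = deltas.shape
    df = np.exp(-r * dt)     # per-step discount factor
    targets = np.empty((n_paths, days + 1))
    targets[:, -1] = payoff  # backward induction starts from the final payoff
    for t in range(days - 1, -1, -1):
        targets[:, t] = targets[:, t + 1] * df - deltas[:, t] * dS[:, t]
    return targets           # supervised (MSE) targets for the critic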

Training (main.py)

Actor-Critic Algorithm:

  • Critic Loss: (V(S') × df - R + V(S) - δ × ΔS)²
  • Actor Loss: (δ × (ΔS - μ) + V(S') × df - V(S))²
  • Soft target update: target ← target × (1-τ) + critic × τ
  • Training: 50 episodes, batch size 32, replay buffer size 1024
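
A minimal sketch of one update step that implements the two loss expressions above (names and the optimizer wiring are illustrative; obs, next_obs, dS, and reward are mini-batch tensors of shape (batch, 1) drawn from the replay buffer, df is the per-step discount factor, and mu is the per-step drift):

import tensorflow as tf

def train_step(actor, critic, target_critic, obs, next_obs, dS, reward,
               df, mu, critic_opt, actor_opt):
    # Critic loss: (V(S') * df - R + V(S) - delta * dS)^2
    with tf.GradientTape() as tape:
        residual = (target_critic(next_obs) * df - reward
                    + critic(obs) - actor(obs) * dS)
        critic_loss = tf.reduce_mean(tf.square(residual))
    grads = tape.gradient(critic_loss, critic.trainable_variables)
    critic_opt.apply_gradients(zip(grads, critic.trainable_variables))

    # Actor loss: (delta * (dS - mu) + V(S') * df - V(S))^2
    with tf.GradientTape() as tape:
        residual = (actor(obs) * (dS - mu)
                    + target_critic(next_obs) * df - critic(obs))
        actor_loss = tf.reduce_mean(tf.square(residual))
    grads = tape.gradient(actor_loss, actor.trainable_variables)
    actor_opt.apply_gradients(zip(grads, actor.trainable_variables))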

Installation

Requirements

  • Python 3.7+
  • TensorFlow 2.x
  • NumPy

Setup

# Install dependencies
pip install tensorflow numpy

# Run training
python main.py

Usage

Training the Model

python main.py

Output shows:

  • Pretraining progress (actor and critic)
  • Training episodes with hedge P/L and option payoff
  • Final comparison: RL model vs Black-Scholes

Example output:

pretrain actor
pretrain critic
train like actor-critic
episode 0
total hedge P/L: 0.0234, option payoff: 0.0500
...
episode 49
total hedge P/L: 0.0198, option payoff: 0.0450

by RL model:
 option value: 0.0234, delta: 0.4567
by black-scholes model:
 option value: 0.0231, delta: 0.4523

Pretraining Only

python pretrain.py

Evaluates pretraining performance and shows learned option values vs target values.

Demo

See demo.ipynb for interactive examples and visualizations.

How It Works

Problem Formulation

For a European vanilla option with strike K and expiry T:

  • At each time step t, we choose a hedging position δ(t)
  • Stock price moves by ΔS, generating P/L: δ × ΔS
  • Per-day discount factor: df = e^(-r/365)
  • Goal: Learn δ(t) to minimize hedging error
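
A one-step numeric illustration of these quantities (the numbers are made up):

import numpy as np

r, dt = 0.01, 1/365
df = np.exp(-r * dt)          # per-day discount factor, about 0.99997
S_t, S_next, delta = 1.00, 1.02, 0.45
pnl = delta * (S_next - S_t)  # hedge P/L = 0.45 * 0.02 = 0.009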

Learning Process

  1. Pretraining Phase:

    • Initialize actor with Black-Scholes behavior
    • Initialize critic with Monte Carlo values
    • Provides warm start with financial domain knowledge
  2. Fine-tuning Phase:

    • Collect experiences in replay buffer
    • Update critic: predict option value V(S)
    • Update actor: maximize hedge effectiveness
    • Soft-update the target network for stability (see the sketch below)
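
For reference, a minimal sketch of the soft target update in the last step (Polyak averaging of Keras weights):

def soft_update(target_model, source_model, tau=0.1):
    """target <- target * (1 - tau) + source * tau, weight by weight."""
    new_weights = [(1 - tau) * t + tau * s
                   for t, s in zip(target_model.get_weights(),
                                   source_model.get_weights())]
    target_model.set_weights(new_weights)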

Convergence

The learned delta converges toward the theoretical Black-Scholes delta, demonstrating that RL can discover optimal hedging strategies from data.
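
To reproduce the Black-Scholes side of this comparison without extra dependencies (the repo's blackscholes.py presumably provides its own version), a self-contained price/delta calculation:

import math

def norm_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def bs_call(S, K, r, sigma, T):
    """Black-Scholes price and delta of a European call; T is in years."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    price = S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)
    return price, norm_cdf(d1)  # a call's delta is N(d1)

price, delta = bs_call(S=1.0, K=1.1, r=0.01, sigma=0.3, T=30/365)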

Configuration

Main parameters in main.py:

S0, r, vol, days, strike = 1, 0.01, 0.3, 30, 1.1  # Market parameters
n_samples = 2**12                                 # Pretraining samples
n_hidden = [64, 64]                               # Hidden layer sizes
n_episodes = 50                                   # Training episodes
batch_size = 32                                   # Mini-batch size
tau = 0.1                                         # Soft update coefficient

Performance Metrics

  • Option Value: Learned value vs Black-Scholes price
  • Delta: Learned hedge position vs theoretical delta
  • Hedge P/L: Cumulative profit/loss from hedging

References

  • Black-Scholes formula for European options
  • Actor-Critic methods with target networks
  • Experience replay for stability
  • Geometric Brownian Motion for stock prices
