AI - TRAINER - AI Trading System

A comprehensive reinforcement learning trading system with an interactive command-line interface for managing the complete ML pipeline from data processing to model evaluation.

Overview

This project implements a three-phase curriculum learning approach for training trading agents that comply with Apex Trader Funding rules. The system uses PPO (Proximal Policy Optimization) algorithm with transfer learning and optional LLM integration to develop robust trading strategies.

Features

Interactive CLI Menu System - Easy-to-use interface for all operations
Three-Phase Training Pipeline - Progressive curriculum learning
- Phase 1: Entry signal quality learning (fixed SL/TP)
- Phase 2: Position management with dynamic SL/TP and action masking
- Phase 3: Hybrid RL + LLM agent with context-aware reasoning
Continue Training from Checkpoint - Resume training from any saved model
Smart Model Detection - Automatically finds and loads the newest models
Automatic Data Corruption Detection - Detects and fixes divide-by-100 errors
Multi-Market Support - 8 major futures instruments with automatic contract specs
Action Masking - Prevents invalid actions in Phase 2 & 3
Comprehensive Testing Framework - Hardware-maximized and pipeline testing modes
Thread Pool Management - Stable training on constrained systems
Model Evaluation - Comprehensive performance metrics
Compliance Enforcement - 100% Apex Trader Funding rules compliance
Progress Tracking - Colored output and detailed logging

Supported Instruments

NQ - Nasdaq 100 E-mini
ES - S&P 500 E-mini
YM - Dow Jones E-mini
RTY - Russell 2000 E-mini
MNQ - Micro Nasdaq 100
MES - Micro S&P 500
M2K - Micro Russell 2000
MYM - Micro Dow Jones

Project Structure

AI Trainer/
├── AGENTS.md                  # Repo guardrails & coding standards
├── README.md                  # Usage guide (this file)
├── changelog.md               # High-level change log
├── requirements.txt           # Python dependencies
├── pyrightconfig.json         # Type-checker settings
├── main.py                    # Interactive CLI entry point
│
├── config/                    # Runtime configuration
│   ├── llm_config.yaml              # Phase 3 LLM configuration
│   ├── test_hardware_maximized.yaml # Testing framework preset
│   └── test_pipeline.yaml           # Pipeline smoke-test preset
│
├── src/                       # Core implementation
│   ├── Data processing: update_training_data.py, process_new_data.py, reprocess_from_source.py, clean_second_data.py, process_second_data.py, data_validator.py, validate_data_quality.py
│   ├── Training: train_phase1.py, train_phase2.py, train_phase3_llm.py
│   ├── Environments: environment_phase1.py, environment_phase1_simplified.py, environment_phase2.py, environment_phase3_llm.py
│   ├── Evaluation: evaluate_phase2.py, evaluate_phase3_llm.py
│   ├── Hybrid/LLM stack: hybrid_agent.py, hybrid_policy*.py, fusion_network.py, async_llm.py, llm_reasoning.py, llm_features.py, llm_callback.py, chain_of_thought.py
│   ├── Utilities: market_specs.py, metadata_utils.py, apex_compliance_checker.py, model_utils.py, feature_engineering.py, technical_indicators.py, testing_framework.py, diagnose_environment.py, kl_callback.py
│
├── data/                      # Market CSVs required for training (example NQ files kept)
├── models/                    # Checkpoints (empty placeholders for upload)
├── logs/                      # Runtime logs (emptied before packaging)
├── results/                   # Evaluation artifacts (emptied)
└── tensorboard_logs/          # TensorBoard runs (emptied)

Note: Documentation, screenshots, examples, scripted pipelines, and the legacy pytest suite were intentionally removed to keep the deployment bundle lightweight for pod uploads. The CLI, source tree, configs, and data hooks cover every required training/evaluation workflow.

Installation

Prerequisites

Python 3.11.9 or higher (Python 3.13 supported)
pip (Python package manager)
(Optional) CUDA-compatible GPU for faster training

Step 1: Install Dependencies

Run the main menu and select option 1, or install manually:

# Using main.py menu (recommended)
python main.py
# Select option 1: Requirements Installation

# Or install manually
pip install -r requirements.txt

Step 2: Verify Installation

# Test imports
python -c "import gymnasium, stable_baselines3, pandas, numpy; print('✓ All core dependencies installed')"

# Or run the pipeline smoke-test
python src/testing_framework.py --mode pipeline --market NQ

Quick Start

Using the Interactive Menu (Recommended)

python main.py

The interactive menu provides 5 main options:

Requirements Installation - Install all dependencies
Data Processing - Process market data for training
Training Model - Train Phase 1, 2, and 3 models
- Training Test (Local Testing)
- Training Pod (Production)
- Continue from Existing Model ⭐ NEW
Evaluator - Evaluate trained models
Exit - Close the program

Continue Training Feature ⭐ NEW

Resume training from any previously saved model:

python main.py
# Select 3: Training Model
# Select 3: Continue from Existing Model
# Choose from list of available models
# Select test or production mode
# (project structure block to be updated later)

**Benefits:**
- Resume interrupted training sessions
- Extend training for models that haven't converged
- Continue training with different hyperparameters
- No need to remember model names - automatic detection
- Preserves timestep count and training progress

**Command-line usage:**
```bash
# Continue training from a specific model
python src/train_phase1.py --continue --model-path models/phase1_foundational_final.zip

# Continue in test mode
python src/train_phase1.py --continue --model-path models/phase1_foundational_final.zip --test

Manual Workflow

1. Process Data

NEW: Automatic corruption detection and fixing! 🎉

When you download new data from Databento, just run:

# Process new data with automatic corruption detection & fixing
python src/process_new_data.py --market NQ

The script automatically:

✅ Detects corrupted data (divide-by-100 errors)
✅ Fixes corruption automatically
✅ Validates data quality
✅ Generates clean training-ready data

Alternative (direct processing):

# Also has automatic corruption detection built-in
python src/update_training_data.py --market ES

Reprocess old corrupted data:

# Deletes old files and reprocesses from source
python src/reprocess_from_source.py --market NQ

Available markets: NQ, ES, YM, RTY, MNQ, MES, M2K, MYM
Tip: add --dry-run to preview every step without touching files.

2. Train Models

# Phase 1: Entry Signal Learning (2M timesteps)
python src/train_phase1.py

# Phase 2: Position Management (5M timesteps with transfer learning)
python src/train_phase2.py

# Phase 3: Hybrid RL + LLM Agent (5M timesteps)
python src/train_phase3_llm.py

For quick testing (reduced dataset):

python src/train_phase1.py --test
python src/train_phase2.py --test
python src/train_phase3_llm.py --test

3. Evaluate Model

python main.py
# Select 4: Evaluator  (auto-loads latest Phase 3 model and tests on holdout data)

Manual invocation (if you need custom settings):

python src/evaluate_phase3_llm.py \
  --model models/phase3_hybrid/phase3_hybrid_final.zip \
  --market NQ \
  --episodes 20 \
  --holdout-fraction 0.2

The evaluator automatically carves out the most recent 20 % of minute data as a holdout set, so performance metrics reflect unseen market conditions. Both the human-readable report and a JSON summary are written to results/phase3_evaluation_<timestamp>.txt/.json for easy archiving.

LLM Asset Preparation (Phase 3)

Phase 3 uses the Phi-3-mini LLM. To keep the repo lean, the weights are downloaded on demand. You can manage them via the built-in helper:

# Download and cache the configured LLM locally
python src/llm_asset_manager.py --download

If the cache is missing, the CLI will prompt you to either download (~4 GB) or run in mock mode (RL-only fallback). The download path is stored in config/llm_config.yaml under llm_model.local_path, so pods with fast internet can fetch once and reuse the weights offline.

Requires: git + git lfs (preferred). If git isn’t available the tool falls back to huggingface_hub, so pip install huggingface_hub is recommended as well.

4. View Training Progress

tensorboard --logdir tensorboard_logs/

🌍 Multi-Market Support

The RL Trainer supports 8 different futures markets with automatic contract specification handling:

Supported Markets

Symbol	Name	Contract Size	Tick Size	Tick Value	Default Commission	Type
ES	E-mini S&P 500	$50	0.25	$12.50	$2.50/side	E-mini
NQ	E-mini Nasdaq-100	$20	0.25	$5.00	$2.50/side	E-mini
YM	E-mini Dow Jones	$5	1.0	$5.00	$2.50/side	E-mini
RTY	E-mini Russell 2000	$50	0.10	$5.00	$2.50/side	E-mini
MNQ	Micro E-mini Nasdaq	$2	0.25	$0.50	$0.60/side	Micro
MES	Micro E-mini S&P 500	$5	0.25	$1.25	$0.60/side	Micro
M2K	Micro E-mini Russell	$5	0.10	$0.50	$0.60/side	Micro
MYM	Micro E-mini Dow	$0.50	1.0	$0.50	$0.60/side	Micro

How It Works

Process data for your desired market (e.g., NQ)
Start training - the system automatically:
- Detects available market data files
- Prompts you to select a market (if multiple exist)
- Loads correct contract specifications
- Applies market-specific P&L calculations, commissions, and slippage

Example: Training on Multiple Markets

# Process NQ data
python main.py
# Select: Data Processing → NQ

# Train on NQ
python main.py
# Select: Training Model → Training Pod
# System shows: "Detected 2 markets: ES, NQ"
# Select: NQ
# Training uses NQ specs automatically!

Custom Commission Override

You can override default commissions in training configs:

# train_phase1.py or train_phase2.py
PHASE1_CONFIG = {
    # ... other config ...
    'commission_override': 1.50  # Use custom commission instead of market default
}

Market-Dependent Features

P&L Calculations: Automatically use correct contract multiplier
Slippage Modeling:
- Liquid markets (ES, NQ, MNQ, MES) = 1 tick slippage
- Less liquid (YM, RTY, M2K, MYM) = 2 ticks slippage
Commissions:
- E-mini default = $2.50/side
- Micro default = $0.60/side
- Fully configurable per training run

System Architecture

Three-Phase Curriculum Learning

Phase 1: Entry Signal Learning
  ├─ Fixed SL/TP (1.5x ATR SL, 3:1 ratio)
  ├─ 3 actions: Hold, Buy, Sell
  ├─ Focus: Entry signal quality
  ├─ Constraints: Relaxed for learning ($15K trailing DD)
  └─ Duration: 2M timesteps (~6-8 hours on RTX 4000)

Phase 2: Position Management (Transfer Learning)
  ├─ Auto-loads newest Phase 1 model
  ├─ Inherit Phase 1 weights
  ├─ Dynamic SL/TP adjustment
  ├─ 6 actions: Streamlined position management
  ├─ Focus: Risk management
  ├─ Constraints: Strict Apex compliance ($2.5K trailing DD)
  └─ Duration: 5M timesteps (~8-10 hours)

Phase 3: Hybrid RL + LLM Agent
  ├─ Action Space: 6 actions (same as Phase 2)
  ├─ LLM: Phi-3-mini (3.8B params, INT8 quantized)
  ├─ Decision Fusion: Confidence-weighted voting
  ├─ Observation Space: 261D (extended context)
  ├─ Hardware: GPU with 8GB+ VRAM
  └─ Duration: 5M timesteps (~12-16 hours)

Multi-Layer Safety System

Layer 1: Environment Level (Primary)
  └─ Apex rules enforced in reward + done signal

Layer 2: Wrapper Level (Secondary)
  └─ Safety validation before action execution

Layer 3: Validation Level (Verification)
  └─ Post-training compliance checks

Thread Pool Management

The system includes automatic thread pool management to prevent OpenBLAS pthread creation failures:

# Automatic BLAS thread limiting
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['OMP_NUM_THREADS'] = '1'
os.environ['MKL_NUM_THREADS'] = '1'

# Automatic environment count alignment with CPU cores
# Override with: TRAINER_NUM_ENVS=40

Benefits:

Stable training on constrained systems
Prevents "pthread_create failed" errors
Automatic CPU core detection and optimization
Configurable overrides for advanced users

Apex Trader Funding Compliance

The system enforces 100% compliance with Apex rules:

4:59 PM ET mandatory position close
$2,500 max trailing drawdown limit
No overnight positions
Position size constraints
Trade frequency limits

These checks mirror the Apex Trader Funding guardrails enforced in every environment.

Configuration

Phase 1 Configuration (src/train_phase1.py)

PHASE1_CONFIG = {
    # Training
    'total_timesteps': 2_000_000,
    'num_envs': 80,  # Auto-adjusted based on CPU cores
    
    # Network
    'policy_layers': [512, 256, 128],
    
    # PPO parameters
    'learning_rate': 3e-4,
    'n_steps': 2048,
    'batch_size': 512,
    'n_epochs': 10,
    'gamma': 0.99,
    'gae_lambda': 0.95,
    'clip_range': 0.2,
    'ent_coef': 0.01,
    'vf_coef': 0.5,
    
    # Device
    'device': 'cuda'  # or 'cpu'
}

Phase 2 Configuration (src/train_phase2.py)

PHASE2_CONFIG = {
    # Training
    'total_timesteps': 5_000_000,
    'num_envs': 80,  # Auto-adjusted based on CPU cores
    
    # Network (reduced capacity to prevent overfitting)
    'policy_layers': [512, 256, 128],
    
    # PPO parameters
    'learning_rate': 3e-4,
    'n_steps': 2048,
    'batch_size': 512,
    'n_epochs': 10,
    'gamma': 0.99,
    'gae_lambda': 0.95,
    'clip_range': 0.2,
    'ent_coef': 0.01,
    'vf_coef': 0.5,
    'target_kl': 0.01,
    
    # Device
    'device': 'cuda'  # or 'cpu'
}

Phase 2 Actions (6 total):

Actions 0-2: Entry/Exit (Hold, Buy, Sell)
Action 3: Move SL to Break-Even
Action 4: Enable Trailing Stop
Action 5: Disable Trailing Stop

Note: Simplified from original 9 actions for improved sample efficiency and reduced overfitting

Phase 3: Hybrid RL + LLM Agent (Advanced)

NEW - Combines reinforcement learning with LLM reasoning for intelligent trading decisions.

# Phase 3: Hybrid LLM Agent (requires GPU with 8GB+ VRAM)
python main.py
# Select: Training → Phase 3 Hybrid (Test Mode)  # ~30 min test run
# Select: Training → Phase 3 Hybrid (Production)  # ~12-16 hours

Phase 3 Features:

Context Awareness: 261D observations (vs 228D in Phase 2)
- Extended market context (ADX slope, VWAP distance, volatility regime)
- Multi-timeframe indicators (SMA-50/200 slopes, multi-timeframe RSI)
- Pattern recognition (higher/lower highs/lows, support/resistance)
- Risk context (unrealized P&L, drawdown tracking, consecutive losses)
LLM Reasoning: Phi-3-mini provides explicit analysis before decisions
Decision Fusion: Intelligent combination of RL + LLM recommendations
Risk Management: Automatic veto of high-risk actions
Selective Querying: Queries LLM only when needed (reduces latency)

Hardware Requirements:

GPU: RTX 3060 or better (8GB+ VRAM recommended)
RAM: 16GB minimum, 32GB recommended
Storage: 10GB additional space

Key Files:

src/train_phase3_llm.py - Training pipeline
src/hybrid_agent.py - Decision fusion logic
src/llm_reasoning.py - LLM reasoning module
src/llm_features.py - Extended feature calculation
config/llm_config.yaml - LLM configuration

Quick Start:

# Install LLM dependencies
pip install transformers torch accelerate bitsandbytes sentencepiece

# Run test training
python main.py
# Select: Training → Phase 3 Hybrid (Test Mode)

Configuration: Edit config/llm_config.yaml to customize:

LLM model selection (Phi-3-mini, Phi-3-small, etc.)
Decision fusion parameters (LLM weight: 0.0-1.0, confidence thresholds)
Prompt templates for different market conditions
Risk veto thresholds
Performance optimization settings

Performance:

Test mode: ~30 minutes (50K timesteps)
Production: ~12-16 hours (5M timesteps)
LLM inference: ~15-20ms per query (RTX 3060)
VRAM usage: ~4GB with INT8 quantization

Evaluation:

python src/evaluate_phase3_llm.py \
    --model models/phase3_hybrid_final \
    --market NQ \
    --episodes 20

Testing Framework

The system includes an optimized testing framework with two modes:

1. Hardware-Maximized Validation Mode

Full GPU utilization with reduced timesteps
Cached LLM feature calculations
Vectorized decision fusion
Performance benchmarking

python src/testing_framework.py --mode hardware_maximized --market NQ

2. Automated Sequential Pipeline Mode

Continuous execution with checkpointing
Automated validation
Progress tracking

python src/testing_framework.py --mode pipeline --market NQ

Testing Framework Coverage:

Validates environment setup and observation flow
Exercises action masking correctness
Runs end-to-end RL + LLM integration loops
Checks market detection and data hooks
Confirms Apex compliance checkpoints

Data Format

Training data should be CSV files with the following columns:

Minute Data (D1M):

DateTime
Open, High, Low, Close
Volume

Second Data (D1S):

DateTime
Open, High, Low, Close
Volume

Data files should be placed in data/ directory with naming convention:

{INSTRUMENT}_D1M.csv (minute data)
{INSTRUMENT}_D1S.csv (second data)

Data Processing Pipeline

# Process new data (auto-corruption detection)
python src/process_new_data.py --market NQ

# Reprocess old corrupted data
python src/reprocess_from_source.py --market NQ

# Validate data quality
python src/validate_data_quality.py --market NQ

Corruption Detection: Automatically detects and fixes divide-by-100 errors using statistical analysis and percentile-based validation.

Troubleshooting

CUDA Out of Memory

Reduce batch size or number of parallel environments:

# In train_phase1.py or train_phase2.py
'batch_size': 256,  # Down from 512
'num_envs': 40,     # Down from 80

Or use CPU:

'device': 'cpu'

Training Divergence

Reduce learning rate: 3e-4 → 1e-4
Increase clip range: 0.2 → 0.3
Check reward scaling
Enable KL divergence monitoring

Import Errors

Reinstall the required dependencies:

python -m pip install -r requirements.txt

Model Not Found (Phase 2)

The system auto-detects the newest Phase 1 model. Ensure:

Phase 1 training completed
Model exists in models/ directory
Model filename contains "phase1"

Thread Pool Errors (OpenBLAS)

The system automatically limits thread pools. If you encounter issues:

# Explicitly set thread limits
set TRAINER_MAX_BLAS_THREADS=1
set TRAINER_NUM_ENVS=40

# Or use environment variables
export TRAINER_MAX_BLAS_THREADS=1
export TRAINER_NUM_ENVS=40

Data Corruption

Run reprocessing:

python src/reprocess_from_source.py --market NQ

Performance Metrics

Target metrics for successful training:

Metric	Target	Status
Apex Compliance	100%	Enforced
Sharpe Ratio	> 2.5	Phase 2 target
Win Rate	> 50%	Target
Max Drawdown	< 5%	Enforced
GPU Utilization	85%+	Optimized
LLM Query Latency	< 20ms	Phase 3

Logs and Outputs

Training logs: logs/
TensorBoard logs: tensorboard_logs/
Model checkpoints: models/
Evaluation results: results/
Testing logs: logs/testing/

Tips

All operations are logged for debugging
Press Ctrl+C to cancel any operation
Check logs directory for detailed output
Ensure sufficient disk space (>20GB for full training)
Use test mode (--test flag) for quick validation
Monitor GPU/CPU usage during training
Use "Continue Training" to resume interrupted sessions
Phase 2 automatically finds your newest Phase 1 model
Before long runs, exercise the pipeline test harness: python src/testing_framework.py --mode pipeline --market NQ

Common Workflows

1. First Time Setup

python main.py
# Select 1: Requirements Installation
# Select 2: Data Processing (choose instrument)
# Wait for completion

2. Complete Training Pipeline (Production)

python main.py
# Select 3: Training Model → 2: Training Pod (Production)
# Wait for completion (~26-34 hours total)
# Select 4: Evaluator

3. Quick Test Run

python main.py
# Select 3: Training Model → 1: Training Test (Local)
# Wait for completion (~30-45 minutes)
# Select 4: Evaluator

4. Resume Training from Checkpoint

python main.py
# Select 3: Training Model → 3: Continue from Existing Model
# Select model from list
# Choose test or production mode
# Training continues from checkpoint

5. Data Processing Workflow

# Download new data from Databento
# Process with automatic corruption detection
python src/process_new_data.py --market NQ

# Validate data quality
python src/validate_data_quality.py --market NQ

# Start training
python src/train_phase1.py --market NQ

6. Testing Workflow

# Run hardware-maximized test
python src/testing_framework.py --mode hardware_maximized --market NQ

# Run pipeline test
python src/testing_framework.py --mode pipeline --market NQ

Technology Stack

RL Framework: Stable Baselines3 (PPO algorithm)
Action Masking: sb3-contrib (MaskablePPO)
Environment: Gymnasium API
Deep Learning: PyTorch 2.0+, TensorFlow 2.15+
LLM Integration: Hugging Face Transformers (Phi-3-mini)
Quantization: bitsandbytes (INT8/INT4)
Data Processing: Pandas, NumPy, SciPy, Scikit-Learn
Visualization: Matplotlib, Plotly, TensorBoard
UI: Colorama, tqdm, rich
Testing: Custom testing_framework + pyright type checks

License

Apache License 2.0 (inherited from TensorTrade)

Contributing

Validate end-to-end flow: python src/testing_framework.py --mode pipeline --market NQ
Run static checks: pyright
Follow existing code style (PEP 8 + type hints)
Document user-facing changes in this README
Update changelog.md with a concise summary

Support & Contact

Author: Javier
X (Twitter): @javiertradess

For technical issues:

Check logs in logs/ directory
Validate data integrity: python src/validate_data_quality.py --market NQ
Run the pipeline test harness: python src/testing_framework.py --mode pipeline --market NQ
Review changelog.md for the latest fixes and behavioral notes

Version

RL TRAINER v1.1.0 - November 2025
Based on TensorTrade v1.0.4-dev1

Recent Updates:

✅ Three-phase curriculum learning (Phase 3 Hybrid LLM)
✅ Continue training from any checkpoint
✅ Action space optimization (9→6 actions in Phase 2)
✅ Automatic data corruption detection & fixing
✅ Thread pool management for stable training
✅ Comprehensive testing framework
✅ Smart model detection and auto-loading
✅ Multi-market support with auto contract specs

Getting Started: Run python main.py and follow the interactive menu!

Star this repo if you find it useful! ⭐

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
.vscode		.vscode
Agent_temp		Agent_temp
config		config
dashboard		dashboard
docs		docs
img		img
logs		logs
scripts		scripts
src		src
tests		tests
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
README.md		README.md
changelog.md		changelog.md
main.py		main.py
pyrightconfig.json		pyrightconfig.json
requirements-jax.txt		requirements-jax.txt
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

AI - TRAINER - AI Trading System

Overview

Features

Supported Instruments

Project Structure

Installation

Prerequisites

Step 1: Install Dependencies

Step 2: Verify Installation

Quick Start

Using the Interactive Menu (Recommended)

Continue Training Feature ⭐ NEW

Manual Workflow

1. Process Data

2. Train Models

3. Evaluate Model

LLM Asset Preparation (Phase 3)

4. View Training Progress

🌍 Multi-Market Support

Supported Markets

How It Works

Example: Training on Multiple Markets

Custom Commission Override

Market-Dependent Features

System Architecture

Three-Phase Curriculum Learning

Multi-Layer Safety System

Thread Pool Management

Apex Trader Funding Compliance

Configuration

Phase 1 Configuration (src/train_phase1.py)

Phase 2 Configuration (src/train_phase2.py)

Phase 3: Hybrid RL + LLM Agent (Advanced)

Testing Framework

1. Hardware-Maximized Validation Mode

2. Automated Sequential Pipeline Mode

Data Format

Data Processing Pipeline

Troubleshooting

CUDA Out of Memory

Training Divergence

Import Errors

Model Not Found (Phase 2)

Thread Pool Errors (OpenBLAS)

Data Corruption

Performance Metrics

Logs and Outputs

Tips

Common Workflows

1. First Time Setup

2. Complete Training Pipeline (Production)

3. Quick Test Run

4. Resume Training from Checkpoint

5. Data Processing Workflow

6. Testing Workflow

Technology Stack

License

Contributing

Support & Contact

Version

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages