Autonomous Kaggle ML Agent

A fully autonomous, competition-agnostic ML agent that takes a Kaggle competition URL or slug, bootstraps its own working environment, and iteratively improves a machine learning model in a continuous feedback loop driven by an LLM (Google Gemini by default, or a local Ollama model).

How It Works

The system has two layers:

Layer 1 — Context Agent (`context_agent.py`)

Runs once per competition to bootstrap the working environment.

1. Parse slug: extracts the competition slug from a URL, or accepts a slug directly
2. Create directory: `competitions/<slug>/` with `data/`, `submissions/`, and `logs/`
3. Download data: calls `kaggle competitions download` and unzips all files
4. Inspect dataset: reads all CSV/Parquet files and records shapes, dtypes, missing values, sample rows, and any description files
5. Fetch metadata: title, category, reward, and deadline via `kaggle competitions list`
6. Detect target and ID columns: infers the target (a column present in train but not in test) and the ID column automatically
7. Generate `program.md`: the LLM writes a competition-specific system prompt (rules, metric, dataset notes, 8 improvement directions)
8. Generate `experiment.py`: the LLM writes a complete, runnable starter ML pipeline suited to the task type
9. Launch `agent.py`: starts the iterative loop in the competition directory
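
The slug parsing and target/ID detection steps above could look roughly like the following. This is a minimal sketch, not the code in `context_agent.py`: the function names `parse_slug` and `infer_target_and_id`, the `/c/<slug>` fallback for older links, and the "name ends with id" heuristic are all assumptions made for illustration.

```python
from urllib.parse import urlparse


def parse_slug(url_or_slug: str) -> str:
    """Return the competition slug from a Kaggle URL, or the input if it is already a slug."""
    text = url_or_slug.strip()
    if "://" not in text:
        return text.strip("/")
    parts = [p for p in urlparse(text).path.split("/") if p]
    # Assumed path shape: /competitions/<slug>[/...]; older links use /c/<slug>.
    for marker in ("competitions", "c"):
        if marker in parts and parts.index(marker) + 1 < len(parts):
            return parts[parts.index(marker) + 1]
    raise ValueError(f"Could not find a competition slug in {url_or_slug!r}")


def infer_target_and_id(train_cols, test_cols):
    """Guess the target (in train but not test) and the ID column (shared, name ends with 'id')."""
    test_set = set(test_cols)
    only_in_train = [c for c in train_cols if c not in test_set]
    # If several columns are train-only, the choice is ambiguous, so return None.
    target = only_in_train[0] if len(only_in_train) == 1 else None
    id_col = next(
        (c for c in train_cols if c in test_set and c.lower().endswith("id")), None
    )
    return target, id_col
```

Returning `None` for an ambiguous target keeps the heuristic honest: the caller can then fall back to the sample submission file or to the LLM rather than silently picking the wrong column.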

Layer 2 — Iterative Agent (`agent.py`)

Runs continuously, improving the model every iteration.

┌─────────────────── MAIN LOOP ───────────────────┐
│                                                  │
│  1. Run experiment.py as subprocess              │
│     └─ writes submission.csv                     │
│     └─ prints cv_metric: X.XXXXX                 │
│                                                  │
│  2. Parse cv metric from output                  │
│                                                  │
│  3. If crash: ask LLM to fix → retry (×3)        │
│                                                  │
│  4. Submit to Kaggle CLI (if not --skip-kaggle)  │
│                                                  │
│  5. Keep if metric improved, else revert to best │
│                                                  │
│  6. Append result to results.tsv                 │
│                                                  │
│  7. Ask LLM for next improvement idea            │
│     └─ LLM rewrites experiment.py completely     │
│                                                  │
│  8. Repeat forever                               │
└─────────────────────────────────────────────────┘
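
The keep-or-revert logic in steps 5 to 7 can be sketched as a small function. This is an illustration under stated assumptions, not the code in `agent.py`: `run_experiment` and `propose_next` are hypothetical callables standing in for the subprocess run and the LLM call, the status labels `keep` and `discard` are made up, and Kaggle submission and crash fixing are left out.

```python
from typing import Callable, Optional, Tuple


def run_loop(
    run_experiment: Callable[[str], Optional[float]],
    propose_next: Callable[[str, list], str],
    starter_code: str,
    max_iter: int,
    lower_is_better: bool = True,
) -> Tuple[str, Optional[float], list]:
    """Keep the best-scoring code; every new proposal is derived from the current best."""
    best_code, best_metric = starter_code, None
    candidate = starter_code
    history = []
    for iteration in range(1, max_iter + 1):
        metric = run_experiment(candidate)  # None means a crash or a missing metric
        improved = metric is not None and (
            best_metric is None
            or (metric < best_metric if lower_is_better else metric > best_metric)
        )
        if improved:
            best_code, best_metric = candidate, metric
        history.append((iteration, metric, "keep" if improved else "discard"))
        # "Revert to best": the next idea always starts from the best known code.
        candidate = propose_next(best_code, history)
    return best_code, best_metric, history
```

Passing the best code, rather than the last candidate, into `propose_next` is what makes a failed idea cost only one iteration.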

Full Architecture

context_agent.py  ←── competition URL / slug
      │
      ├── kaggle CLI           (download dataset)
      ├── Gemini LLM           (generate program.md + experiment.py)
      └── competitions/<slug>/ (isolated working directory)
              ├── competition.txt    (slug, read by agent.py)
              ├── program.md         (LLM system prompt, competition-specific)
              ├── experiment.py      (the ML pipeline, rewritten each iteration)
              ├── submission.csv     (latest predictions)
              ├── results.tsv        (experiment history log)
              ├── run.log            (last experiment stdout/stderr)
              ├── data/              (competition dataset files)
              └── submissions/       (timestamped archive of all submissions)
                        ↓
                   agent.py --competition-dir competitions/<slug>/
                        │
                        ├── reads program.md → system prompt for every LLM call
                        ├── reads/writes experiment.py → the evolving ML pipeline
                        ├── runs experiment.py as subprocess → gets cv metric
                        ├── submits to Kaggle CLI → gets public leaderboard score
                        └── calls Gemini LLM → proposes next improvement

Repository layout (safe for GitHub)

This project is configured so you can publish the repo without leaking datasets, submissions, or secrets.

In git	Ignored (see `.gitignore`)
`agent.py`, `context_agent.py`, `llm.py`, `orchestration/`, `docs/`	`competitions/*` workspaces (entire per-competition trees)
`README.md`, `setup.sh`	Root `/data/` (downloaded CSVs), `Qwen3-8B-INT4/`, root `*.ipynb`
`competitions/README.md` only	`**/.kaggle.json` paths under repo, `.env`, `run.log`, `results.tsv`, `llm_memory.jsonl`, `submissions/`

After cloning: run context_agent.py <competition> once to create competitions/<slug>/ locally. Keep API keys in environment variables and Kaggle credentials in ~/.kaggle/kaggle.json — never commit them.

Quick Start

Prerequisites

# 1. Install dependencies
pip install google-genai pandas scikit-learn lightgbm kaggle

# 2. Set up Kaggle credentials
#    Download kaggle.json from https://www.kaggle.com/settings
mkdir -p ~/.kaggle
cp kaggle.json ~/.kaggle/kaggle.json
chmod 600 ~/.kaggle/kaggle.json

# 3a. Google Gemini backend (default)
#     Get a key at https://aistudio.google.com
export GOOGLE_API_KEY='<paste-from-aistudio.google.com>'

# 3b. Ollama backend (local, no API key needed)
#     Install Ollama: https://ollama.com/download
ollama serve                        # start the server (keep it running; use a second terminal for the next command)
ollama pull qwen2.5-coder:14b       # pull a model (one-time)

Run on any Kaggle competition

# Default (Gemini backend)
python context_agent.py titanic

# Full URL also works
python context_agent.py https://www.kaggle.com/competitions/titanic

# Use local Ollama instead of Gemini
python context_agent.py titanic --llm ollama
python context_agent.py titanic --llm ollama --ollama-model llama3.1

# Skip Kaggle submissions (local CV only, faster iteration)
python context_agent.py titanic --skip-kaggle

# Run a fixed number of iterations
python context_agent.py titanic --max-iter 20

# Run locally, submit only the best model at the end
python context_agent.py titanic --submit-best-only --max-iter 50

# Set up without starting the agent (inspect generated files first)
python context_agent.py titanic --no-launch

# Regenerate program.md and experiment.py (e.g. after modifying prompts)
python context_agent.py titanic --force-reinit --no-launch

Run the agent directly (if competition is already set up)

python agent.py --competition-dir competitions/titanic
python agent.py --competition-dir competitions/titanic --skip-kaggle --max-iter 10
python agent.py --competition-dir competitions/titanic --llm ollama --ollama-model qwen2.5-coder:14b

Original hardcoded competition (backward compatible)

# Still works exactly as before
python agent.py
python agent.py --skip-kaggle --max-iter 5

File Reference

File	Purpose
`context_agent.py`	Bootstrap a new competition (download data, generate context, launch agent)
`agent.py`	Iterative ML agent — the main optimization loop
`program.md`	Default system prompt (for the original home-data-for-ml-course competition)
`experiment.py`	Default starter ML pipeline (for the original competition)
`setup.sh`	Manual setup script for the original competition
`results.tsv`	Experiment history for the original competition

Per-competition files (in `competitions/<slug>/`)

File	Purpose
`competition.txt`	Competition slug (read by `agent.py`)
`program.md`	LLM-generated system prompt for this competition
`experiment.py`	Evolving ML pipeline (rewritten by LLM each iteration)
`submission.csv`	Latest predictions (overwritten each iteration)
`results.tsv`	Experiment history (iteration, cv metric, kaggle score, status)
`run.log`	Stdout/stderr from the last experiment run
`data/`	Downloaded competition dataset
`submissions/`	Timestamped archive of every submission
`logs/`	Agent run logs
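
Appending to `results.tsv` could be done along these lines. A minimal sketch: the four column names follow the description in the table above, but their exact spelling, the number formatting, and the `append_result` helper itself are assumptions.

```python
import csv
from pathlib import Path

# Assumed header; the table above only names the fields informally.
COLUMNS = ["iteration", "cv_metric", "kaggle_score", "status"]


def append_result(path, iteration, cv_metric, kaggle_score, status):
    """Append one row to a tab-separated history file, writing the header on first use."""
    path = Path(path)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh, delimiter="\t")
        if is_new:
            writer.writerow(COLUMNS)
        # Leave the Kaggle score empty when the submission was skipped.
        score = "" if kaggle_score is None else f"{kaggle_score:.5f}"
        writer.writerow([iteration, f"{cv_metric:.5f}", score, status])
```

An append-only file means a crash in the middle of a run never loses earlier history.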

How the LLM Context Works

Every agent iteration, the LLM receives:

System prompt = program.md (competition rules, metric, dataset notes, improvement directions)
User prompt = last 15 iteration results + current experiment.py + a numbered menu of 8 directions to explore
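
Assembling the user prompt from those three parts might look like this. It is a sketch only: the section labels, the row formatting, and the `build_user_prompt` name are assumptions, while the 15-row window comes from the description above.

```python
HISTORY_WINDOW = 15  # number of recent iterations included, per the description above


def build_user_prompt(history_rows, current_code, directions):
    """Combine recent results, the current code, and a numbered menu into one prompt."""
    recent = history_rows[-HISTORY_WINDOW:]
    history_text = (
        "\n".join("\t".join(str(v) for v in row) for row in recent)
        or "(no results yet)"
    )
    menu = "\n".join(f"{i}. {d}" for i, d in enumerate(directions, start=1))
    return (
        "Recent results (oldest first):\n" + history_text + "\n\n"
        "Current experiment.py:\n" + current_code + "\n\n"
        "Pick one direction to explore:\n" + menu + "\n"
    )
```

Truncating to a fixed window keeps the prompt size bounded however long the agent runs.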

The LLM responds with:

DESCRIPTION: <one sentence describing what changed>
<complete updated experiment.py>

The agent extracts the code, writes it to experiment.py, and runs it.
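
Extracting the description and the code from such a response could be sketched as follows. The handling of an optional Markdown code fence is an assumption (LLMs often wrap code that way), and `parse_llm_response` is a hypothetical name rather than the function used in `agent.py`.

```python
import re

FENCE = "`" * 3  # built indirectly so this example can itself sit inside a fenced block
FENCE_RE = re.compile(FENCE + r"(?:python)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def parse_llm_response(text):
    """Split a response into (description, code)."""
    lines = text.strip().splitlines()
    description = ""
    if lines and lines[0].upper().startswith("DESCRIPTION:"):
        description = lines[0].split(":", 1)[1].strip()
        lines = lines[1:]
    body = "\n".join(lines).strip()
    # Prefer the contents of a fenced block if the model added one.
    fenced = FENCE_RE.search(body)
    code = fenced.group(1) if fenced else body
    return description, code.strip() + "\n"
```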

Experiment Output Format

Every experiment.py must print this block at the very end (the agent parses it):

---
cv_rmse:    0.12345     # for regression tasks
n_features: 85
model:      LightGBM
---

The metric key varies by task type:

cv_rmse for regression (RMSE on log-transformed target, lower is better)
cv_auc for binary classification (ROC-AUC, higher is better)
cv_logloss for multi-class classification (lower is better)
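
Parsing the metric line and comparing it in the right direction could be done roughly as below. The regular expression, the helper names, and the choice to treat unknown keys as lower-is-better are assumptions for this sketch; only the three keys and their directions come from the list above.

```python
import re

# Metric keys listed above, mapped to whether a higher value is better.
HIGHER_IS_BETTER = {"cv_rmse": False, "cv_auc": True, "cv_logloss": False}
METRIC_RE = re.compile(
    r"^(cv_[a-z0-9_]+):\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)", re.MULTILINE
)


def parse_metric(output):
    """Return (key, value) for the last cv_* line in the output, or None if absent."""
    matches = METRIC_RE.findall(output)
    if not matches:
        return None
    key, value = matches[-1]
    return key, float(value)


def is_improvement(key, new, best):
    """Compare two metric values using the direction associated with the key."""
    if best is None:
        return True
    # Unknown keys are treated as lower-is-better (an assumption of this sketch).
    return new > best if HIGHER_IS_BETTER.get(key, False) else new < best
```

Taking the last match rather than the first means intermediate per-fold prints do not shadow the final summary block.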

Crash Recovery

If experiment.py crashes or produces no metric, the agent:

1. Reads the traceback from `run.log`
2. Sends it to the LLM with a fix request
3. Retries up to 3 times
4. If all retries fail, logs the crash (`cv_rmse: 9.99999`, `status: crash`) and restores the best known code
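
The recovery steps above can be sketched as a bounded retry loop. This is illustrative only: `run_experiment` and `ask_llm_to_fix` are hypothetical callables, the `ok` status label is made up, and reading "up to 3 times" as three fix attempts after the initial run is an assumption.

```python
CRASH_METRIC = 9.99999  # sentinel logged for a crashed iteration, as described above
MAX_FIX_ATTEMPTS = 3


def run_with_recovery(code, best_code, run_experiment, ask_llm_to_fix):
    """Run code; on failure ask for a fix up to MAX_FIX_ATTEMPTS times.

    run_experiment(code) -> (metric_or_None, log_text)
    ask_llm_to_fix(code, log_text) -> new code
    Returns (code_to_keep, metric, status).
    """
    metric, log_text = run_experiment(code)
    attempts = 0
    while metric is None and attempts < MAX_FIX_ATTEMPTS:
        attempts += 1
        code = ask_llm_to_fix(code, log_text)
        metric, log_text = run_experiment(code)
    if metric is None:
        # Every attempt failed: record the sentinel and fall back to the best code.
        return best_code, CRASH_METRIC, "crash"
    return code, metric, "ok"
```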

Competition Directory Layout Example

After running python context_agent.py titanic:

kaggle_agent/
├── context_agent.py
├── agent.py
├── program.md              ← original competition's prompt
├── experiment.py           ← original competition's pipeline
├── competitions/
│   └── titanic/
│       ├── competition.txt       ← "titanic"
│       ├── program.md            ← LLM-generated for Titanic
│       ├── experiment.py         ← LLM-generated starter pipeline
│       ├── submission.csv        ← latest predictions
│       ├── results.tsv           ← experiment history
│       ├── run.log               ← last run output
│       ├── data/
│       │   ├── train.csv
│       │   ├── test.csv
│       │   └── gender_submission.csv
│       ├── submissions/
│       │   └── submission_20260311_120000.csv
│       └── logs/
└── data/                   ← original competition data
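
Creating this layout and archiving a submission under a timestamped name could look like the following. A sketch under assumptions: the helper names are hypothetical, and the timestamp format is inferred from the example file name `submission_20260311_120000.csv` in the tree above.

```python
import shutil
from datetime import datetime
from pathlib import Path


def create_workspace(root, slug):
    """Create competitions/<slug>/ with data/, submissions/, and logs/, and record the slug."""
    comp_dir = Path(root) / "competitions" / slug
    for sub in ("data", "submissions", "logs"):
        (comp_dir / sub).mkdir(parents=True, exist_ok=True)
    (comp_dir / "competition.txt").write_text(slug, encoding="utf-8")
    return comp_dir


def archive_submission(comp_dir, now=None):
    """Copy submission.csv to submissions/submission_YYYYMMDD_HHMMSS.csv."""
    now = now or datetime.now()
    src = Path(comp_dir) / "submission.csv"
    dst = Path(comp_dir) / "submissions" / f"submission_{now:%Y%m%d_%H%M%S}.csv"
    shutil.copy2(src, dst)
    return dst
```

Because `submission.csv` is overwritten on every iteration, the archive copy is the only way to recover the predictions behind an earlier score.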

LLM Backends

The agent supports two LLM backends, selectable via --llm:

Gemini (default)

Variable	Required	Description
`GOOGLE_API_KEY`	Yes	Get at https://aistudio.google.com

Models are tried in order, falling back to the next one on failure:

gemini-2.5-flash — fast, cost-effective
gemini-2.5-pro — more capable fallback
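
The fallback order can be sketched independently of any client library. In this illustration, `call_model` is a hypothetical callable that wraps the real API call; which errors actually trigger a fallback in `llm.py` is not stated here, so catching every exception is an assumption.

```python
MODELS = ["gemini-2.5-flash", "gemini-2.5-pro"]  # order listed above


def generate_with_fallback(call_model, prompt, models=MODELS):
    """Try each model in order and return (model_name, text) from the first that succeeds.

    call_model(model_name, prompt) is a hypothetical callable wrapping the real client.
    """
    errors = []
    for name in models:
        try:
            return name, call_model(name, prompt)
        except Exception as exc:  # broad on purpose: any failure moves to the next model
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All models failed: " + "; ".join(errors))
```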

Ollama (local, no API key)

Requires Ollama running locally.

ollama serve                         # start the server (runs on localhost:11434)
ollama pull qwen2.5-coder:14b        # recommended — strong at code generation
ollama pull llama3.1                 # alternative general-purpose model
ollama pull deepseek-r1:14b          # alternative reasoning model

Flags:

Flag	Default	Description
`--llm ollama`	—	Switch to Ollama backend
`--ollama-model NAME`	`qwen2.5-coder:14b`	Model to use
`--ollama-url URL`	`http://localhost:11434`	Ollama server URL

Regardless of the LLM backend, Kaggle credentials are read from ~/.kaggle/kaggle.json (the standard Kaggle CLI location).

Agent CLI Flags

`context_agent.py`

Flag	Default	Description
`competition`	required	Competition URL or slug
`--skip-kaggle`	off	Skip Kaggle submissions (local CV only)
`--max-iter N`	0 (∞)	Stop after N agent iterations
`--submit-best-only`	off	Run locally, submit best at end
`--no-launch`	off	Set up only, don't run the agent
`--force-reinit`	off	Regenerate program.md and experiment.py
`--llm`	`gemini`	LLM backend: `gemini` or `ollama`
`--ollama-model`	`qwen2.5-coder:14b`	Ollama model name
`--ollama-url`	`http://localhost:11434`	Ollama server URL

`agent.py`

Flag	Default	Description
`--competition-dir PATH`	`.` (agent's own dir)	Competition working directory
`--skip-kaggle`	off	Skip Kaggle submissions
`--max-iter N`	0 (∞)	Stop after N iterations
`--submit-best-only`	off	Run locally, submit best at end
`--llm`	`gemini`	LLM backend: `gemini` or `ollama`
`--ollama-model`	`qwen2.5-coder:14b`	Ollama model name
`--ollama-url`	`http://localhost:11434`	Ollama server URL
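
For reference, the `agent.py` flag table above maps onto an `argparse` parser along these lines. This is a sketch reconstructed from the table, not the parser in the repository; the help strings are paraphrased, and the literal default `.` here resolves to the current working directory, which may differ from the "agent's own dir" behavior noted in the table.

```python
import argparse


def build_parser():
    """Parser mirroring the agent.py flag table (defaults copied from the table)."""
    p = argparse.ArgumentParser(prog="agent.py")
    p.add_argument("--competition-dir", default=".", help="Competition working directory")
    p.add_argument("--skip-kaggle", action="store_true", help="Skip Kaggle submissions")
    p.add_argument("--max-iter", type=int, default=0, help="Stop after N iterations (0 = no limit)")
    p.add_argument("--submit-best-only", action="store_true", help="Run locally, submit best at end")
    p.add_argument("--llm", choices=["gemini", "ollama"], default="gemini", help="LLM backend")
    p.add_argument("--ollama-model", default="qwen2.5-coder:14b", help="Ollama model name")
    p.add_argument("--ollama-url", default="http://localhost:11434", help="Ollama server URL")
    return p
```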
