A fully autonomous, competition-agnostic ML agent that takes a Kaggle competition URL, sets itself up from scratch, and iteratively improves a machine learning model using a Gemini LLM in a continuous feedback loop.
The system has two layers:
The first layer, context_agent.py, runs once per competition to bootstrap the working environment:
- Parse slug: extracts the competition slug from a URL or accepts it directly
- Create directory: `competitions/<slug>/` with `data/`, `submissions/`, `logs/`
- Download data: calls `kaggle competitions download` and unzips all files
- Inspect dataset: reads all CSV/parquet files (shapes, dtypes, missing values, sample rows, description files)
- Fetch metadata: title, category, reward, deadline via `kaggle competitions list`
- Detect target & ID columns: infers the target (column in train but not in test) and the ID column automatically
- Generate `program.md`: the LLM writes a competition-specific system prompt (rules, metric, dataset notes, 8 improvement directions)
- Generate `experiment.py`: the LLM writes a complete, runnable starter ML pipeline suited to the task type
- Launch `agent.py`: starts the iterative loop in the competition directory
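The slug parsing and column detection steps can be sketched as follows. This is a minimal illustration, not the code in context_agent.py: the function names `parse_slug` and `detect_target_and_id`, the URL pattern, and the "name ends in id" heuristic are all assumptions made for the example.

```python
import re
from typing import List, Optional, Tuple


def parse_slug(url_or_slug: str) -> str:
    """Return the slug from a Kaggle competition URL, or the input if it is already a slug."""
    text = url_or_slug.strip()
    # Assumed URL shapes: .../competitions/<slug> and the older .../c/<slug>
    match = re.search(r"kaggle\.com/(?:competitions|c)/([^/?#]+)", text)
    return match.group(1) if match else text.strip("/")


def detect_target_and_id(
    train_columns: List[str], test_columns: List[str]
) -> Tuple[Optional[str], Optional[str]]:
    """Guess the target (in train but not in test) and the ID column (shared, id-like name)."""
    test_set = set(test_columns)
    only_in_train = [c for c in train_columns if c not in test_set]
    target = only_in_train[-1] if only_in_train else None
    # Heuristic (assumption): the first shared column whose name ends in "id".
    id_col = next(
        (c for c in train_columns if c in test_set and c.lower().endswith("id")),
        None,
    )
    return target, id_col


print(parse_slug("https://www.kaggle.com/competitions/titanic"))
print(detect_target_and_id(["PassengerId", "Age", "Survived"], ["PassengerId", "Age"]))
```

A name-based ID heuristic can misfire on columns such as `Paid`; a real implementation would likely also check that the values are unique.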
The second layer, agent.py, runs continuously, improving the model every iteration:
Main loop:

1. Run experiment.py as a subprocess
   - writes submission.csv
   - prints cv_metric: X.XXXXX
2. Parse the CV metric from the output
3. If the run crashes: ask the LLM to fix it, then retry (×3)
4. Submit via the Kaggle CLI (unless --skip-kaggle)
5. Keep the change if the metric improved, else revert to the best version
6. Append the result to results.tsv
7. Ask the LLM for the next improvement idea
   - the LLM rewrites experiment.py completely
8. Repeat forever
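The keep-or-revert logic of the loop can be sketched as below. This is a simplified model under stated assumptions, not agent.py itself: `run_experiment` and `propose_next` are hypothetical callables standing in for the subprocess run and the LLM call, the status labels `keep` and `revert` are invented for the example, and Kaggle submission is left out.

```python
from typing import Callable, List, Optional, Tuple


def run_loop(
    run_experiment: Callable[[str], Optional[float]],
    propose_next: Callable[[str, list], str],
    starter_code: str,
    max_iter: int,
    lower_is_better: bool = True,
) -> Tuple[str, Optional[float], List[tuple]]:
    """Keep the best-scoring code and revert to it when a candidate does not improve."""
    best_code, best_metric = starter_code, None
    candidate = starter_code
    history = []
    for iteration in range(1, max_iter + 1):
        metric = run_experiment(candidate)  # None stands for a crash
        improved = metric is not None and (
            best_metric is None
            or (metric < best_metric if lower_is_better else metric > best_metric)
        )
        if improved:
            best_code, best_metric = candidate, metric
        status = "crash" if metric is None else ("keep" if improved else "revert")
        history.append((iteration, metric, status))
        # The next proposal always starts from the best known code.
        candidate = propose_next(best_code, history)
    return best_code, best_metric, history


# Toy usage: three candidate "programs" with fixed scores.
scores = {"a": 0.5, "b": 0.4, "c": 0.6}
proposals = iter(["b", "c", "d"])
print(run_loop(scores.get, lambda best, hist: next(proposals), "a", max_iter=3))
```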
context_agent.py ← competition URL / slug
│
├── kaggle CLI (download dataset)
├── Gemini LLM (generate program.md + experiment.py)
└── competitions/<slug>/ (isolated working directory)
    ├── competition.txt (slug, read by agent.py)
    ├── program.md (LLM system prompt, competition-specific)
    ├── experiment.py (the ML pipeline, rewritten each iteration)
    ├── submission.csv (latest predictions)
    ├── results.tsv (experiment history log)
    ├── run.log (last experiment stdout/stderr)
    ├── data/ (competition dataset files)
    └── submissions/ (timestamped archive of all submissions)

agent.py --competition-dir competitions/<slug>/
│
├── reads program.md → system prompt for every LLM call
├── reads/writes experiment.py → the evolving ML pipeline
├── runs experiment.py as subprocess → gets cv metric
├── submits to Kaggle CLI → gets public leaderboard score
└── calls Gemini LLM → proposes next improvement
This project is configured so you can publish the repo without leaking datasets, submissions, or secrets.
| In git | Ignored (see .gitignore) |
|---|---|
| `agent.py`, `context_agent.py`, `llm.py`, `orchestration/`, `docs/` | `competitions/*` workspaces (entire per-competition trees) |
| `README.md`, `setup.sh` | Root `/data/` (downloaded CSVs), `Qwen3-8B-INT4/`, root `*.ipynb` |
| `competitions/README.md` only | `**/.kaggle.json` paths under repo, `.env`, `run.log`, `results.tsv`, `llm_memory.jsonl`, `submissions/` |
After cloning: run context_agent.py <competition> once to create competitions/<slug>/ locally. Keep API keys in environment variables and Kaggle credentials in ~/.kaggle/kaggle.json; never commit them.
# 1. Install dependencies
pip install google-genai pandas scikit-learn lightgbm kaggle
# 2. Set up Kaggle credentials
# Download kaggle.json from https://www.kaggle.com/settings
mkdir -p ~/.kaggle
cp kaggle.json ~/.kaggle/kaggle.json
chmod 600 ~/.kaggle/kaggle.json
# 3a. Google Gemini backend (default)
# Get a key at https://aistudio.google.com
export GOOGLE_API_KEY='<paste-from-aistudio.google.com>'
# 3b. Ollama backend (local, no API key needed)
# Install Ollama: https://ollama.com/download
ollama serve # start the server
ollama pull qwen2.5-coder:14b # pull a model (one-time)

# Default (Gemini backend)
python context_agent.py titanic
# Full URL also works
python context_agent.py https://www.kaggle.com/competitions/titanic
# Use local Ollama instead of Gemini
python context_agent.py titanic --llm ollama
python context_agent.py titanic --llm ollama --ollama-model llama3.1
# Skip Kaggle submissions (local CV only, faster iteration)
python context_agent.py titanic --skip-kaggle
# Run a fixed number of iterations
python context_agent.py titanic --max-iter 20
# Run locally, submit only the best model at the end
python context_agent.py titanic --submit-best-only --max-iter 50
# Set up without starting the agent (inspect generated files first)
python context_agent.py titanic --no-launch
# Regenerate program.md and experiment.py (e.g. after modifying prompts)
python context_agent.py titanic --force-reinit --no-launch

python agent.py --competition-dir competitions/titanic
python agent.py --competition-dir competitions/titanic --skip-kaggle --max-iter 10
python agent.py --competition-dir competitions/titanic --llm ollama --ollama-model qwen2.5-coder:14b

# Still works exactly as before
python agent.py
python agent.py --skip-kaggle --max-iter 5

Files in the repository root:

| File | Purpose |
|---|---|
| `context_agent.py` | Bootstrap a new competition (download data, generate context, launch agent) |
| `agent.py` | Iterative ML agent: the main optimization loop |
| `program.md` | Default system prompt (for the original home-data-for-ml-course competition) |
| `experiment.py` | Default starter ML pipeline (for the original competition) |
| `setup.sh` | Manual setup script for the original competition |
| `results.tsv` | Experiment history for the original competition |
Files in each competitions/<slug>/ directory:

| File | Purpose |
|---|---|
| `competition.txt` | Competition slug (read by agent.py) |
| `program.md` | LLM-generated system prompt for this competition |
| `experiment.py` | Evolving ML pipeline (rewritten by LLM each iteration) |
| `submission.csv` | Latest predictions (overwritten each iteration) |
| `results.tsv` | Experiment history (iteration, cv metric, kaggle score, status) |
| `run.log` | Stdout/stderr from the last experiment run |
| `data/` | Downloaded competition dataset |
| `submissions/` | Timestamped archive of every submission |
| `logs/` | Agent run logs |
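Appending one row per iteration to results.tsv could look like the sketch below. The column names follow the fields listed for results.tsv (iteration, cv metric, kaggle score, status), but their exact spelling, the header row, and the helper name `append_result` are assumptions, not the agent's actual format.

```python
import csv
import tempfile
from pathlib import Path
from typing import Optional

# Assumed column names; the real file may spell them differently.
FIELDS = ["iteration", "cv_metric", "kaggle_score", "status"]


def append_result(
    path: Path,
    iteration: int,
    cv_metric: float,
    kaggle_score: Optional[float],
    status: str,
) -> None:
    """Append one tab-separated row, writing the header first if the file is new or empty."""
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=FIELDS, delimiter="\t", lineterminator="\n"
        )
        if is_new:
            writer.writeheader()
        writer.writerow(
            {
                "iteration": iteration,
                "cv_metric": f"{cv_metric:.5f}",
                # Empty cell when the Kaggle submission was skipped.
                "kaggle_score": "" if kaggle_score is None else f"{kaggle_score:.5f}",
                "status": status,
            }
        )


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "results.tsv"
    append_result(log, 1, 0.12345, None, "keep")
    append_result(log, 2, 9.99999, None, "crash")
    print(log.read_text(encoding="utf-8"))
```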
Every agent iteration, the LLM receives:
- System prompt = `program.md` (competition rules, metric, dataset notes, improvement directions)
- User prompt = last 15 iteration results + current `experiment.py` + a numbered menu of 8 directions to explore
The LLM responds with:
DESCRIPTION: <one sentence describing what changed>
<complete updated experiment.py>
The agent extracts the code, writes it to experiment.py, and runs it.
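Extracting the description and the code from such a response might be done as in this sketch. It assumes the code may or may not arrive wrapped in a markdown code fence, which the format above does not specify; `parse_llm_response` is a hypothetical name.

```python
import re
from typing import Tuple

FENCE = "`" * 3  # a markdown code fence, built indirectly to keep this example readable


def parse_llm_response(text: str) -> Tuple[str, str]:
    """Split an LLM reply into (description, code)."""
    match = re.search(r"^DESCRIPTION:\s*(.+)$", text, flags=re.MULTILINE)
    if match is None:
        raise ValueError("response has no DESCRIPTION line")
    description = match.group(1).strip()
    body = text[match.end():]
    # Prefer a fenced block if one is present (assumption); otherwise take the rest verbatim.
    fenced = re.search(
        re.escape(FENCE) + r"(?:python)?[ \t]*\n(.*?)" + re.escape(FENCE),
        body,
        flags=re.DOTALL,
    )
    code = fenced.group(1) if fenced else body
    return description, code.strip() + "\n"


reply = "DESCRIPTION: Add target encoding\nimport pandas as pd\nprint('cv_rmse: 0.12')\n"
print(parse_llm_response(reply))
```

Raising on a missing DESCRIPTION line lets the caller treat a malformed reply like any other failed iteration instead of writing garbage to experiment.py.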
Every experiment.py must print this block at the very end (the agent parses it):
---
cv_rmse: 0.12345 # for regression tasks
n_features: 85
model: LightGBM
---
Metric key varies by task type:
- `cv_rmse`: regression (RMSE on log-transformed target)
- `cv_auc`: binary classification (ROC-AUC, higher is better)
- `cv_logloss`: multi-class classification (lower is better)
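Parsing the metric line and comparing it against the best score so far can be sketched as follows. The regular expression and the helper names `parse_metric` and `is_improvement` are illustrative assumptions; the agent's own parser may differ.

```python
import re
from typing import Optional, Tuple

# Metric keys from the list above, mapped to whether lower values are better.
LOWER_IS_BETTER = {"cv_rmse": True, "cv_auc": False, "cv_logloss": True}

METRIC_LINE = re.compile(
    r"^(cv_\w+):\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)", re.MULTILINE
)


def parse_metric(output: str) -> Optional[Tuple[str, float]]:
    """Return (key, value) from the last cv_* line in the output, or None if absent."""
    matches = METRIC_LINE.findall(output)
    if not matches:
        return None
    key, value = matches[-1]
    return key, float(value)


def is_improvement(key: str, new: float, best: Optional[float]) -> bool:
    """Compare in the right direction for the metric; unknown keys default to lower-is-better."""
    if best is None:
        return True
    return new < best if LOWER_IS_BETTER.get(key, True) else new > best


sample = "---\ncv_rmse: 0.12345\nn_features: 85\nmodel: LightGBM\n---\n"
print(parse_metric(sample))
print(is_improvement("cv_auc", 0.91, 0.88))
```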
If experiment.py crashes or produces no metric, the agent:
- Reads the traceback from `run.log`
- Sends it to the LLM with a fix request
- Retries up to 3 times
- If all retries fail: logs the crash (`cv_rmse: 9.99999`, status: `crash`) and restores the best known code
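The crash-recovery steps can be sketched like this. `run` and `ask_fix` are hypothetical callables standing in for the subprocess run and the LLM fix request, and the `ok` status label is invented for the example; only the retry count and the 9.99999 sentinel come from the text above.

```python
from typing import Callable, Optional, Tuple

CRASH_METRIC = 9.99999  # sentinel logged when every retry fails
MAX_RETRIES = 3


def run_with_fixes(
    code: str,
    run: Callable[[str], Tuple[Optional[float], str]],
    ask_fix: Callable[[str, str], str],
    best_code: str,
) -> Tuple[str, float, str]:
    """Run the code; on a crash, ask for a fix and retry, falling back to the best known code."""
    metric, log = run(code)  # metric is None when the run crashed or printed no metric
    attempts = 0
    while metric is None and attempts < MAX_RETRIES:
        code = ask_fix(code, log)  # the log carries the traceback
        metric, log = run(code)
        attempts += 1
    if metric is None:
        return best_code, CRASH_METRIC, "crash"
    return code, metric, "ok"


# Toy usage: the second fix makes the run succeed.
fake_run = lambda c: (0.3, "") if c == "v!!" else (None, "Traceback ...")
print(run_with_fixes("v", fake_run, lambda c, log: c + "!", best_code="best"))
```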
After running python context_agent.py titanic:
kaggle_agent/
├── context_agent.py
├── agent.py
├── program.md ← original competition's prompt
├── experiment.py ← original competition's pipeline
├── competitions/
│   └── titanic/
│       ├── competition.txt ← "titanic"
│       ├── program.md ← LLM-generated for Titanic
│       ├── experiment.py ← LLM-generated starter pipeline
│       ├── submission.csv ← latest predictions
│       ├── results.tsv ← experiment history
│       ├── run.log ← last run output
│       ├── data/
│       │   ├── train.csv
│       │   ├── test.csv
│       │   └── gender_submission.csv
│       ├── submissions/
│       │   └── submission_20260311_120000.csv
│       └── logs/
└── data/ ← original competition data
The agent supports two LLM backends, selectable via --llm:
The Gemini backend (default) reads one environment variable:

| Variable | Required | Description |
|---|---|---|
| `GOOGLE_API_KEY` | Yes | Get at https://aistudio.google.com |

Models tried in order (falls back on failure):
- `gemini-2.5-flash`: fast, cost-effective
- `gemini-2.5-pro`: more capable fallback
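The try-in-order fallback can be sketched as below. To keep the example independent of any SDK, the actual API call is passed in as a hypothetical `call_model(model, prompt)` callable; how llm.py really invokes Gemini is not shown in this README.

```python
from typing import Callable, Sequence, Tuple

GEMINI_MODELS = ("gemini-2.5-flash", "gemini-2.5-pro")


def generate_with_fallback(
    prompt: str,
    call_model: Callable[[str, str], str],
    models: Sequence[str] = GEMINI_MODELS,
) -> Tuple[str, str]:
    """Try each model in order; return (model, reply) from the first one that succeeds."""
    errors = []
    for model in models:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # broad on purpose: any failure triggers the fallback
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


def flaky(model: str, prompt: str) -> str:
    """Stand-in for a real API call: the first model always fails."""
    if model == "gemini-2.5-flash":
        raise TimeoutError("simulated outage")
    return f"reply to {prompt!r}"


print(generate_with_fallback("suggest an improvement", flaky))
```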
The Ollama backend requires Ollama running locally.
ollama serve # start the server (runs on localhost:11434)
ollama pull qwen2.5-coder:14b # recommended: strong at code generation
ollama pull llama3.1 # alternative general-purpose model
ollama pull deepseek-r1:14b # alternative reasoning model

Flags:
| Flag | Default | Description |
|---|---|---|
| `--llm ollama` | n/a | Switch to Ollama backend |
| `--ollama-model NAME` | `qwen2.5-coder:14b` | Model to use |
| `--ollama-url URL` | `http://localhost:11434` | Ollama server URL |
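For orientation, a request to a local Ollama server could be built as in this sketch, which targets Ollama's `/api/generate` REST endpoint with streaming disabled. Whether the agent uses this endpoint, a chat endpoint, or a client library is an assumption here; `build_ollama_request` and `generate` are hypothetical helpers.

```python
import json
import urllib.request


def build_ollama_request(
    prompt: str,
    model: str = "qwen2.5-coder:14b",
    base_url: str = "http://localhost:11434",
) -> urllib.request.Request:
    """Build (but do not send) a non-streaming generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, **kwargs) -> str:
    """Send the request and return the generated text; needs `ollama serve` running."""
    request = build_ollama_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=600) as response:
        return json.loads(response.read())["response"]


req = build_ollama_request("Write a LightGBM baseline.")
print(req.get_method(), req.full_url)
```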
The Kaggle credentials are read from ~/.kaggle/kaggle.json (standard Kaggle CLI location).
context_agent.py flags:

| Flag | Default | Description |
|---|---|---|
| `competition` | required | Competition URL or slug |
| `--skip-kaggle` | off | Skip Kaggle submissions (local CV only) |
| `--max-iter N` | 0 (no limit) | Stop after N agent iterations |
| `--submit-best-only` | off | Run locally, submit best at end |
| `--no-launch` | off | Set up only, don't run the agent |
| `--force-reinit` | off | Regenerate program.md and experiment.py |
| `--llm` | `gemini` | LLM backend: `gemini` or `ollama` |
| `--ollama-model` | `qwen2.5-coder:14b` | Ollama model name |
| `--ollama-url` | `http://localhost:11434` | Ollama server URL |
agent.py flags:

| Flag | Default | Description |
|---|---|---|
| `--competition-dir PATH` | `.` (agent's own dir) | Competition working directory |
| `--skip-kaggle` | off | Skip Kaggle submissions |
| `--max-iter N` | 0 (no limit) | Stop after N iterations |
| `--submit-best-only` | off | Run locally, submit best at end |
| `--llm` | `gemini` | LLM backend: `gemini` or `ollama` |
| `--ollama-model` | `qwen2.5-coder:14b` | Ollama model name |
| `--ollama-url` | `http://localhost:11434` | Ollama server URL |
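The agent.py flags and defaults listed here map naturally onto `argparse`. The sketch below mirrors the table only; it is not copied from agent.py, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Flag definitions mirroring the documented agent.py options (illustrative only)."""
    parser = argparse.ArgumentParser(description="Iterative ML agent")
    parser.add_argument("--competition-dir", default=".", metavar="PATH",
                        help="Competition working directory")
    parser.add_argument("--skip-kaggle", action="store_true",
                        help="Skip Kaggle submissions")
    parser.add_argument("--max-iter", type=int, default=0, metavar="N",
                        help="Stop after N iterations (0 = no limit)")
    parser.add_argument("--submit-best-only", action="store_true",
                        help="Run locally, submit best at end")
    parser.add_argument("--llm", choices=["gemini", "ollama"], default="gemini",
                        help="LLM backend")
    parser.add_argument("--ollama-model", default="qwen2.5-coder:14b",
                        help="Ollama model name")
    parser.add_argument("--ollama-url", default="http://localhost:11434",
                        help="Ollama server URL")
    return parser


args = build_parser().parse_args(["--skip-kaggle", "--max-iter", "5"])
print(args.competition_dir, args.skip_kaggle, args.max_iter, args.llm)
```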