Denizdius/kaggle_agent

Autonomous Kaggle ML Agent

A fully autonomous, competition-agnostic ML agent that takes a Kaggle competition URL, sets itself up from scratch, and iteratively improves a machine learning model using an LLM (Gemini by default, or a local Ollama model) in a continuous feedback loop.


How It Works

The system has two layers:

Layer 1: Context Agent (context_agent.py)

Runs once per competition to bootstrap the working environment.

  1. Parse slug: extracts the competition slug from a URL, or accepts a bare slug directly
  2. Create directory: competitions/<slug>/ with data/, submissions/, logs/
  3. Download data: calls kaggle competitions download and unzips all files
  4. Inspect dataset: reads all CSV/parquet files (shapes, dtypes, missing values, sample rows, description files)
  5. Fetch metadata: title, category, reward, deadline via kaggle competitions list
  6. Detect target & ID columns: infers the target (a column in train but not in test) and the ID column automatically
  7. Generate program.md: the LLM writes a competition-specific system prompt (rules, metric, dataset notes, 8 improvement directions)
  8. Generate experiment.py: the LLM writes a complete, runnable starter ML pipeline suited to the task type
  9. Launch agent.py: starts the iterative loop in the competition directory
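
Steps 1 and 6 above can be sketched in a few lines. This is a minimal illustration, not the code in context_agent.py: the function names parse_slug and detect_target_and_id are hypothetical, the /c/<slug> URL form is an assumption about older Kaggle links, and the "name ends with id" rule for the ID column is a guessed heuristic.

```python
from urllib.parse import urlparse


def parse_slug(url_or_slug: str) -> str:
    """Return the competition slug from a full Kaggle URL or a bare slug."""
    text = url_or_slug.strip()
    if "://" not in text:
        return text.strip("/")  # already a slug
    # Path looks like /competitions/<slug>[/...]; /c/<slug> is assumed for older links.
    parts = [p for p in urlparse(text).path.split("/") if p]
    for marker in ("competitions", "c"):
        if marker in parts and parts.index(marker) + 1 < len(parts):
            return parts[parts.index(marker) + 1]
    raise ValueError(f"no competition slug found in {url_or_slug!r}")


def detect_target_and_id(train_cols, test_cols):
    """Guess the target (present in train only) and the ID column (shared, id-like name)."""
    test_set = set(test_cols)
    train_only = [c for c in train_cols if c not in test_set]
    # Only trust the guess when exactly one column is missing from test.
    target = train_only[0] if len(train_only) == 1 else None
    id_col = next(
        (c for c in train_cols if c in test_set and c.lower().endswith("id")),
        None,
    )
    return target, id_col
```

For the Titanic columns, detect_target_and_id(["PassengerId", "Survived", "Pclass"], ["PassengerId", "Pclass"]) would pick Survived as the target and PassengerId as the ID column.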

Layer 2: Iterative Agent (agent.py)

Runs continuously, attempting one improvement to the model per iteration and keeping it only if the CV metric gets better.

┌──────────────────── MAIN LOOP ───────────────────┐
│                                                  │
│  1. Run experiment.py as subprocess              │
│     └─ writes submission.csv                     │
│     └─ prints cv_metric: X.XXXXX                 │
│                                                  │
│  2. Parse cv metric from output                  │
│                                                  │
│  3. If crash: ask LLM to fix → retry (×3)        │
│                                                  │
│  4. Submit to Kaggle CLI (if not --skip-kaggle)  │
│                                                  │
│  5. Keep if metric improved, else revert to best │
│                                                  │
│  6. Append result to results.tsv                 │
│                                                  │
│  7. Ask LLM for next improvement idea            │
│     └─ LLM rewrites experiment.py completely     │
│                                                  │
│  8. Repeat forever                               │
└──────────────────────────────────────────────────┘
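
Step 5 of the loop (keep if improved, else revert) depends on whether the metric should go up or down. The sketch below shows one way to express that decision; the function names and the "keep" / "revert" status labels are hypothetical and not taken from agent.py.

```python
from typing import Optional, Tuple


def is_improvement(new: float, best: Optional[float], higher_is_better: bool) -> bool:
    """True if `new` beats `best`; the first result always counts as an improvement."""
    if best is None:
        return True
    return new > best if higher_is_better else new < best


def keep_or_revert(
    new_code: str,
    new_metric: Optional[float],
    best_code: str,
    best_metric: Optional[float],
    higher_is_better: bool = False,
) -> Tuple[str, Optional[float], str]:
    """Return (code to continue from, best metric so far, status label)."""
    if new_metric is not None and is_improvement(new_metric, best_metric, higher_is_better):
        return new_code, new_metric, "keep"
    # No metric or no gain: fall back to the best known pipeline.
    return best_code, best_metric, "revert"
```

With the default higher_is_better=False (an RMSE-style metric), a new score of 0.12 against a best of 0.13 is kept, while 0.14 is reverted.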

Full Architecture

context_agent.py  ←── competition URL / slug
      │
      ├── kaggle CLI           (download dataset)
      ├── Gemini LLM           (generate program.md + experiment.py)
      └── competitions/<slug>/ (isolated working directory)
              ├── competition.txt    (slug, read by agent.py)
              ├── program.md         (LLM system prompt, competition-specific)
              ├── experiment.py      (the ML pipeline, rewritten each iteration)
              ├── submission.csv     (latest predictions)
              ├── results.tsv        (experiment history log)
              ├── run.log            (last experiment stdout/stderr)
              ├── data/              (competition dataset files)
              └── submissions/       (timestamped archive of all submissions)
                        ↓
                   agent.py --competition-dir competitions/<slug>/
                        │
                        ├── reads program.md → system prompt for every LLM call
                        ├── reads/writes experiment.py → the evolving ML pipeline
                        ├── runs experiment.py as subprocess → gets cv metric
                        ├── submits to Kaggle CLI → gets public leaderboard score
                        └── calls Gemini LLM → proposes next improvement

Repository layout (safe for GitHub)

This project is configured so you can publish the repo without leaking datasets, submissions, or secrets.

| In git | Ignored (see .gitignore) |
| --- | --- |
| agent.py, context_agent.py, llm.py, orchestration/, docs/ | competitions/* workspaces (entire per-competition trees) |
| README.md, setup.sh | Root /data/ (downloaded CSVs), Qwen3-8B-INT4/, root *.ipynb |
| competitions/README.md only | **/.kaggle.json paths under repo, .env, run.log, results.tsv, llm_memory.jsonl, submissions/ |

After cloning: run context_agent.py <competition> once to create competitions/<slug>/ locally. Keep API keys in environment variables and Kaggle credentials in ~/.kaggle/kaggle.json; never commit them.


Quick Start

Prerequisites

# 1. Install dependencies
pip install google-genai pandas scikit-learn lightgbm kaggle

# 2. Set up Kaggle credentials
#    Download kaggle.json from https://www.kaggle.com/settings
mkdir -p ~/.kaggle
cp kaggle.json ~/.kaggle/kaggle.json
chmod 600 ~/.kaggle/kaggle.json

# 3a. Google Gemini backend (default)
#     Get a key at https://aistudio.google.com
export GOOGLE_API_KEY='<paste-from-aistudio.google.com>'

# 3b. Ollama backend (local, no API key needed)
#     Install Ollama: https://ollama.com/download
ollama serve                        # start the server
ollama pull qwen2.5-coder:14b       # pull a model (one-time)

Run on any Kaggle competition

# Default (Gemini backend)
python context_agent.py titanic

# Full URL also works
python context_agent.py https://www.kaggle.com/competitions/titanic

# Use local Ollama instead of Gemini
python context_agent.py titanic --llm ollama
python context_agent.py titanic --llm ollama --ollama-model llama3.1

# Skip Kaggle submissions (local CV only, faster iteration)
python context_agent.py titanic --skip-kaggle

# Run a fixed number of iterations
python context_agent.py titanic --max-iter 20

# Run locally, submit only the best model at the end
python context_agent.py titanic --submit-best-only --max-iter 50

# Set up without starting the agent (inspect generated files first)
python context_agent.py titanic --no-launch

# Regenerate program.md and experiment.py (e.g. after modifying prompts)
python context_agent.py titanic --force-reinit --no-launch

Run the agent directly (if competition is already set up)

python agent.py --competition-dir competitions/titanic
python agent.py --competition-dir competitions/titanic --skip-kaggle --max-iter 10
python agent.py --competition-dir competitions/titanic --llm ollama --ollama-model qwen2.5-coder:14b

Original hardcoded competition (backward compatible)

# Still works exactly as before
python agent.py
python agent.py --skip-kaggle --max-iter 5

File Reference

| File | Purpose |
| --- | --- |
| context_agent.py | Bootstrap a new competition (download data, generate context, launch agent) |
| agent.py | Iterative ML agent: the main optimization loop |
| program.md | Default system prompt (for the original home-data-for-ml-course competition) |
| experiment.py | Default starter ML pipeline (for the original competition) |
| setup.sh | Manual setup script for the original competition |
| results.tsv | Experiment history for the original competition |

Per-competition files (in competitions/<slug>/)

| File | Purpose |
| --- | --- |
| competition.txt | Competition slug (read by agent.py) |
| program.md | LLM-generated system prompt for this competition |
| experiment.py | Evolving ML pipeline (rewritten by LLM each iteration) |
| submission.csv | Latest predictions (overwritten each iteration) |
| results.tsv | Experiment history (iteration, cv metric, kaggle score, status) |
| run.log | Stdout/stderr from the last experiment run |
| data/ | Downloaded competition dataset |
| submissions/ | Timestamped archive of every submission |
| logs/ | Agent run logs |
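
As an illustration of the results.tsv log, the sketch below appends one tab-separated row per iteration using the four fields listed in the table. The exact column names, their order, and the five-decimal formatting are assumptions; the real file may differ.

```python
import csv
from pathlib import Path

# Assumed header, based on the fields named in the table above.
COLUMNS = ["iteration", "cv_metric", "kaggle_score", "status"]


def append_result(path, iteration, cv_metric, kaggle_score, status):
    """Append one row to a TSV log, writing the header if the file is new or empty."""
    path = Path(path)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        if is_new:
            writer.writerow(COLUMNS)
        # Leave the Kaggle score blank when the submission was skipped.
        score = "" if kaggle_score is None else kaggle_score
        writer.writerow([iteration, f"{cv_metric:.5f}", score, status])
```

Appending rather than rewriting keeps the history intact if the agent is stopped and restarted.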

How the LLM Context Works

Every agent iteration, the LLM receives:

  • System prompt = program.md (competition rules, metric, dataset notes, improvement directions)
  • User prompt = last 15 iteration results + current experiment.py + a numbered menu of 8 directions to explore
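
The user prompt described above could be assembled roughly as follows. This is a sketch under stated assumptions: build_user_prompt is a hypothetical name, the history rows are assumed to be dicts with iteration, cv_metric, and status keys, and the exact wording of the real prompt is not shown in this README.

```python
def build_user_prompt(history, current_code, directions, window=15):
    """Combine recent results, the current pipeline, and a numbered menu of directions."""
    recent = history[-window:]  # only the last `window` iterations are sent
    lines = ["Recent results (oldest first):"]
    lines += [
        f"  iter {r['iteration']}: cv={r['cv_metric']:.5f} status={r['status']}"
        for r in recent
    ]
    lines += ["", "Current experiment.py:", current_code, "", "Directions to explore:"]
    lines += [f"  {i}. {d}" for i, d in enumerate(directions, start=1)]
    return "\n".join(lines)
```

Capping the history at a fixed window keeps the prompt size bounded no matter how long the agent has been running.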

The LLM responds with:

DESCRIPTION: <one sentence describing what changed>
<complete updated experiment.py>

The agent extracts the code, writes it to experiment.py, and runs it.
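
A minimal sketch of that extraction step, assuming the reply follows the DESCRIPTION-then-code layout shown above. The function name parse_llm_response is hypothetical, and the handling of a Markdown code fence is an added assumption, since models often wrap code in one even when not asked to.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built here to avoid writing it literally


def parse_llm_response(text: str):
    """Split an LLM reply into (description, code)."""
    match = re.search(r"^DESCRIPTION:\s*(.+)$", text, flags=re.MULTILINE)
    if match is None:
        raise ValueError("reply has no DESCRIPTION: line")
    description = match.group(1).strip()
    code = text[match.end():].strip()
    # If the code arrived inside a fence, keep only the fenced body.
    fenced = re.search(FENCE + r"(?:python)?\s*\n(.*?)" + FENCE, code, flags=re.DOTALL)
    if fenced:
        code = fenced.group(1).strip()
    if not code:
        raise ValueError("reply has no code after the description")
    return description, code
```

Raising on a malformed reply lets the caller decide whether to re-prompt or fall back to the previous experiment.py.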


Experiment Output Format

Every experiment.py must print this block at the very end (the agent parses it):

---
cv_rmse:    0.12345     # for regression tasks
n_features: 85
model:      LightGBM
---

Metric key varies by task type:

  • cv_rmse: regression (RMSE on log-transformed target, lower is better)
  • cv_auc: binary classification (ROC-AUC, higher is better)
  • cv_logloss: multi-class classification (lower is better)
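
Parsing the metric out of that block might look like the sketch below. The regular expression and the parse_metric name are assumptions for illustration; the agent's actual parser is not shown in this README.

```python
import re

# Metric keys from the list above, with the direction in which each improves.
METRIC_DIRECTION = {"cv_rmse": "min", "cv_auc": "max", "cv_logloss": "min"}

# Matches lines such as "cv_rmse:    0.12345     # comment".
_METRIC_LINE = re.compile(
    r"^(cv_\w+):\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)", re.MULTILINE
)


def parse_metric(output: str):
    """Return (key, value) for the last cv_* line in the output, or None if absent."""
    matches = _METRIC_LINE.findall(output)
    if not matches:
        return None
    key, value = matches[-1]  # the summary block is printed at the very end
    return key, float(value)
```

Returning None when no metric line is found gives the caller a simple signal to treat the run as a crash.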

Crash Recovery

If experiment.py crashes or produces no metric, the agent:

  1. Reads the traceback from run.log
  2. Sends it to the LLM with a fix request
  3. Retries up to 3 times
  4. If all retries fail: logs the crash (cv_rmse: 9.99999, status: crash) and restores the best known code
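
The recovery steps above amount to a bounded retry loop. In this sketch, run and fix are hypothetical callbacks standing in for "execute experiment.py" and "ask the LLM for a fix", and the "ok" status label is an assumption; only the 3-retry limit and the 9.99999 crash value come from the description above.

```python
CRASH_METRIC = 9.99999  # sentinel value logged for a crashed iteration


def run_with_recovery(code, run, fix, max_retries=3):
    """Run `code`, asking `fix` for a repaired version after each failure.

    run(code) -> (metric or None, log text); fix(code, log) -> new code.
    Returns (code, metric, status); code is None when every attempt failed,
    which tells the caller to restore the best known pipeline.
    """
    metric, log = run(code)
    retries = 0
    while metric is None and retries < max_retries:
        code = fix(code, log)  # send the traceback back for a fix
        metric, log = run(code)
        retries += 1
    if metric is None:
        return None, CRASH_METRIC, "crash"
    return code, metric, "ok"
```

One initial run plus up to three retries means experiment.py is executed at most four times per iteration.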

Competition Directory Layout Example

After running python context_agent.py titanic:

kaggle_agent/
├── context_agent.py
├── agent.py
├── program.md              ← original competition's prompt
├── experiment.py           ← original competition's pipeline
├── competitions/
│   └── titanic/
│       ├── competition.txt       ← "titanic"
│       ├── program.md            ← LLM-generated for Titanic
│       ├── experiment.py         ← LLM-generated starter pipeline
│       ├── submission.csv        ← latest predictions
│       ├── results.tsv           ← experiment history
│       ├── run.log               ← last run output
│       ├── data/
│       │   ├── train.csv
│       │   ├── test.csv
│       │   └── gender_submission.csv
│       ├── submissions/
│       │   └── submission_20260311_120000.csv
│       └── logs/
└── data/                   ← original competition data
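
The timestamped file under submissions/ suggests each submission.csv is copied into the archive before it is overwritten. A sketch of that step, assuming the submission_YYYYMMDD_HHMMSS.csv naming seen in the tree above; archive_submission is a hypothetical helper name.

```python
import shutil
from datetime import datetime
from pathlib import Path


def archive_submission(competition_dir, now=None):
    """Copy submission.csv into submissions/ under a timestamped name."""
    comp = Path(competition_dir)
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    dest = comp / "submissions" / f"submission_{stamp}.csv"
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(comp / "submission.csv", dest)  # copy2 also preserves timestamps
    return dest
```

Passing now explicitly makes the helper easy to test; in normal use it defaults to the current time.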

LLM Backends

The agent supports two LLM backends, selectable via --llm:

Gemini (default)

| Variable | Required | Description |
| --- | --- | --- |
| GOOGLE_API_KEY | Yes | Get at https://aistudio.google.com |

Models tried in order (falls back on failure):

  1. gemini-2.5-flash: fast, cost-effective
  2. gemini-2.5-pro: more capable fallback
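
The try-in-order behaviour can be sketched independently of any SDK. Here call_model is a hypothetical callback that sends the prompt to a named model and raises on failure; it stands in for whatever client call llm.py actually makes, which this README does not show.

```python
def generate_with_fallback(
    prompt,
    call_model,
    models=("gemini-2.5-flash", "gemini-2.5-pro"),
):
    """Try each model in order and return (model name, reply) from the first success."""
    errors = []
    for name in models:
        try:
            return name, call_model(name, prompt)
        except Exception as exc:  # any failure moves on to the next model
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))
```

Collecting the per-model errors makes the final exception more useful when every backend is unavailable.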

Ollama (local, no API key)

Requires Ollama running locally.

ollama serve                         # start the server (runs on localhost:11434)
ollama pull qwen2.5-coder:14b        # recommended: strong at code generation
ollama pull llama3.1                 # alternative general-purpose model
ollama pull deepseek-r1:14b          # alternative reasoning model

Flags:

| Flag | Default | Description |
| --- | --- | --- |
| --llm ollama | - | Switch to Ollama backend |
| --ollama-model NAME | qwen2.5-coder:14b | Model to use |
| --ollama-url URL | http://localhost:11434 | Ollama server URL |

With either backend, Kaggle credentials are read from ~/.kaggle/kaggle.json (the standard Kaggle CLI location).


Agent CLI Flags

context_agent.py

| Flag | Default | Description |
| --- | --- | --- |
| competition | required | Competition URL or slug |
| --skip-kaggle | off | Skip Kaggle submissions (local CV only) |
| --max-iter N | 0 (∞) | Stop after N agent iterations |
| --submit-best-only | off | Run locally, submit best at end |
| --no-launch | off | Set up only, don't run the agent |
| --force-reinit | off | Regenerate program.md and experiment.py |
| --llm | gemini | LLM backend: gemini or ollama |
| --ollama-model | qwen2.5-coder:14b | Ollama model name |
| --ollama-url | http://localhost:11434 | Ollama server URL |

agent.py

| Flag | Default | Description |
| --- | --- | --- |
| --competition-dir PATH | . (agent's own dir) | Competition working directory |
| --skip-kaggle | off | Skip Kaggle submissions |
| --max-iter N | 0 (∞) | Stop after N iterations |
| --submit-best-only | off | Run locally, submit best at end |
| --llm | gemini | LLM backend: gemini or ollama |
| --ollama-model | qwen2.5-coder:14b | Ollama model name |
| --ollama-url | http://localhost:11434 | Ollama server URL |
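
For reference, the agent.py flags in the table map naturally onto argparse. The sketch below mirrors the documented flags and defaults only; it is not the parser from agent.py, and it simplifies the --competition-dir default to the current directory rather than the agent's own directory.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented agent.py flags."""
    p = argparse.ArgumentParser(description="Iterative ML agent (illustrative parser)")
    p.add_argument("--competition-dir", default=".", metavar="PATH",
                   help="competition working directory")
    p.add_argument("--skip-kaggle", action="store_true",
                   help="skip Kaggle submissions")
    p.add_argument("--max-iter", type=int, default=0, metavar="N",
                   help="stop after N iterations (0 means no limit)")
    p.add_argument("--submit-best-only", action="store_true",
                   help="run locally, submit best at end")
    p.add_argument("--llm", choices=["gemini", "ollama"], default="gemini",
                   help="LLM backend")
    p.add_argument("--ollama-model", default="qwen2.5-coder:14b",
                   help="Ollama model name")
    p.add_argument("--ollama-url", default="http://localhost:11434",
                   help="Ollama server URL")
    return p
```

Calling build_parser().parse_args(["--competition-dir", "competitions/titanic", "--skip-kaggle", "--max-iter", "10"]) yields a namespace with competition_dir, skip_kaggle, and max_iter set accordingly.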
