
---
title: CleanRL OpenEnv
emoji: 🧹
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
license: mit
short_description: Real-world data cleaning RL environment with 5 tasks
---

# CleanRL: Real-World Data Cleaning RL Environment

## Overview

CleanRL is a production-ready OpenEnv-compliant reinforcement learning environment that simulates real-world data cleaning workflows. An AI agent must identify and fix problems in messy datasets spanning multiple data types: tabular records, text document metadata, and mixed media metadata (images, audio, video). The agent interacts through structured JSON actions, receives scalar reward signals after each action, and is scored against a ground-truth clean dataset at episode end.

A live interactive dashboard is served at http://localhost:7860. Open it in your browser to step through episodes manually, trigger heuristic auto-steps, and view the dataset, issues, and reward history in real time.

## Motivation

Data cleaning is one of the most time-consuming steps in any ML pipeline; estimates suggest data scientists spend 60–80% of their time on it. Yet it requires nuanced judgment: when to impute vs. drop, how to normalize inconsistent formats, what counts as an invalid value. CleanRL frames this as a sequential decision-making problem, enabling researchers to train and evaluate agents that learn generalizable data-cleaning policies across varied dataset types.

## Tasks

| Task ID | Difficulty | Data Type | Description |
|---------|------------|-----------|-------------|
| `basic_tabular_cleaning` | Easy | Tabular | Customer records with missing names/emails, invalid ages, and string-typed numeric fields. Max 20 steps. |
| `structured_text_cleaning` | Medium | Text | Document metadata with duplicates, inconsistent category/language casing, invalid word counts, and null fields. Max 25 steps. |
| `realworld_multimodal_cleaning` | Hard | Mixed (image/audio/video) | Mixed media metadata with invalid resolutions, broken URLs, out-of-range sample rates, negative durations, and corrupt values. Max 30 steps. |

## Action Space

All actions are submitted as structured JSON objects to `POST /step`.

| Action Type | Required Fields | Description |
|-------------|-----------------|-------------|
| `fill_missing` | `target_field`, `value` | Replace `None`/empty values with a default. Optionally scoped to `target_record_id`. |
| `drop_record` | `target_record_id` | Remove a single record by ID. Refused if it would delete >40% of the dataset. |
| `convert_type` | `target_field`, `value` (`"int"`, `"float"`, `"str"`, `"bool"`) | Cast all values in a field to the specified type. Failures default to `0` / `0.0` / `False`. |
| `remove_duplicates` | (none) | Remove duplicate records. Strategy depends on data type (exact match / case-insensitive title / filename+media_type). |
| `normalize_text` | `target_field` | Apply `.strip().lower()` to all string values in the field. |
| `fix_invalid` | `target_field`, `condition`, `value` | Set `target_field` to `value` for all records matching `condition` (e.g. `"age < 0"`). |
| `standardize_format` | `target_field`, `value` (`"lowercase"`, `"uppercase"`, `"url_fix"`, `"unknown_default"`) | Bulk-transform a field's format. `url_fix` prepends `https://` to non-URL strings; `unknown_default` replaces nulls with `"unknown"`. |
| `no_op` | (none) | Take no action. Returns 0 reward. |
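For concreteness, here are a few action payloads written as Python dicts, built from the field names in the table above. The target fields and values are illustrative placeholders, not fields guaranteed to exist in any task's dataset:

```python
import json

# Illustrative payloads only; field names follow the action-space table above.
fill_missing = {
    "action_type": "fill_missing",
    "target_field": "email",          # field whose None/empty values to fill
    "value": "unknown@example.com",   # default value to insert
}

fix_invalid = {
    "action_type": "fix_invalid",
    "target_field": "age",
    "condition": "age < 0",           # records matching this condition get `value`
    "value": 0,
}

standardize = {
    "action_type": "standardize_format",
    "target_field": "video_url",
    "value": "url_fix",               # prepends https:// to non-URL strings
}

# Each payload is JSON-encoded before being POSTed to /step.
payload = json.dumps(fix_invalid)
```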

## Observation Space

Each step returns a structured dict with the following fields:

| Field | Type | Description |
|-------|------|-------------|
| `task_id` | string | Active task identifier |
| `data_type` | string | `"tabular"`, `"text"`, or `"mixed"` |
| `dataset_preview` | list[dict] | First 5 records of the current dataset |
| `schema_info` | dict[str, str] | Field → Python type name mapping (from first record) |
| `issues` | list[str] | All currently detected issues in structured format |
| `step_count` | int | Current step number |
| `max_steps` | int | Maximum steps allowed for this task |
| `score_so_far` | float | Current grader score (0.0–1.0) |
| `total_records` | int | Total number of records in the dataset |
| `task_description` | string | Human-readable task goal |

## Issue String Format

```
missing_value:record={id}:field={field}
empty_string:record={id}:field={field}
invalid_type:record={id}:field={field}:found={type}:expected={type}
invalid_value:record={id}:field={field}:value={value}
inconsistent_format:record={id}:field={field}:value={value}
duplicate_record:record={id}:matches={other_id}
broken_url:record={id}:field={field}
```
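Since these strings are colon-delimited `key=value` segments, an agent can turn them into structured data with a small helper. A minimal sketch, assuming the embedded values themselves contain no colons (URLs inside `invalid_value` strings would need smarter splitting):

```python
def parse_issue(issue: str) -> dict:
    """Parse one issue string (format above) into a flat dict.

    The first segment is the issue kind; every later segment is key=value.
    Naive split on ':', so values containing colons are not handled.
    """
    kind, *parts = issue.split(":")
    fields = {"kind": kind}
    for part in parts:
        key, _, value = part.partition("=")
        fields[key] = value
    return fields

parsed = parse_issue("invalid_type:record=7:field=age:found=str:expected=int")
```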

## Reward Function

| Event | Reward |
|-------|--------|
| Fixed 3+ issues in one action | +0.20 |
| Fixed 1–2 issues in one action | +0.10 |
| Dataset changed but no issues reduced | +0.05 |
| No change (action had no effect) | 0.00 |
| `no_op` action | 0.00 |
| Action introduced new issues | -0.20 |
| Same action repeated 3+ times in a row | -0.10 |
| Excessive record deletion (>40%) | -0.30 |

### Efficiency Bonus (added at episode end)

| Condition | Bonus |
|-----------|-------|
| Score ≥ 0.95 AND steps ≤ 50% of max | +0.30 |
| Score ≥ 0.95 AND steps ≤ 75% of max | +0.15 |
| Score ≥ 0.80 | +0.05 |
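The bonus table translates directly into a small lookup function. This is a hypothetical restatement of the logic for clarity, not the environment's own code:

```python
def efficiency_bonus(score: float, steps: int, max_steps: int) -> float:
    """End-of-episode bonus per the table above, checked strictest first."""
    if score >= 0.95 and steps <= 0.50 * max_steps:
        return 0.30
    if score >= 0.95 and steps <= 0.75 * max_steps:
        return 0.15
    if score >= 0.80:
        return 0.05
    return 0.0
```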

## Grading Formula

```
score = cell_accuracy × 0.7 + schema_score × 0.3
```

- **cell_accuracy**: For each field in each record present in both the current and clean datasets, compare values after `str().strip().lower()` normalization. If dataset lengths differ, multiply by `min_len / max_len`.
- **schema_score**: For each field that should be a number (int/float) in the clean dataset, check whether the current dataset has it as a number. Score = matching numeric fields / total numeric fields expected.

The score is capped to [0.0, 1.0] and rounded to 4 decimal places.
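A sketch of how such a grader could be implemented, assuming both datasets are lists of dicts in matching record order. This is a hypothetical reimplementation for illustration; the actual grader lives inside the environment:

```python
def grade(current: list[dict], clean: list[dict]) -> float:
    """score = cell_accuracy * 0.7 + schema_score * 0.3 (see formula above)."""
    norm = lambda v: str(v).strip().lower()

    # cell_accuracy: compare normalized values field-by-field over shared records.
    total = correct = 0
    for cur, ref in zip(current, clean):
        for field, ref_val in ref.items():
            if field in cur:
                total += 1
                correct += norm(cur[field]) == norm(ref_val)
    cell_accuracy = (correct / total) if total else 0.0
    if len(current) != len(clean):
        lo, hi = sorted([len(current), len(clean)])
        cell_accuracy *= lo / hi  # penalize length mismatch

    # schema_score: fraction of expected-numeric fields that are numeric now
    # (bool is excluded since it subclasses int in Python).
    is_num = lambda v: isinstance(v, (int, float)) and not isinstance(v, bool)
    numeric_fields = [f for f, v in clean[0].items() if is_num(v)]
    if numeric_fields and current:
        ok = sum(is_num(current[0].get(f)) for f in numeric_fields)
        schema_score = ok / len(numeric_fields)
    else:
        schema_score = 1.0

    return round(min(max(cell_accuracy * 0.7 + schema_score * 0.3, 0.0), 1.0), 4)
```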

## Setup

### Environment Variables

Create a `.env` file in the `cleanrl-env/` directory (one is already provided):

```
API_BASE_URL=https://api.openai.com/v1
MODEL_NAME=gpt-4o-mini
HF_TOKEN=
API_KEY=sk-proj-your_openai_api_key_here
```

Note: `API_KEY` is your OpenAI API key, used to authenticate `inference.py`. `HF_TOKEN` is reserved for Hugging Face-compatible endpoints and can be left empty when using OpenAI directly.

### Docker (recommended)

```bash
# Build
docker build -t cleanrl-env .

# Run
docker run -p 7860:7860 cleanrl-env
```

### Python (local)

```bash
pip install -r requirements.txt
python server.py
```

The server starts on http://localhost:7860. Open that URL in your browser to see the interactive dashboard.

## API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| GET | `/` | Interactive dashboard UI |
| GET | `/health` | Health check |
| GET | `/tasks` | List all available tasks |
| GET | `/state` | Full current environment state |
| POST | `/reset?task_id=...` | Start a new episode |
| POST | `/step` | Submit a manual action, receive obs/reward/done |
| POST | `/auto` | Heuristic auto-step (no LLM required) |
| GET | `/docs` | Auto-generated Swagger UI (FastAPI) |

### Example: Reset and Step

```bash
# Start a new episode
curl -X POST "http://localhost:7860/reset?task_id=basic_tabular_cleaning"

# Submit an action
curl -X POST "http://localhost:7860/step" \
  -H "Content-Type: application/json" \
  -d '{"action_type": "fill_missing", "target_field": "name", "value": "Unknown"}'

# Heuristic auto-step (picks best action without LLM)
curl -X POST "http://localhost:7860/auto"
```

## Running inference.py (AI Agent)

`inference.py` uses GPT-4o mini (via the OpenAI API) to interact with CleanRL and emits structured logs to stdout.

```bash
# 1. Make sure the server is running
python server.py

# 2. In a separate terminal, run the AI agent
python inference.py
```

The agent reads credentials from the `.env` file automatically; make sure `API_KEY` is set.

### Stdout Log Format

```
[START] task=basic_tabular_cleaning env=cleanrl model=gpt-4o-mini
[STEP] step=1 action={"action_type": "fill_missing", ...} reward=0.10 done=false error=null
[STEP] step=2 action={"action_type": "convert_type", ...} reward=0.20 done=false error=null
...
[END] success=true steps=8 rewards=0.10,0.20,0.10,...
```

## Baseline Scores

| Task | Baseline Score | Steps Used | Model |
|------|----------------|------------|-------|
| `basic_tabular_cleaning` | TBD | TBD | GPT-4o mini |
| `structured_text_cleaning` | TBD | TBD | GPT-4o mini |
| `realworld_multimodal_cleaning` | TBD | TBD | GPT-4o mini |

## License

MIT
