
---
title: CleanRL OpenEnv
emoji: 🧹
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
license: mit
short_description: Real-world data cleaning RL environment with 5 tasks
---

# CleanRL: Real-World Data Cleaning RL Environment

## Overview

CleanRL is a production-ready OpenEnv-compliant reinforcement learning environment that simulates real-world data cleaning workflows. An AI agent must identify and fix problems in messy datasets spanning multiple data types: tabular records, text document metadata, and mixed media metadata (images, audio, video). The agent interacts through structured JSON actions, receives scalar reward signals after each action, and is scored against a ground-truth clean dataset at episode end.

A live interactive dashboard is served at http://localhost:7860. Open it in your browser to step through episodes manually, trigger heuristic auto-steps, and view the dataset, issues, and reward history in real time.

## Motivation

Data cleaning is one of the most time-consuming steps in any ML pipeline; estimates suggest data scientists spend 60–80% of their time on it. Yet it requires nuanced judgment: when to impute vs. drop, how to normalize inconsistent formats, what counts as an invalid value. CleanRL frames this as a sequential decision-making problem, enabling researchers to train and evaluate agents that learn generalizable data-cleaning policies across varied dataset types.

## Tasks

| Task ID | Difficulty | Data Type | Description |
|---------|------------|-----------|-------------|
| `basic_tabular_cleaning` | Easy | Tabular | Customer records with missing names/emails, invalid ages, and string-typed numeric fields. Max 20 steps. |
| `structured_text_cleaning` | Medium | Text | Document metadata with duplicates, inconsistent category/language casing, invalid word counts, and null fields. Max 25 steps. |
| `realworld_multimodal_cleaning` | Hard | Mixed (image/audio/video) | Mixed media metadata with invalid resolutions, broken URLs, out-of-range sample rates, negative durations, and corrupt values. Max 30 steps. |

## Action Space

All actions are submitted as structured JSON objects to `POST /step`.

| Action Type | Required Fields | Description |
|-------------|-----------------|-------------|
| `fill_missing` | `target_field`, `value` | Replace `None`/empty values with a default. Optionally scoped to `target_record_id`. |
| `drop_record` | `target_record_id` | Remove a single record by ID. Refused if it would delete >40% of the dataset. |
| `convert_type` | `target_field`, `value` (`"int"`, `"float"`, `"str"`, `"bool"`) | Cast all values in a field to the specified type. Failures default to `0` / `0.0` / `False`. |
| `remove_duplicates` | (none) | Remove duplicate records. Strategy depends on data type (exact match / case-insensitive title / filename+media_type). |
| `normalize_text` | `target_field` | Apply `.strip().lower()` to all string values in the field. |
| `fix_invalid` | `target_field`, `condition`, `value` | Set `target_field` to `value` for all records matching `condition` (e.g. `"age < 0"`). |
| `standardize_format` | `target_field`, `value` (`"lowercase"`, `"uppercase"`, `"url_fix"`, `"unknown_default"`) | Bulk-transform a field's format. `url_fix` prepends `https://` to non-URL strings; `unknown_default` replaces nulls with `"unknown"`. |
| `no_op` | (none) | Take no action. Returns 0 reward. |
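For concreteness, here are a few action payloads written as Python dicts, built from the field names in the table above. The target fields and values are illustrative placeholders, not fields guaranteed to exist in any task's dataset:

```python
import json

# Illustrative payloads only; field names follow the action-space table above.
fill_missing = {
    "action_type": "fill_missing",
    "target_field": "email",          # field whose None/empty values to fill
    "value": "unknown@example.com",   # default value to insert
}

fix_invalid = {
    "action_type": "fix_invalid",
    "target_field": "age",
    "condition": "age < 0",           # records matching this condition get `value`
    "value": 0,
}

standardize = {
    "action_type": "standardize_format",
    "target_field": "video_url",
    "value": "url_fix",               # prepends https:// to non-URL strings
}

# Each payload is JSON-encoded before being POSTed to /step.
payload = json.dumps(fix_invalid)
```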

## Observation Space

Each step returns a structured dict with the following fields:

| Field | Type | Description |
|-------|------|-------------|
| `task_id` | string | Active task identifier |
| `data_type` | string | `"tabular"`, `"text"`, or `"mixed"` |
| `dataset_preview` | list[dict] | First 5 records of the current dataset |
| `schema_info` | dict[str, str] | Field → Python type name mapping (from first record) |
| `issues` | list[str] | All currently detected issues in structured format |
| `step_count` | int | Current step number |
| `max_steps` | int | Maximum steps allowed for this task |
| `score_so_far` | float | Current grader score (0.0–1.0) |
| `total_records` | int | Total number of records in the dataset |
| `task_description` | string | Human-readable task goal |

## Issue String Format

```
missing_value:record={id}:field={field}
empty_string:record={id}:field={field}
invalid_type:record={id}:field={field}:found={type}:expected={type}
invalid_value:record={id}:field={field}:value={value}
inconsistent_format:record={id}:field={field}:value={value}
duplicate_record:record={id}:matches={other_id}
broken_url:record={id}:field={field}
```
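Since these strings are colon-delimited `key=value` segments, an agent can turn them into structured data with a small helper. A minimal sketch, assuming the embedded values themselves contain no colons (URLs inside `invalid_value` strings would need smarter splitting):

```python
def parse_issue(issue: str) -> dict:
    """Parse one issue string (format above) into a flat dict.

    The first segment is the issue kind; every later segment is key=value.
    Naive split on ':', so values containing colons are not handled.
    """
    kind, *parts = issue.split(":")
    fields = {"kind": kind}
    for part in parts:
        key, _, value = part.partition("=")
        fields[key] = value
    return fields

parsed = parse_issue("invalid_type:record=7:field=age:found=str:expected=int")
```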

## Reward Function

| Event | Reward |
|-------|--------|
| Fixed 3+ issues in one action | +0.20 |
| Fixed 1–2 issues in one action | +0.10 |
| Dataset changed but no issues reduced | +0.05 |
| No change (action had no effect) | 0.00 |
| `no_op` action | 0.00 |
| Action introduced new issues | -0.20 |
| Same action repeated 3+ times in a row | -0.10 |
| Excessive record deletion (>40%) | -0.30 |

### Efficiency Bonus (added at episode end)

| Condition | Bonus |
|-----------|-------|
| Score ≥ 0.95 AND steps ≤ 50% of max | +0.30 |
| Score ≥ 0.95 AND steps ≤ 75% of max | +0.15 |
| Score ≥ 0.80 | +0.05 |
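The bonus table translates directly into a small lookup function. This is a hypothetical restatement of the logic for clarity, not the environment's own code:

```python
def efficiency_bonus(score: float, steps: int, max_steps: int) -> float:
    """End-of-episode bonus per the table above, checked strictest first."""
    if score >= 0.95 and steps <= 0.50 * max_steps:
        return 0.30
    if score >= 0.95 and steps <= 0.75 * max_steps:
        return 0.15
    if score >= 0.80:
        return 0.05
    return 0.0
```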

## Grading Formula

```
score = cell_accuracy × 0.7 + schema_score × 0.3
```

- **cell_accuracy**: For each field in each record present in both the current and clean datasets, compare values after `str().strip().lower()` normalization. If dataset lengths differ, multiply by `min_len / max_len`.
- **schema_score**: For each field that should be a number (int/float) in the clean dataset, check whether the current dataset has it as a number. Score = matching numeric fields / total numeric fields expected.

The score is capped to [0.0, 1.0] and rounded to 4 decimal places.
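A sketch of how such a grader could be implemented, assuming both datasets are lists of dicts in matching record order. This is a hypothetical reimplementation for illustration; the actual grader lives inside the environment:

```python
def grade(current: list[dict], clean: list[dict]) -> float:
    """score = cell_accuracy * 0.7 + schema_score * 0.3 (see formula above)."""
    norm = lambda v: str(v).strip().lower()

    # cell_accuracy: compare normalized values field-by-field over shared records.
    total = correct = 0
    for cur, ref in zip(current, clean):
        for field, ref_val in ref.items():
            if field in cur:
                total += 1
                correct += norm(cur[field]) == norm(ref_val)
    cell_accuracy = (correct / total) if total else 0.0
    if len(current) != len(clean):
        lo, hi = sorted([len(current), len(clean)])
        cell_accuracy *= lo / hi  # penalize length mismatch

    # schema_score: fraction of expected-numeric fields that are numeric now
    # (bool is excluded since it subclasses int in Python).
    is_num = lambda v: isinstance(v, (int, float)) and not isinstance(v, bool)
    numeric_fields = [f for f, v in clean[0].items() if is_num(v)]
    if numeric_fields and current:
        ok = sum(is_num(current[0].get(f)) for f in numeric_fields)
        schema_score = ok / len(numeric_fields)
    else:
        schema_score = 1.0

    return round(min(max(cell_accuracy * 0.7 + schema_score * 0.3, 0.0), 1.0), 4)
```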

## Setup

### Environment Variables

Create a `.env` file in the `cleanrl-env/` directory (one is already provided):

```
API_BASE_URL=https://api.openai.com/v1
MODEL_NAME=gpt-4o-mini
HF_TOKEN=
API_KEY=sk-proj-your_openai_api_key_here
```

Note: `API_KEY` is your OpenAI API key, used to authenticate `inference.py`. `HF_TOKEN` is reserved for Hugging Face-compatible endpoints and can be left empty when using OpenAI directly.

### Docker (recommended)

```bash
# Build
docker build -t cleanrl-env .

# Run
docker run -p 7860:7860 cleanrl-env
```

### Python (local)

```bash
pip install -r requirements.txt
python server.py
```

The server starts on http://localhost:7860. Open that URL in your browser to see the interactive dashboard.

## API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| GET | `/` | Interactive dashboard UI |
| GET | `/health` | Health check |
| GET | `/tasks` | List all available tasks |
| GET | `/state` | Full current environment state |
| POST | `/reset?task_id=...` | Start a new episode |
| POST | `/step` | Submit a manual action, receive obs/reward/done |
| POST | `/auto` | Heuristic auto-step (no LLM required) |
| GET | `/docs` | Auto-generated Swagger UI (FastAPI) |

### Example: Reset and Step

```bash
# Start a new episode
curl -X POST "http://localhost:7860/reset?task_id=basic_tabular_cleaning"

# Submit an action
curl -X POST "http://localhost:7860/step" \
  -H "Content-Type: application/json" \
  -d '{"action_type": "fill_missing", "target_field": "name", "value": "Unknown"}'

# Heuristic auto-step (picks best action without LLM)
curl -X POST "http://localhost:7860/auto"
```

## Running inference.py (AI Agent)

`inference.py` uses GPT-4o mini (via the OpenAI API) to interact with CleanRL and emits structured logs to stdout.

```bash
# 1. Make sure the server is running
python server.py

# 2. In a separate terminal, run the AI agent
python inference.py
```

The agent reads credentials from the `.env` file automatically; make sure `API_KEY` is set.

### Stdout Log Format

```
[START] task=basic_tabular_cleaning env=cleanrl model=gpt-4o-mini
[STEP] step=1 action={"action_type": "fill_missing", ...} reward=0.10 done=false error=null
[STEP] step=2 action={"action_type": "convert_type", ...} reward=0.20 done=false error=null
...
[END] success=true steps=8 rewards=0.10,0.20,0.10,...
```

## Baseline Scores

| Task | Baseline Score | Steps Used | Model |
|------|----------------|------------|-------|
| `basic_tabular_cleaning` | TBD | TBD | GPT-4o mini |
| `structured_text_cleaning` | TBD | TBD | GPT-4o mini |
| `realworld_multimodal_cleaning` | TBD | TBD | GPT-4o mini |

## License

MIT
