---
title: CleanRL OpenEnv
emoji: 🧹
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
license: mit
short_description: Real-world data cleaning RL environment with 5 tasks
---
CleanRL is a production-ready OpenEnv-compliant reinforcement learning environment that simulates real-world data cleaning workflows. An AI agent must identify and fix problems in messy datasets spanning multiple data types: tabular records, text document metadata, and mixed media metadata (images, audio, video). The agent interacts through structured JSON actions, receives scalar reward signals after each action, and is scored against a ground-truth clean dataset at episode end.
A live interactive dashboard is served at http://localhost:7860; open it in your browser to step through episodes manually, trigger heuristic auto-steps, and view the dataset, issues, and reward history in real time.
Data cleaning is one of the most time-consuming steps in any ML pipeline; estimates suggest data scientists spend 60-80% of their time on it. Yet it requires nuanced judgment: when to impute vs. drop, how to normalize inconsistent formats, what counts as an invalid value. CleanRL frames this as a sequential decision-making problem, enabling researchers to train and evaluate agents that learn generalizable data-cleaning policies across varied dataset types.
| Task ID | Difficulty | Data Type | Description |
|---|---|---|---|
| `basic_tabular_cleaning` | Easy | Tabular | Customer records with missing names/emails, invalid ages, and string-typed numeric fields. Max 20 steps. |
| `structured_text_cleaning` | Medium | Text | Document metadata with duplicates, inconsistent category/language casing, invalid word counts, and null fields. Max 25 steps. |
| `realworld_multimodal_cleaning` | Hard | Mixed (image/audio/video) | Mixed media metadata with invalid resolutions, broken URLs, out-of-range sample rates, negative durations, and corrupt values. Max 30 steps. |
All actions are submitted as structured JSON objects to `POST /step`.
| Action Type | Required Fields | Description |
|---|---|---|
| `fill_missing` | `target_field`, `value` | Replace None/empty values with a default. Optionally scoped to `target_record_id`. |
| `drop_record` | `target_record_id` | Remove a single record by ID. Refused if it would delete >40% of the dataset. |
| `convert_type` | `target_field`, `value` (`"int"`, `"float"`, `"str"`, `"bool"`) | Cast all values in a field to the specified type. Failures default to 0 / 0.0 / False. |
| `remove_duplicates` | (none) | Remove duplicate records. Strategy depends on data type (exact match / case-insensitive title / filename+media_type). |
| `normalize_text` | `target_field` | Apply `.strip().lower()` to all string values in the field. |
| `fix_invalid` | `target_field`, `condition`, `value` | Set `target_field` to `value` for all records matching `condition` (e.g. `"age < 0"`). |
| `standardize_format` | `target_field`, `value` (`"lowercase"`, `"uppercase"`, `"url_fix"`, `"unknown_default"`) | Bulk-transform a field's format. `url_fix` prepends `https://` to non-URL strings; `unknown_default` replaces nulls with `"unknown"`. |
| `no_op` | (none) | Take no action. Returns 0 reward. |
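Before POSTing an action, it can be useful to check the payload against the required fields in the table above. The sketch below is a purely client-side, illustrative helper (the `REQUIRED_FIELDS` mapping and `validate_action` name are not part of the CleanRL server; the field names come from the table):

```python
# Client-side sanity check for action payloads before POSTing to /step.
# Illustrative helper only; the server performs its own validation.
REQUIRED_FIELDS = {
    "fill_missing": ["target_field", "value"],
    "drop_record": ["target_record_id"],
    "convert_type": ["target_field", "value"],
    "remove_duplicates": [],
    "normalize_text": ["target_field"],
    "fix_invalid": ["target_field", "condition", "value"],
    "standardize_format": ["target_field", "value"],
    "no_op": [],
}

def validate_action(action: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    atype = action.get("action_type")
    if atype not in REQUIRED_FIELDS:
        return [f"unknown action_type: {atype!r}"]
    return [f"missing field: {f}" for f in REQUIRED_FIELDS[atype] if f not in action]
```

A payload that passes this check can then be sent as the JSON body of `POST /step`.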
Each step returns a structured dict with the following fields:
| Field | Type | Description |
|---|---|---|
| `task_id` | string | Active task identifier |
| `data_type` | string | `"tabular"`, `"text"`, or `"mixed"` |
| `dataset_preview` | list[dict] | First 5 records of the current dataset |
| `schema_info` | dict[str, str] | Field → Python type name mapping (from first record) |
| `issues` | list[str] | All currently detected issues in structured format |
| `step_count` | int | Current step number |
| `max_steps` | int | Maximum steps allowed for this task |
| `score_so_far` | float | Current grader score (0.0-1.0) |
| `total_records` | int | Total number of records in the dataset |
| `task_description` | string | Human-readable task goal |
Issue strings follow these formats:

- `missing_value:record={id}:field={field}`
- `empty_string:record={id}:field={field}`
- `invalid_type:record={id}:field={field}:found={type}:expected={type}`
- `invalid_value:record={id}:field={field}:value={value}`
- `inconsistent_format:record={id}:field={field}:value={value}`
- `duplicate_record:record={id}:matches={other_id}`
- `broken_url:record={id}:field={field}`
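An agent can split these colon-delimited strings into structured form. A minimal sketch (the `parse_issue` helper is illustrative, not part of the server API, and assumes values themselves contain no colons):

```python
def parse_issue(issue: str) -> dict:
    """Parse an issue string such as
    'invalid_type:record=3:field=age:found=str:expected=int'
    into a dict like {'type': 'invalid_type', 'record': '3', ...}.
    Illustrative helper; assumes values contain no ':' characters.
    """
    kind, *parts = issue.split(":")
    out = {"type": kind}
    for part in parts:
        key, _, value = part.partition("=")
        out[key] = value
    return out
```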
| Event | Reward |
|---|---|
| Fixed 3+ issues in one action | +0.20 |
| Fixed 1-2 issues in one action | +0.10 |
| Dataset changed but no issues reduced | +0.05 |
| No change (action had no effect) | 0.00 |
| `no_op` action | 0.00 |
| Action introduced new issues | -0.20 |
| Same action repeated 3+ times in a row | -0.10 |
| Excessive record deletion (>40%) | -0.30 |
| Condition | Bonus |
|---|---|
| Score ≥ 0.95 AND steps ≤ 50% of max | +0.30 |
| Score ≥ 0.95 AND steps ≤ 75% of max | +0.15 |
| Score ≥ 0.80 | +0.05 |
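The bonus tiers above can be sketched as a small function. This is an illustrative re-implementation of the table, not the server's own code:

```python
def completion_bonus(score: float, steps: int, max_steps: int) -> float:
    """End-of-episode efficiency bonus, mirroring the bonus table.

    Tiers are checked from best to worst; only the first match applies.
    """
    if score >= 0.95 and steps <= 0.5 * max_steps:
        return 0.30
    if score >= 0.95 and steps <= 0.75 * max_steps:
        return 0.15
    if score >= 0.80:
        return 0.05
    return 0.0
```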
score = cell_accuracy × 0.7 + schema_score × 0.3

- `cell_accuracy`: For each field in each record present in both current and clean datasets, compare values after `str().strip().lower()` normalization. If dataset lengths differ, multiply by `min_len / max_len`.
- `schema_score`: For each field that should be a number (`int`/`float`) in the clean dataset, check whether the current dataset has it as a number. Score = matching numeric fields / total numeric fields expected.
Score is capped to [0.0, 1.0] and rounded to 4 decimal places.
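A minimal sketch of this grading formula, under the assumption that records align by index (the real grader may match records differently; the `grade` function and its helpers are illustrative):

```python
def grade(current: list[dict], clean: list[dict]) -> float:
    """Sketch of: score = cell_accuracy * 0.7 + schema_score * 0.3.

    Assumes records in `current` and `clean` align by index.
    """
    def norm(v):
        return str(v).strip().lower()

    def is_num(v):
        return isinstance(v, (int, float)) and not isinstance(v, bool)

    # Cell accuracy over record pairs present in both datasets
    total = correct = 0
    for cur, ref in zip(current, clean):
        for field, ref_val in ref.items():
            if field in cur:
                total += 1
                correct += norm(cur[field]) == norm(ref_val)
    cell_accuracy = correct / total if total else 0.0
    if len(current) != len(clean):
        cell_accuracy *= min(len(current), len(clean)) / max(len(current), len(clean))

    # Schema score: expected-numeric fields that are numeric in the current data
    numeric = [f for f, v in clean[0].items() if is_num(v)]
    ok = sum(is_num(current[0].get(f)) for f in numeric) if current else 0
    schema_score = ok / len(numeric) if numeric else 1.0

    return round(min(max(cell_accuracy * 0.7 + schema_score * 0.3, 0.0), 1.0), 4)
```

For example, a dataset that matches the clean values cell-for-cell but stores a numeric field as a string scores 0.7 (full cell accuracy, zero schema score).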
Create a `.env` file in the `cleanrl-env/` directory (one is already provided):

```
API_BASE_URL=https://api.openai.com/v1
MODEL_NAME=gpt-4o-mini
HF_TOKEN=
API_KEY=sk-proj-your_openai_api_key_here
```

Note: `API_KEY` is your OpenAI API key used to authenticate `inference.py`. `HF_TOKEN` is reserved for Hugging Face-compatible endpoints and can be left empty when using OpenAI directly.
```bash
# Build
docker build -t cleanrl-env .

# Run
docker run -p 7860:7860 cleanrl-env
```

Or run locally without Docker:

```bash
pip install -r requirements.txt
python server.py
```

The server starts on http://localhost:7860. Open that URL in your browser to see the interactive dashboard.
| Method | Path | Description |
|---|---|---|
| GET | `/` | Interactive dashboard UI |
| GET | `/health` | Health check |
| GET | `/tasks` | List all available tasks |
| GET | `/state` | Full current environment state |
| POST | `/reset?task_id=...` | Start a new episode |
| POST | `/step` | Submit a manual action, receive obs/reward/done |
| POST | `/auto` | Heuristic auto-step (no LLM required) |
| GET | `/docs` | Auto-generated Swagger UI (FastAPI) |
```bash
# Start a new episode
curl -X POST "http://localhost:7860/reset?task_id=basic_tabular_cleaning"

# Submit an action
curl -X POST "http://localhost:7860/step" \
  -H "Content-Type: application/json" \
  -d '{"action_type": "fill_missing", "target_field": "name", "value": "Unknown"}'

# Heuristic auto-step (picks best action without LLM)
curl -X POST "http://localhost:7860/auto"
```

`inference.py` uses GPT-4o mini (via the OpenAI API) to interact with CleanRL and emits structured logs to stdout.
```bash
# 1. Make sure the server is running
python server.py

# 2. In a separate terminal, run the AI agent
python inference.py
```

The agent reads credentials from the `.env` file automatically; make sure `API_KEY` is set.
```
[START] task=basic_tabular_cleaning env=cleanrl model=gpt-4o-mini
[STEP] step=1 action={"action_type": "fill_missing", ...} reward=0.10 done=false error=null
[STEP] step=2 action={"action_type": "convert_type", ...} reward=0.20 done=false error=null
...
[END] success=true steps=8 rewards=0.10,0.20,0.10,...
```
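These logs are line-oriented and easy to post-process. A small sketch of a `[STEP]` line parser (the `parse_step_line` helper and its regex are illustrative, based only on the sample log format above):

```python
import re

# Matches the sample [STEP] lines; the action payload is kept as a raw string
# because the log may elide it with "...". Illustrative, not part of inference.py.
STEP_RE = re.compile(r"\[STEP\] step=(\d+) action=(\{.*\}) reward=([-\d.]+) done=(true|false)")

def parse_step_line(line: str):
    """Return a dict for a [STEP] log line, or None for other lines."""
    m = STEP_RE.search(line)
    if m is None:
        return None
    step, action, reward, done = m.groups()
    return {"step": int(step), "action": action,
            "reward": float(reward), "done": done == "true"}
```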
| Task | Baseline Score | Steps Used | Model |
|---|---|---|---|
| `basic_tabular_cleaning` | TBD | TBD | GPT-4o mini |
| `structured_text_cleaning` | TBD | TBD | GPT-4o mini |
| `realworld_multimodal_cleaning` | TBD | TBD | GPT-4o mini |
MIT