Context
The paired benchmark (agent vs baseline) requires reproducible initial conditions via named save files. Each scenario needs a RimWorld save at a specific colony state. rle_crashlanded_v1 exists — 5 more needed.
Save Files to Create
All saves start from a Crashlanded scenario, Cassandra Classic, Adventure Story difficulty. Use dev mode to advance time and trigger events.
1. rle_crashlanded_v1 — DONE
- Day 1, 3 colonists, default Crashlanded start
- Already created and tested (paired benchmark delta: -0.029)
2. rle_first_winter_v1
- Advance to day 30 (approaching fall/winter)
- Colony should have basic shelter, some food stored, a few research projects done
- Dev mode: Development > Date > Set day, or just fast-forward
- Save when the season is about to change
3. rle_toxic_fallout_v1
- Advance to day 10, stable colony
- Trigger toxic fallout: Dev mode > Debug actions > Incidents > Execute incident > ToxicFallout
- Save immediately after the fallout starts (green overlay visible)
- Tests: can agents keep colonists indoors, manage food, survive the event?
4. rle_raid_defense_v1
- Advance to day 15, build some walls/sandbags
- Trigger a raid: Dev mode > Debug actions > Incidents > Execute incident > RaidEnemy
- Save right before or as the raid spawns
- Tests: can DefenseCommander draft colonists and position them?
5. rle_plague_response_v1
- Advance to day 10, have some medicine stockpiled
- Trigger plague: Dev mode > Debug actions > Incidents > Execute incident > Plague
- Save immediately after plague hits (colonists should show "plague" hediff)
- Tests: can MedicalOfficer triage, assign bed rest, administer medicine?
6. rle_ship_launch_v1
- Advance to day 60 with significant research progress
- Complete several research projects via dev mode: Debug actions > Research > Finish project
- Have 5+ colonists (use Debug actions > Spawn pawn > Colonist)
- Save with a mid-game colony that has resources + tech to attempt ship building
- Tests: long-horizon planning, research prioritization, resource management at scale
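The six scenarios above could also be kept as a small machine-readable manifest, so a script can compare a loaded save against its target. The following is only a sketch: the `Scenario` class, the `problems` helper, and the two-day tolerance are hypothetical names and values chosen for illustration, and the incident strings are copied from the list above rather than checked against the game.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scenario:
    save_name: str
    day: int                      # target in-game day from the list above
    min_colonists: Optional[int]  # None where the list gives no count
    incident: Optional[str]       # incident to trigger via dev mode, if any


SCENARIOS = [
    Scenario("rle_crashlanded_v1", 1, 3, None),
    Scenario("rle_first_winter_v1", 30, None, None),
    Scenario("rle_toxic_fallout_v1", 10, None, "ToxicFallout"),
    Scenario("rle_raid_defense_v1", 15, None, "RaidEnemy"),
    Scenario("rle_plague_response_v1", 10, None, "Plague"),
    Scenario("rle_ship_launch_v1", 60, 5, None),
]


def problems(scenario, day, population, day_tolerance=2):
    """Return human-readable mismatches between a save and its target.

    day_tolerance is an assumed slack, since saves are made by hand.
    """
    found = []
    if abs(day - scenario.day) > day_tolerance:
        found.append(f"{scenario.save_name}: day {day}, expected about {scenario.day}")
    if scenario.min_colonists is not None and population < scenario.min_colonists:
        found.append(
            f"{scenario.save_name}: {population} colonists, "
            f"expected at least {scenario.min_colonists}"
        )
    return found
```

An empty list from `problems` means the save matches its target within the assumed tolerance.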
How to Create Each Save
- Load rle_crashlanded_v1 (base save)
- Enable dev mode: Options > check "Development mode"
- Use dev tools to advance time / trigger events per scenario above
- Save as the exact name listed (e.g. rle_first_winter_v1)
- Verify: curl http://localhost:8765/api/v1/game/state should show the expected tick and colonist count
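Since the benchmark looks saves up by exact name, it may help to check a saves folder for missing or misnamed files before running anything. This is a rough sketch that assumes saves are stored as `<name>.rws` files. The folder location differs per platform, so it is left as a parameter, and both helper names are hypothetical.

```python
from pathlib import Path

EXPECTED_SAVES = [
    "rle_crashlanded_v1",
    "rle_first_winter_v1",
    "rle_toxic_fallout_v1",
    "rle_raid_defense_v1",
    "rle_plague_response_v1",
    "rle_ship_launch_v1",
]


def missing_saves(file_names):
    """Return expected save names with no exactly matching .rws file."""
    present = {
        Path(name).stem
        for name in file_names
        if Path(name).suffix == ".rws"  # assumed save-file extension
    }
    return [name for name in EXPECTED_SAVES if name not in present]


def list_save_files(save_dir):
    """List file names in save_dir; the caller supplies the folder path."""
    return [p.name for p in Path(save_dir).iterdir() if p.is_file()]
```

The comparison is case-sensitive on purpose, so a file such as `RLE_Raid_Defense_v1.rws` is reported as missing rather than silently accepted.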
How to Verify Saves Work with Benchmark
# Test save/load roundtrip
python -c "
import asyncio
from rle.rimapi.client import RimAPIClient

async def test():
    async with RimAPIClient('http://localhost:8765') as c:
        await c.load_game('rle_first_winter_v1')
        # Give the game a few seconds to finish loading before querying state
        await asyncio.sleep(3)
        state = await c.get_game_state()
        print(f'Day: {state.colony.day}, Pop: {state.colony.population}')

asyncio.run(test())
"
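The same roundtrip can be repeated for several saves in one run. The sketch below assumes only the client methods already used above (`load_game`, `get_game_state`) and the `state.colony.day` / `state.colony.population` fields. The `roundtrip` function and its `settle_seconds` parameter are hypothetical, and the 3-second default simply mirrors the sleep in the snippet above.

```python
import asyncio


async def roundtrip(client, save_names, settle_seconds=3.0):
    """Load each save in turn and record (day, population) per save."""
    results = {}
    for name in save_names:
        await client.load_game(name)
        # Wait for the game to finish loading before querying state
        await asyncio.sleep(settle_seconds)
        state = await client.get_game_state()
        results[name] = (state.colony.day, state.colony.population)
    return results


# Example use with the real client (requires the rle package and a running game):
#
#   from rle.rimapi.client import RimAPIClient
#
#   async def main():
#       async with RimAPIClient('http://localhost:8765') as c:
#           print(await roundtrip(c, ['rle_first_winter_v1', 'rle_raid_defense_v1']))
#
#   asyncio.run(main())
```

Passing the client in as an argument keeps the loop independent of how the connection is opened, which also makes it easy to exercise with a stub.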
Priority
Do rle_first_winter_v1 and rle_raid_defense_v1 first — these are the scenarios most likely to show agent value (agents managing food/shelter before winter, agents drafting defenders during raids). The others can wait.
@CalebisGross — if you have RimWorld installed you can create some of these too. Just name the saves exactly as listed.