Production-ready OpenEnv server that wraps the 7vik/AmongUs game for multi-agent deception (Crewmates vs Impostors). Stage 1 delivers: Dockerization, a unified Action/Observation API (Pydantic), and a step() implementation that drives the 7vik backend and persists game state across turns.
Game logic is the 7vik/AmongUs sandbox, vendored in among-agents/ (CC0-1.0). It provides the real Skeld map (rooms, corridors, vents), tasks, task/meeting phases, voting, and kill/vent rules. One OpenEnv-controlled player per session; reset(seed=...) uses 7vik’s config (e.g. FIVE_MEMBER_GAME). The bridge in server/sevenvik_bridge.py maps our GameAction to 7vik Action types (MoveTo, Kill, Vote, Speak, CompleteTask/CompleteFakeTask) and builds observations from the game state.
### 1.1 Dockerization & API

- Container: `server/Dockerfile` — run the environment in isolation (build from the repo root so `among-agents/` is included in the build context).
- Action space (Pydantic): `GameAction` in `models.py`: `type: Literal["move", "task", "kill", "talk", "vote"]`, `target: Optional[str]`. For `move`, `target` is a room name (e.g. `"Medbay"`); for `kill` / `vote` it is a player color or name (e.g. `"red"`); for `talk` it is the message text.
- Observation space (Pydantic): `GameObservation`: `location`, `visible_players`, `internal_monologue`, `chat_history`, `game_status`, plus the OpenEnv `done` / `reward` fields.
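A minimal sketch of these two models (field names from the description above; the defaults and element types are assumptions, not the actual `models.py`):

```python
from typing import Literal, Optional
from pydantic import BaseModel

class GameAction(BaseModel):
    # target's meaning depends on type: room for "move", player for
    # "kill"/"vote", message text for "talk"; it may be omitted for "task".
    type: Literal["move", "task", "kill", "talk", "vote"]
    target: Optional[str] = None

class GameObservation(BaseModel):
    location: str
    visible_players: list[str] = []    # element type assumed
    internal_monologue: str = ""
    chat_history: list[str] = []       # element type assumed
    game_status: str = "in_progress"   # default assumed
    done: bool = False                 # OpenEnv episode-termination flag
    reward: float = 0.0                # OpenEnv scalar reward
```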
### 1.2 OpenEnv `step()` implementation

- Logic bridge: `server/amongus_environment.py` — `step(action)` delegates to `SevenVikBridge`, which turns the LLM’s JSON into 7vik actions and updates the game (map, activity log, phases).
- State persistence: each WebSocket session has its own `AmongUsEnvironment` and thus its own 7vik `AmongUs` instance, so game state is tracked across turns.
- uv — install with `curl -LsSf https://astral.sh/uv/install.sh | sh` (or `pip install uv`)

From the repo root:

```sh
uv sync
export PYTHONPATH="${PWD}"
uv run uvicorn server.app:app --host 0.0.0.0 --port 8000
```

`uv sync` creates a `.venv` (if needed), installs the Python version pinned in `.python-version`, and installs dependencies from `pyproject.toml` (locked with `uv.lock`). `uv run` runs the command inside the project’s virtual environment.
Build from the repo root (so `among-agents/` is in the build context):

```sh
docker build -f server/Dockerfile -t amongus-openenv .
docker run -p 8000:8000 amongus-openenv
```

Example client usage:

```python
from client import AmongUsEnv
from models import GameAction

with AmongUsEnv(base_url="http://localhost:8000") as env:
    result = env.reset(seed=42)
    print(result.observation.location, result.observation.visible_players)
    result = env.step(GameAction(type="move", target="Medbay"))
    print(result.observation.location)
```

`reset()` accepts an optional `seed`, `episode_id`, and extra kwargs (e.g. `current_player_index`, `game_config` for the bridge).
├── models.py # GameAction, GameObservation (Pydantic)
├── client.py # AmongUsEnv (EnvClient)
├── among-agents/ # 7vik/AmongUs backend (vendored)
│ └── amongagents/ # envs (map, game, action, player, task), agent, configs
├── server/
│ ├── app.py # FastAPI create_app(AmongUsEnvironment, ...)
│ ├── amongus_environment.py # reset/step/state, delegates to 7vik bridge
│ ├── sevenvik_bridge.py # GameAction → 7vik Action, observation, get_episode_outcome()
│ ├── game_sandbox.py # Minimal sandbox (unused when 7vik backend is active)
│ ├── Dockerfile
│ └── requirements.txt
├── training/ # Stage 2: GRPO fine-tuning with real RL env
│ ├── README.md # Full pipeline, Docker, Northflank, reward details
│ ├── grpo_train.py # TRL GRPOTrainer entrypoint (env reward by default)
│ ├── env_reward.py # Real env reward: replay game → step action → verifier
│ ├── verifier.py # Rule-based reward from episode outcome
│ ├── reward_proxy.py # Proxy fallback (format + reasoning)
│ ├── collect_episodes.py # Run random games, emit JSONL snapshots
│ ├── data/ # JSONL (generated by collect_episodes.py)
│ ├── Dockerfile # Northflank image (includes full Among Us env)
│ └── requirements.txt
├── openenv.yaml
├── pyproject.toml
├── uv.lock # lockfile (commit this); use `uv sync` to install
├── .python-version # Python version for uv
└── project-plan.md
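As an illustration of the rule-based reward computed in `training/verifier.py`, a hedged sketch (the outcome fields and reward values below are assumptions, not the actual implementation):

```python
def rule_based_reward(outcome: dict) -> float:
    """Hypothetical verifier: +1.0 if the controlled player's team won,
    -1.0 if it lost, 0.0 for an unfinished episode.
    Field names ("finished", "winner", "player_team") are illustrative."""
    if not outcome.get("finished", False):
        return 0.0
    return 1.0 if outcome.get("winner") == outcome.get("player_team") else -1.0
```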