🏔️ Summit-Sim

An AI-powered wilderness rescue simulator with human-in-the-loop validation.

Summit-Sim generates curriculum-informed, interactive backcountry emergencies for Wilderness First Responder (WFR) training. Instructors review and approve AI-generated scenarios before students engage with them, ensuring medical accuracy and pedagogical value.

Built for the Weber State AI Hackathon 🐾

Acknowledgments: Thanks to Keenan Grady at the Ogden Avalanche Center for domain expertise in wilderness safety and WFR curriculum.

💡 The Problem

Wilderness First Responder (WFR) training relies heavily on static paper scenarios or expensive live-action roleplay. Students rarely get enough dynamic, unpredictable repetitions to truly test their decision-making under pressure. When emergencies happen in the backcountry, textbook memorization isn't enough—responders need dynamic critical thinking developed through varied practice.

🚀 The Solution

Summit-Sim provides infinite, medically accurate WFR scenarios through a dynamic AI game loop. Instead of multiple-choice questions, students use natural language to evaluate scenes, check vitals, and apply treatments. The system evaluates every action against a hidden medical "truth," evolving the scenario exactly as a real patient's condition would change.

Key Features

Infinite Scenario Generation: Create highly specific emergencies based on environment, group size, difficulty, and complexity. Each scenario includes a unique AI-generated atmospheric image.
Human-In-The-Loop (HITL) Validation: Instructors review scenarios before publication, with all feedback logged to MLflow for continuous improvement.
Natural Language Interaction: Students type free-text actions (no multiple choice). The AI evaluates against WFR protocols and reveals information progressively.
Cumulative PAS Scoring: Tracks student progress (0-100%) across 5 Patient Assessment System milestones: Scene Size-up, Primary Assessment, Secondary Assessment, Treatment, and Evacuation Plan.
Intelligent Debriefing: Post-simulation analysis with clinical reasoning assessment, key mistakes identification, and personalized recommendations.
Shareable Scenarios: Students join via unique URLs—multiple students can work on the same scenario in different sessions.
Solo Practice Mode: Students can also generate their own scenarios for independent practice. The scenario auto-generates and immediately starts the simulation without instructor review. Students never see hidden information (learning objectives, hidden truth, or hidden state)—they must discover medical details through proper assessment, just like in the instructor-led flow.

🏗️ Technical Architecture

Summit-Sim is built with a sophisticated AI stack focused on medical safety, strict output schemas, and comprehensive observability.

Core Tech Stack

Component	Technology	Purpose
Orchestration	LangGraph	Complex state flows, HITL interrupts via interrupt(), checkpointing for persistence
Agent Framework	PydanticAI	Strict medical safety via enforced Pydantic schemas for all LLM I/O
UI Framework	Chainlit	Async, reactive Python UI for conversational flows
Observability	MLflow	LLM span tracing, human feedback tracking, prompt versioning, GEPA optimization
State Storage	Redis (local) / DragonflyDB (prod)	LangGraph checkpoint persistence and scenario storage
Image Generation	OpenRouter (Gemini Flash Image)	Unique atmospheric scenario images (16:9, non-blocking)

System Flow

The application runs two interconnected LangGraph workflows:

The Authoring Graph: Generates scenarios from environmental parameters, creates atmospheric images, and presents them for instructor review via HITL interrupt before publication. When students generate their own scenarios, this graph auto-approves and bypasses the review interrupt, jumping directly to simulation.
The Simulation Graph: A continuous game loop where students type actions, the AI evaluates against hidden medical truth, and the patient state evolves dynamically. Completes at 80% PAS milestone completion or max turns. All students (whether joining via instructor link or generating solo) see only observable information—hidden medical data is never revealed without proper assessment.

Agent Architecture

Four specialized PydanticAI agents power the system:

Agent	Purpose	Output Schema
Generator	Creates wilderness rescue scenarios from configuration	ScenarioDraft with visible/hidden separation
Image Generator	Produces unique scenario images	Base64-encoded 16:9 image
Action Responder	Evaluates student actions, reveals information, updates PAS score	ActionResponse with narrative and feedback
Debrief	Generates post-simulation performance analysis	DebriefReport with teaching points

Validation and Optimization

MLflow Judges (Implemented): Four specialized judges evaluate simulation quality:

Structure Judge: Score range validation, narrative length, feedback tone
Scoring Judge: Milestone justification, monotonic progress, acknowledgment
Medical Judge: Treatment gate validation (was_correct accuracy)
Continuity Judge: Progressive revelation, score monotonicity

Note: Judges are implemented but currently disabled in production due to MLflow bug #20782. They are used for offline GEPA (Genetic Pareto) optimization in Jupyter notebooks.

GEPA Optimization: Uses the judge framework with reflection models to iteratively improve agent prompts based on expert feedback. See notebooks/action-responder-prompt.ipynb.

📚 Resources

WFR Curriculum references used to ensure medical accuracy:

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
.chainlit		.chainlit
.vscode		.vscode
kubernetes		kubernetes
notebooks		notebooks
plans		plans
public		public
src/summit_sim		src/summit_sim
tests		tests
.cruft.json		.cruft.json
.dockerignore		.dockerignore
.env.example		.env.example
.envrc		.envrc
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
AGENTS.md		AGENTS.md
Dockerfile		Dockerfile
LICENSE.md		LICENSE.md
README.md		README.md
SCRIPT.md		SCRIPT.md
chainlit.md		chainlit.md
chainlit_en-US.md		chainlit_en-US.md
docker-compose.yml		docker-compose.yml
flake.lock		flake.lock
flake.nix		flake.nix
flyer.jpg		flyer.jpg
flyer.kra		flyer.kra
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🏔️ Summit-Sim

💡 The Problem

🚀 The Solution

Key Features

🏗️ Technical Architecture

Core Tech Stack

System Flow

Agent Architecture

Validation and Optimization

📚 Resources

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

🏔️ Summit-Sim

💡 The Problem

🚀 The Solution

Key Features

🏗️ Technical Architecture

Core Tech Stack

System Flow

Agent Architecture

Validation and Optimization

📚 Resources

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages