GridWorld is a small grid-environment project for experimenting with pathfinding and partially observable policy learning. The repository is now intentionally focused on only two paths:
- deterministic A* search
- the file-backed PyTorch PPO example
The core gridworld package still provides the environment, rendering, transition model construction, and the gym-like reset / step interface used by those two paths.
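As a rough sketch of how the reset / step interface can drive a full episode, the loop below runs a policy until the environment reports `done`. It assumes only the 4-tuple return shape `(next_state, reward, done, info)` and the `testing=True` keyword. `TinyCorridor` and `rollout` are hypothetical names introduced for illustration; `TinyCorridor` is a stand-in so the snippet runs without the package installed and is not the real `GridWorld` class.

```python
class TinyCorridor:
    """Hypothetical stand-in with the same reset/step shape as GridWorld.

    It is NOT the real environment: a 1-D corridor where action 1 moves
    right and any other action moves left. The goal is the last cell.
    """

    def __init__(self, length=4):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action, testing=False):
        # `testing` is accepted only to mirror the call style; it is unused here.
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        done = self.pos == self.length - 1
        reward = 10.0 if done else -1.0
        return self.pos, reward, done, {}


def rollout(env, policy, max_steps=100):
    """Run one episode and return (total_reward, steps_taken, finished)."""
    state = env.reset()
    total = 0.0
    for t in range(1, max_steps + 1):
        state, reward, done, _info = env.step(policy(state), testing=True)
        total += reward
        if done:
            return total, t, True
    return total, max_steps, False


total, steps, finished = rollout(TinyCorridor(), policy=lambda state: 1)
print(total, steps, finished)
```

Passing a real `GridWorld` instance in place of `TinyCorridor()` should work the same way, provided the policy returns actions the environment accepts.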
This repo targets Python 3.13+ on CPython and uses uv.
git clone https://github.com/StengerJ/GridFinder
cd GridFinder
uv python install 3.13
uv venv .venv --python 3.13
Activate the virtual environment:
# Windows PowerShell
.venv\Scripts\Activate.ps1
# macOS / Linux
source .venv/bin/activate
Install dependencies:
uv pip install -r Requirements.txt
uv pip install -e .
from gridworld import GridWorld
world = """
wwwww
wa gw
w o w
wwwww
"""
env = GridWorld(world, slip=0.0, random_state=7)
state = env.reset()
next_state, reward, done, info = env.step(0, testing=True)
print(state, next_state, reward, done, info)
print(env.P_sas.shape, env.R_sa.shape)
The deterministic A* example lives in examples/search/Astar/ and reads committed maps from examples/maps/search/.
- 25 small maps are stored in examples/maps/search/small/
- 25 large maps are stored in examples/maps/search/large/
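For readers who want to see the search technique in isolation, here is a minimal A* sketch over a text map. It is not the code in examples/search/Astar/. It assumes a legend inferred from the sample world in this README (`w` is a wall, `a` is the agent start, `g` is the goal) and treats every other character, including `o`, as walkable, which may not match the real environment. It uses 4-connected moves with unit cost and a Manhattan-distance heuristic.

```python
import heapq

# Assumed legend: 'w' = wall, 'a' = start, 'g' = goal; anything else is walkable.
WORLD = """
wwwww
wa gw
w o w
wwwww
"""


def astar(world):
    """Return the shortest list of (row, col) cells from 'a' to 'g', or None."""
    rows = world.strip("\n").splitlines()
    cells = {(r, c): ch for r, row in enumerate(rows) for c, ch in enumerate(row)}
    start = next(p for p, ch in cells.items() if ch == "a")
    goal = next(p for p, ch in cells.items() if ch == "g")

    def h(p):
        # Manhattan distance: admissible for 4-connected unit-cost moves.
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    frontier = [(h(start), 0, start)]  # entries are (f = g + h, g, position)
    came_from = {start: None}
    best_g = {start: 0}
    while frontier:
        _, g, pos = heapq.heappop(frontier)
        if pos == goal:
            path = []
            while pos is not None:
                path.append(pos)
                pos = came_from[pos]
            return path[::-1]
        if g > best_g[pos]:
            continue  # stale heap entry; a cheaper route was already found
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nxt = (pos[0] + dr, pos[1] + dc)
            if cells.get(nxt, "w") == "w":
                continue  # walls and out-of-bounds cells are impassable
            ng = g + 1
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                came_from[nxt] = pos
                heapq.heappush(frontier, (ng + h(nxt), ng, nxt))
    return None


print(astar(WORLD))
```

Because the heuristic never overestimates the remaining distance, the first time the goal is popped from the heap its path is a shortest one.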
Example commands:
python examples/search/Astar/main.py --world small
python examples/search/Astar/main.py --world big
python examples/search/Astar/main.py --world small --variant 7
python examples/search/Astar/main.py --world big --variant 12 --no-render
The PPO example lives in examples/Policy-Optimization/ and trains on committed map corpora stored under examples/maps/policy_optimization/.
- train/stage1, train/stage2, train/stage3 each contain 64 maps
- eval/stage1, eval/stage2, eval/stage3 each contain 16 held-out maps
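For context on what PPO optimizes, the sketch below computes the standard clipped surrogate loss in plain Python. It is not taken from train_ppo.py, which may use different hyperparameters and extra terms such as value or entropy losses. The function name `ppo_clip_loss` and the default `clip_eps=0.2` are assumptions made for illustration.

```python
import math


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Negative clipped surrogate objective, averaged over samples.

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy. advantages: advantage estimates.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


# The ratio is 2.0 here, so the positive-advantage term is capped at 1.2.
print(ppo_clip_loss([math.log(2.0)], [0.0], [1.0]))
```

The clipping removes the incentive to move the policy far from the one that collected the data in a single update.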
Useful commands:
python examples/Policy-Optimization/generate_map_corpus.py --force
python examples/Policy-Optimization/train_ppo.py
python examples/Policy-Optimization/evaluate_policy.py --checkpoint logs/policy_optimization/checkpoints/best.pt --stage 3
python examples/Policy-Optimization/render_episode.py --checkpoint logs/policy_optimization/checkpoints/best.pt --stage 3
More detail is in examples/Policy-Optimization/README.md.
Run the current test suite with:
python -m unittest discover -s tests