A reinforcement learning environment for training AI agents to navigate hospital corridors, collect patients, and deliver them to appropriate medical departments while managing medication delivery in a telemedicine platform context.
This environment simulates a realistic hospital layout where an AI agent must:
- Navigate through hospital corridors (cannot enter rooms directly)
- Collect patients from corridor locations with different urgency levels
- Obtain medications from pharmacy drug stations when patients need them
- Deliver patients to appropriate medical departments
- Maximize patient care efficiency while minimizing time and resources
- Realistic Hospital Layout: Multiple departments including Emergency, ICU, Surgery, Cardiology, Neurology, Pediatrics, Lab, Radiology, and Pharmacy
- Corridor-Only Navigation: Agent must stick to realistic pathways between rooms
- Multi-Objective Task: Balance between patient collection, drug delivery, and room assignment
- Dynamic Patient Spawning: Patients appear with varying urgency and medication needs
- Visual Feedback: Pygame-based rendering with detailed hospital visualization
This environment supports the development of AI systems for telemedicine platforms and digital AI labs by:
- Route Optimization: Training agents to find efficient paths in complex healthcare facilities
- Resource Management: Learning to prioritize tasks based on patient urgency and medication needs
- Decision Making: Balancing multiple objectives in time-critical healthcare scenarios
- Workflow Automation: Optimizing patient flow and medication delivery processes
hospital-navigation-rl/
├── hospital_env.py # Main environment implementation
├── dqn_agent.py # Deep Q-Network agent
├── ppo_agent.py # Proximal Policy Optimization agent
├── reinforce_agent.py # REINFORCE algorithm agent
├── train.py # Training script for all algorithms
├── play.py # Random agent demonstration
├── requirements.txt # Python dependencies
├── README.md # This file
├── models/ # Saved model directory
├── results/ # Training results and plots
└── demos/ # Generated demonstration GIFs
- Clone the repository:
    git clone <repository-url>
    cd hospital-navigation-rl
- Install dependencies:
    pip install -r requirements.txt
View the environment with a random agent:
    python play.py --mode random --episodes 3
View the strategic random agent:
    python play.py --mode strategic --episodes 3
Show the static hospital layout:
    python play.py --mode layout
Train all three algorithms and compare them:
    python train.py --algorithm all --episodes 1000
Train a specific algorithm:
    python train.py --algorithm dqn --episodes 1000
    python train.py --algorithm ppo --episodes 1000
    python train.py --algorithm reinforce --episodes 1000
Evaluate a trained model:
    python train.py --algorithm dqn --evaluate --render
DQN characteristics:
- Type: Off-policy, value-based
- Action Selection: Epsilon-greedy with experience replay
- Network: Deep neural network approximating Q-values
- Training: Uses target network and experience replay buffer
Advantages:
- Sample efficient through experience replay
- Stable learning with target network
- Works well in discrete action spaces
Implementation Features:
- Experience replay buffer (10,000 transitions)
- Target network updated every 100 steps
- Epsilon decay from 1.0 to 0.01
- Gradient clipping for stability
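The replay buffer and epsilon schedule listed above can be sketched in plain Python. This is a minimal illustration, not the code in dqn_agent.py: the `ReplayBuffer` method names, the multiplicative decay rate of 0.995, and the `select_action` helper are assumptions; only the buffer capacity (10,000) and the epsilon range (1.0 to 0.01) come from the list above.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size FIFO store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity=10_000):
        # A deque with maxlen drops the oldest transition once it is full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement decorrelates consecutive steps.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def decayed_epsilon(step, start=1.0, end=0.01, decay=0.995):
    """Multiplicative decay floored at `end` (the decay rate is an assumption)."""
    return max(end, start * decay ** step)


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, else take argmax Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon=0.0` the helper always returns the greedy action, which is the behavior one would typically want during evaluation.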
PPO characteristics:
- Type: On-policy, actor-critic
- Action Selection: Stochastic policy with probability sampling
- Network: Shared network with actor and critic heads
- Training: Uses clipped surrogate objective
Advantages:
- More stable than vanilla policy gradient
- Good sample efficiency
- Handles continuous and discrete actions well
Implementation Features:
- Generalized Advantage Estimation (GAE)
- Clipped surrogate objective (ε = 0.2)
- Multiple epochs per update (4 epochs)
- Entropy regularization for exploration
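To make the GAE and clipped-objective items above concrete, here is a small sketch in plain Python. It is illustrative only and does not reproduce ppo_agent.py: the function names, `gamma=0.99`, and `lam=0.95` are assumptions; the clip range of 0.2 is the value stated in the list.

```python
import math


def compute_gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout segment.

    values[t] is V(s_t); last_value is V(s_T), used to bootstrap the final step.
    Episode boundaries inside the segment are not handled in this sketch.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # TD residual
        gae = delta + gamma * lam * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages


def clipped_surrogate(new_log_prob, old_log_prob, advantage, clip_eps=0.2):
    """Per-sample PPO objective (to be maximized)."""
    ratio = math.exp(new_log_prob - old_log_prob)
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    # Taking the minimum removes any incentive to push the ratio past the clip range.
    return min(ratio * advantage, clipped_ratio * advantage)
```

A training loop would average the negated objective over a batch and reuse the same batch for the 4 update epochs mentioned above.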
REINFORCE characteristics:
- Type: On-policy, policy gradient
- Action Selection: Direct policy optimization
- Network: Policy network outputting action probabilities
- Training: Monte Carlo policy gradient
Advantages:
- Simple and intuitive
- Direct policy optimization
- Works with stochastic policies
Implementation Features:
- Monte Carlo returns calculation
- Baseline subtraction (return normalization)
- Gradient clipping for stability
- Episode-based updates
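The Monte Carlo return and normalization steps listed above can be sketched as follows. The function names and `gamma=0.99` are assumptions, and the loss is written over plain floats rather than tensors, so this shows the arithmetic only and is not the code in reinforce_agent.py.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def normalize(returns, eps=1e-8):
    """Zero-mean, unit-variance returns; this acts as a simple baseline."""
    mean = sum(returns) / len(returns)
    var = sum((g - mean) ** 2 for g in returns) / len(returns)
    std = var ** 0.5
    return [(g - mean) / (std + eps) for g in returns]


def reinforce_loss(log_probs, rewards, gamma=0.99):
    """Negative of sum_t log pi(a_t | s_t) * normalized G_t for one episode."""
    weights = normalize(discounted_returns(rewards, gamma))
    return -sum(lp * w for lp, w in zip(log_probs, weights))
```

Because the update needs the full return, the loss can only be computed once an episode has finished, which is why updates are episode-based.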
The observation vector contains:
- Agent Position: Normalized x, y coordinates (2 values)
- Agent Status: Carrying patient flag, has drugs flag (2 values)
- Patient Information: Up to 6 patients with position, urgency, drug needs (24 values)
- Drug Station Status: Availability of 2 drug stations (2 values)
Total Observation Size: 30 dimensions
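One way to assemble a 30-dimensional vector with this layout is sketched below. The function name, the patient dictionary keys, the urgency scale in [0, 1], and zero-padding for empty patient slots are all assumptions made for illustration; hospital_env.py may encode these fields differently.

```python
MAX_PATIENTS = 6
NUM_DRUG_STATIONS = 2
OBS_SIZE = 4 + 4 * MAX_PATIENTS + NUM_DRUG_STATIONS  # 2 + 2 + 24 + 2 = 30


def build_observation(agent_xy, carrying, has_drugs, patients, stations, width, height):
    """Flatten the environment state into a fixed-length list of floats."""
    obs = [agent_xy[0] / width, agent_xy[1] / height,  # normalized agent position
           float(carrying), float(has_drugs)]          # agent status flags
    for i in range(MAX_PATIENTS):
        if i < len(patients):
            p = patients[i]  # hypothetical dict layout
            obs += [p["x"] / width, p["y"] / height,
                    p["urgency"], float(p["needs_drugs"])]
        else:
            obs += [0.0] * 4  # assumed padding for empty patient slots
    obs += [float(available) for available in stations[:NUM_DRUG_STATIONS]]
    return obs
```

Keeping the length fixed regardless of how many patients are active lets the same network input layer be used throughout an episode.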
8 discrete actions representing movement directions:
- 0: Up, 1: Down, 2: Left, 3: Right
- 4: Up-Left, 5: Up-Right, 6: Down-Left, 7: Down-Right
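The action indices above map naturally onto grid offsets. The sketch below assumes unit steps and screen coordinates in which y grows downward (the usual Pygame convention), and uses a hypothetical `is_corridor` callback to enforce corridor-only movement; none of these details are confirmed by the environment code.

```python
# (dx, dy) per action index; "Up" is assumed to mean decreasing y.
ACTION_DELTAS = {
    0: (0, -1),   # Up
    1: (0, 1),    # Down
    2: (-1, 0),   # Left
    3: (1, 0),    # Right
    4: (-1, -1),  # Up-Left
    5: (1, -1),   # Up-Right
    6: (-1, 1),   # Down-Left
    7: (1, 1),    # Down-Right
}


def apply_action(position, action, is_corridor):
    """Return the new position, or the old one if the move leaves the corridor."""
    dx, dy = ACTION_DELTAS[action]
    candidate = (position[0] + dx, position[1] + dy)
    return candidate if is_corridor(candidate) else position
```

A blocked move leaves the agent where it was, which is the situation the "staying in same position" penalty would apply to.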
Positive Rewards:
- +10: Collecting drugs from pharmacy
- +40: Delivering drugs to patient needing medication
- +15: Picking up patient
- +30-75: Delivering patient to correct department (based on urgency)
Negative Rewards:
- -0.1: Each step (efficiency incentive)
- -1: Staying in same position
- -30: Delivering patient to wrong department
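The reward values above can be combined into a single per-step function, as in the following sketch. Several details are assumptions rather than documented behavior: urgency is taken to lie in [0, 1], the 30 to 75 delivery reward is interpolated linearly over that range, the stay penalty is added on top of the step penalty, and the event names are hypothetical.

```python
STEP_PENALTY = -0.1
STAY_PENALTY = -1.0
DRUG_PICKUP = 10.0
DRUG_DELIVERY = 40.0
PATIENT_PICKUP = 15.0
WRONG_DEPARTMENT = -30.0
DELIVERY_MIN, DELIVERY_MAX = 30.0, 75.0


def delivery_reward(urgency):
    """Assumed linear scaling from 30 (urgency 0) to 75 (urgency 1)."""
    u = min(1.0, max(0.0, urgency))
    return DELIVERY_MIN + (DELIVERY_MAX - DELIVERY_MIN) * u


def step_reward(moved, event=None, urgency=0.0):
    """Reward for one step; `event` is a hypothetical label for what happened."""
    reward = STEP_PENALTY
    if not moved:
        reward += STAY_PENALTY
    if event == "drug_pickup":
        reward += DRUG_PICKUP
    elif event == "drug_delivery":
        reward += DRUG_DELIVERY
    elif event == "patient_pickup":
        reward += PATIENT_PICKUP
    elif event == "correct_department":
        reward += delivery_reward(urgency)
    elif event == "wrong_department":
        reward += WRONG_DEPARTMENT
    return reward
```

Under these assumptions, delivering a maximally urgent patient to the right department yields 74.9 for that step, while a wrong delivery yields -30.1.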
Episodes end when:
- Maximum steps reached (1000 steps)
- All patients have been saved
- User closes the window (in demo mode)
The training system tracks:
- Episode Rewards: Total reward accumulated per episode
- Patients Saved: Number of patients successfully delivered
- Training Loss: Algorithm-specific loss functions
- Efficiency: Patients saved per 1000 steps
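The efficiency metric is a simple rate, and per-episode rewards are usually smoothed before plotting. The helpers below are a sketch with assumed names; the moving-average window of 100 episodes is likewise an assumption, not a value taken from train.py.

```python
def efficiency(patients_saved, steps_taken):
    """Patients saved per 1000 environment steps."""
    if steps_taken <= 0:
        return 0.0
    return 1000.0 * patients_saved / steps_taken


def moving_average(values, window=100):
    """Trailing mean over up to `window` most recent values, one per input."""
    averaged = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        averaged.append(running / min(i + 1, window))
    return averaged
```

For example, an episode that saves 3 patients in 500 steps has an efficiency of 6.0.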
Modify hospital_env.py to adjust:
- Hospital layout and room positions
- Number and placement of drug stations
- Patient spawn rates and characteristics
- Reward values for different actions
- Maximum episode length
Each agent file contains configurable hyperparameters:
- Learning rates
- Network architectures
- Exploration parameters
- Training frequencies
DQN:
- Expected to show steady improvement through experience replay
- Good final performance but may take longer to converge
- Stable learning curve with occasional plateaus
PPO:
- Generally fastest and most stable convergence
- Best balance of exploration and exploitation
- Highest final performance expected
REINFORCE:
- More variable learning curve
- May require more episodes to converge
- Simpler but potentially less efficient
- Episodes 0-200: Random exploration, low performance
- Episodes 200-500: Learning basic navigation and patient collection
- Episodes 500-800: Optimizing drug delivery and room assignments
- Episodes 800-1000: Fine-tuning efficiency and policy refinement
- Pygame Installation Issues:
    pip install pygame --upgrade
- CUDA/GPU Issues:
    # Force CPU usage
    export CUDA_VISIBLE_DEVICES=""
- Memory Issues with Large Replay Buffers: Reduce buffer size in dqn_agent.py:
    self.memory = ReplayBuffer(5000) # Reduced from 10000
- Slow Training:
  - Reduce episode length: max_steps=500
  - Use fewer training episodes initially
  - Disable rendering during training
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Built using Gymnasium framework
- Pygame for visualization
- PyTorch for deep learning implementations
- Inspired by healthcare workflow optimization research