Build a containerized app that uses machine learning. See instructions for details.
The Receipt Scanner & Expense Categorizer is a containerized application designed to automate expense tracking. It consists of three subsystems:
- Machine Learning Client: Captures images, performs OCR, and categorizes expenses using AI.
- Web App: A dashboard to view receipts, analytics, and spending history.
- Database: A MongoDB instance storing all data.
- Person 1 (DevOps): Majo Salgado
- Person 2 (Data/ML): Apoorv Belgundi
- Person 3 (ML Client): Galal Bichara
- Person 4 (Web Backend): Asim
- Person 5 (Web Frontend): Anshu Aramandla
- Python 3.12 (for local development)
- Docker Desktop installed and running (for Docker deployment)
- OpenAI API Key (for OCR processing - required for receipt processing)
Step 1: Start MongoDB (local development)
Run MongoDB in Docker (easiest for local testing):
PowerShell (Windows):
docker run -d -p 27017:27017 --name mongodb-local -e MONGO_INITDB_ROOT_USERNAME=admin -e MONGO_INITDB_ROOT_PASSWORD=password123 mongo:latest
Bash/Linux/Mac:
docker run -d -p 27017:27017 --name mongodb-local \
-e MONGO_INITDB_ROOT_USERNAME=admin \
-e MONGO_INITDB_ROOT_PASSWORD=password123 \
mongo:latest
Step 2: Create Environment File
Create a .env file in the project root (same directory as docker-compose.yml):
PowerShell (Windows):
@"
MONGO_USER=admin
MONGO_PASS=password123
MONGO_DB_NAME=receipts_db
MONGO_HOST=localhost
SECRET_KEY=dev-secret-key-change-in-production
OPENAI_API_KEY=your-openai-api-key-here
"@ | Out-File -FilePath .env -Encoding utf8
Bash/Linux/Mac:
cat > .env << EOF
MONGO_USER=admin
MONGO_PASS=password123
MONGO_DB_NAME=receipts_db
MONGO_HOST=localhost
SECRET_KEY=dev-secret-key-change-in-production
OPENAI_API_KEY=your-openai-api-key-here
EOF
Step 3: Install Dependencies and Run Web App
# Install web app dependencies
cd web-app
pip install -r requirements.txt
# Run the Flask application
python app.py
The web app will start at: http://localhost:5000
Step 4: Run Worker Process (Required for Processing Receipts)
Open a new terminal and run:
# Install ML client dependencies
cd machine-learning-client
pip install -r requirements.txt
# Run the worker (keeps running, processes receipts)
python worker.py
Important: Keep both terminals running:
- Terminal 1: Web app (Flask server)
- Terminal 2: Worker (processes receipts)
Step 1: Create Environment File (Docker)
Create a .env file in the project root:
PowerShell (Windows):
@"
MONGO_USER=admin
MONGO_PASS=password123
MONGO_DB_NAME=receipts_db
MONGO_HOST=mongodb
SECRET_KEY=dev-secret-key-change-in-production
OPENAI_API_KEY=your-openai-api-key-here
"@ | Out-File -FilePath .env -Encoding utf8
Bash/Linux/Mac:
cat > .env << EOF
MONGO_USER=admin
MONGO_PASS=password123
MONGO_DB_NAME=receipts_db
MONGO_HOST=mongodb
SECRET_KEY=dev-secret-key-change-in-production
OPENAI_API_KEY=your-openai-api-key-here
EOF
Note: For Docker, MONGO_HOST must be mongodb (the Docker service name), not localhost.
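To make the MONGO_HOST difference concrete, here is a minimal sketch of how a connection string could be assembled from the .env values. The helper name `build_mongo_uri` is hypothetical, and the project's actual connection code may differ. The `authSource=admin` parameter is an assumption based on the root user that the `MONGO_INITDB_ROOT_*` variables create.

```python
from urllib.parse import quote_plus


def build_mongo_uri(env):
    """Assemble a MongoDB URI from .env-style values (hypothetical helper)."""
    # Escape credentials so characters such as '@' or ':' do not break the URI
    user = quote_plus(env["MONGO_USER"])
    password = quote_plus(env["MONGO_PASS"])
    # "localhost" for local development, "mongodb" (the service name) in Docker
    host = env.get("MONGO_HOST", "localhost")
    # Port 27017 matches the port published by the docker run command above;
    # authSource=admin is an assumption, not taken from the project code
    return f"mongodb://{user}:{password}@{host}:27017/?authSource=admin"


local_env = {"MONGO_USER": "admin", "MONGO_PASS": "password123", "MONGO_HOST": "localhost"}
docker_env = dict(local_env, MONGO_HOST="mongodb")

print(build_mongo_uri(local_env))
print(build_mongo_uri(docker_env))
```

Only the host part changes between the two setups. Inside the Compose network the name mongodb resolves to the database container, while localhost would point back at the container making the request.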
Step 2: Run with Docker Compose
# From project root
docker-compose up --build
This starts all services:
- MongoDB container
- Web app container (http://localhost:5000)
- ML client worker container (processes receipts automatically)
Step 3: Access the Application
- Web Dashboard: http://localhost:5000
- MongoDB: localhost:27017
Important Notes:
- First receipt processing: The first receipt you upload will take 5-15 minutes to process. This is because the ML models (EasyOCR ~500MB, BART ~1.6GB) need to be downloaded on first use. Subsequent receipts process much faster (10-30 seconds).
- Authentication: You must sign up and log in before uploading receipts.
- Worker process: The ML client worker runs automatically in Docker and processes receipts in the background.
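The background processing described in these notes can be pictured with a small in-memory sketch. Everything here is an assumption for illustration: the status names other than "processing", the `process_next` function, and the record fields are hypothetical, and the real worker reads from MongoDB rather than a Python list.

```python
def process_next(receipts, ocr_and_classify):
    """Claim the first pending receipt, process it, and record the outcome."""
    receipt = next((r for r in receipts if r["status"] == "pending"), None)
    if receipt is None:
        return None  # nothing to do; a real worker would sleep and poll again

    # Mark the receipt as claimed before the slow OCR/classification step
    receipt["status"] = "processing"
    try:
        receipt["result"] = ocr_and_classify(receipt["path"])
        receipt["status"] = "done"
    except Exception as exc:
        # Record the failure so the receipt does not stay in "processing"
        receipt["status"] = "failed"
        receipt["error"] = str(exc)
    return receipt


def fake_pipeline(path):
    """Stand-in for OCR plus categorization (hypothetical)."""
    if path.endswith(".txt"):
        raise ValueError("not an image")
    return {"category": "food"}


queue = [
    {"path": "lunch.jpg", "status": "pending"},
    {"path": "notes.txt", "status": "pending"},
]
while process_next(queue, fake_pipeline) is not None:
    pass
print([r["status"] for r in queue])
```

A worker built this way leaves a receipt in "processing" only if it is stopped mid-job, which is one reason to check the worker logs when a receipt appears stuck.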
| Aspect | Local Development | Docker |
|---|---|---|
| MONGO_HOST | localhost | mongodb |
| Processes | Run manually in 2 terminals | All run automatically |
| File Storage | {project_root}/uploads/ | Shared Docker volume /app/uploads (mounted in both containers) |
| Worker | Must run python worker.py manually | Runs automatically in container |
| Code Changes | Restart Python process | Rebuild container: docker-compose build <service> |
- MongoDB is running (Docker container or local install)
- .env file created in project root with correct values
- MONGO_HOST=localhost for local, MONGO_HOST=mongodb for Docker
- OPENAI_API_KEY set in .env (required for OCR)
- Web app dependencies installed (pip install -r requirements.txt in web-app/)
- ML client dependencies installed (pip install -r requirements.txt in machine-learning-client/)
- Web app running (local) or Docker containers started
- Worker process running (local) or Docker containers started
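The .env items in this checklist can be verified with a short script. The sketch below is not part of the project: `parse_env` and `check_env` are hypothetical helpers, and the required keys are simply the ones listed in the .env examples above.

```python
REQUIRED_KEYS = (
    "MONGO_USER", "MONGO_PASS", "MONGO_DB_NAME",
    "MONGO_HOST", "SECRET_KEY", "OPENAI_API_KEY",
)


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks and comments."""
    values = {}
    for raw in text.splitlines():
        # Some PowerShell versions prepend a byte-order mark when writing utf8
        line = raw.strip().lstrip("\ufeff")
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def check_env(text, docker=False):
    """Return a list of problems found in the .env contents (empty if none)."""
    values = parse_env(text)
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not values.get(key)]
    expected_host = "mongodb" if docker else "localhost"
    host = values.get("MONGO_HOST")
    if host and host != expected_host:
        problems.append(f"MONGO_HOST is {host!r}, expected {expected_host!r}")
    if values.get("OPENAI_API_KEY") == "your-openai-api-key-here":
        problems.append("OPENAI_API_KEY is still the placeholder value")
    return problems


sample = """MONGO_USER=admin
MONGO_PASS=password123
MONGO_DB_NAME=receipts_db
MONGO_HOST=localhost
SECRET_KEY=dev-secret-key-change-in-production
OPENAI_API_KEY=your-openai-api-key-here
"""
print(check_env(sample))
print(check_env(sample, docker=True))
```

To check a real file, pass its contents instead of the sample, for example `check_env(open(".env", encoding="utf-8").read(), docker=True)`.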
Local (Fastest for Development):
# Terminal 1: Web app
cd web-app && python app.py
# Terminal 2: Worker
cd machine-learning-client && python worker.py
Docker (Production-like):
docker-compose up --build
After making code changes:
If you modify Python files (e.g., ocr_processor.py, expense_classifier.py, worker.py), rebuild the container:
# Rebuild the ml-client image, then recreate the container from it
# (docker-compose restart alone would keep running the old image)
docker-compose build ml-client
docker-compose up -d ml-client
# Or rebuild and restart in one command
docker-compose up -d --build ml-client
Viewing logs:
# Watch worker logs in real-time
docker logs ml-client -f
# Watch web app logs
docker logs web-app -f
# View all logs
docker-compose logs -f
Managing containers:
# Stop all containers
docker-compose down
# Restart all containers (without rebuilding)
docker-compose restart
# Start in background
docker-compose up -d
# Check container status
docker ps
Port 27017 already in use:
- Stop any existing MongoDB containers: docker ps -a | grep mongo, then docker stop <container-name>
- Or remove old containers: docker rm <container-name>
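Before starting a new MongoDB container, it can help to confirm whether something is already listening on the port. This is a minimal sketch using only the standard library. The function name `port_in_use` is hypothetical, and it only detects listeners reachable on the given host.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    if port_in_use(27017):
        print("Port 27017 is busy: stop the existing MongoDB container first")
    else:
        print("Port 27017 is free")
```

The same check works for port 5000 if the web app fails to start.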
First receipt taking a long time:
- This is normal! Models are downloading (EasyOCR ~500MB, BART ~1.6GB)
- Check logs: docker logs ml-client -f to see progress
- Subsequent receipts will be much faster
Receipt stuck in "processing" status:
- Check worker logs: docker logs ml-client --tail 50
- Verify OPENAI_API_KEY is set in .env
- Ensure ml-client container is running: docker ps
File not found errors:
- Verify shared volume exists: docker volume ls | grep receipt-uploads
- Check files in volume: docker exec web-app ls -la /app/uploads
For detailed instructions and troubleshooting, see TESTING_GUIDE.md.
- Containerization: Docker, Docker Compose
- Database: MongoDB
- Backend: Python 3.12, Flask, Flask-Login
- Machine Learning: PyTorch, Transformers (Hugging Face), EasyOCR, OpenAI API
- Computer Vision: OpenCV
- CI/CD: GitHub Actions
- Linting & Testing: Pylint, Black, Pytest