Automatically classify invoice descriptions into business expense categories using Machine Learning and NLP.
Invoice Expense Classifier AI is a production-ready Machine Learning API that automatically predicts expense categories from invoice text descriptions.
The system uses:
- Natural Language Processing (NLP)
- TF-IDF Vectorization
- Logistic Regression
- FastAPI REST API
This project helps businesses automate financial workflows and reduce manual expense categorization.

Key features:
- AI-powered invoice classification
- FastAPI REST API
- Confidence score prediction
- TF-IDF text vectorization
- Logistic Regression classifier
- Real-time predictions
- JSON API responses
- Docker support
- Clean architecture
- Modular codebase
- Swagger API documentation
- Training pipeline included
- Unit testing support
- Lightweight deployment
- Scalable backend design
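The confidence score listed among the features is not defined in this README. For a multi-class logistic regression classifier it is typically the highest class probability, obtained by applying a softmax to the per-class scores. The sketch below illustrates that idea in plain Python. The function names and the example scores are hypothetical and invented for illustration; they are not taken from the project's code.

```python
import math


def softmax(scores):
    """Convert raw per-class scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(value - peak) for label, value in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


def top_prediction(scores):
    """Return the most probable label and use its probability as the confidence."""
    probs = softmax(scores)
    label = max(probs, key=probs.get)
    return {"category": label, "confidence": round(probs[label], 2)}


# Hypothetical scores, invented for illustration only.
example_scores = {"Travel": 2.0, "Logistics": 0.5, "Utilities": -1.0}
result = top_prediction(example_scores)
print(result)
```

Under this reading, a confidence close to 1.0 means one category dominates, while a value near 1 divided by the number of categories means the model is close to guessing.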
The model predicts the following expense categories:
- Logistics
- Office Supplies
- Cloud/Software
- Utilities
- Travel
- Inventory
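Because the label set is fixed, it can be useful to check training data against it before fitting a model. The sketch below is one possible way to do that with the standard library, assuming a CSV file with `text` and `category` columns. `CATEGORIES` and `load_rows` are hypothetical names, not part of the project.

```python
import csv
import io

# The six categories listed above.
CATEGORIES = {
    "Logistics",
    "Office Supplies",
    "Cloud/Software",
    "Utilities",
    "Travel",
    "Inventory",
}


def load_rows(handle):
    """Read (text, category) rows and reject labels outside CATEGORIES."""
    rows = []
    # Line 1 is the header, so data rows start at line 2
    # (assuming no field spans multiple lines).
    for line_no, row in enumerate(csv.DictReader(handle), start=2):
        text = (row.get("text") or "").strip()
        category = (row.get("category") or "").strip()
        if not text:
            raise ValueError(f"line {line_no}: empty text")
        if category not in CATEGORIES:
            raise ValueError(f"line {line_no}: unknown category {category!r}")
        rows.append((text, category))
    return rows


# An in-memory file stands in for a real CSV so the sketch runs anywhere.
sample = io.StringIO(
    "text,category\n"
    "AWS monthly hosting charges,Cloud/Software\n"
    "Blue Dart courier delivery,Logistics\n"
)
rows = load_rows(sample)
print(rows)
```

To validate a real file, pass an open file object instead of the `io.StringIO` buffer.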
Live Demo link: https://invoice-expense-classifier.onrender.com/docs
flowchart TD
    A[Invoice Text] --> B[Text Preprocessing]
    B --> C[TF-IDF Vectorization]
    C --> D[Logistic Regression Model]
    D --> E[Expense Category Prediction]
    E --> F[Confidence Score]
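The flow above can be sketched end to end in a few lines. This is a minimal illustration, not the project's actual `train.py` or `app/model.py`: it assumes scikit-learn (which this README does not name explicitly), uses six sample rows in place of a real training set, and relies on `TfidfVectorizer`'s built-in lowercasing as a stand-in for the preprocessing step. The `predict` helper and the `ngram_range` setting are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Six sample rows, one per category; a real training set would be much larger.
texts = [
    "AWS monthly hosting charges",
    "Blue Dart courier delivery",
    "Printer paper purchase",
    "Electricity bill payment",
    "Flight booking for meeting",
    "Warehouse inventory materials",
]
labels = [
    "Cloud/Software",
    "Logistics",
    "Office Supplies",
    "Utilities",
    "Travel",
    "Inventory",
]

# Text preprocessing + TF-IDF: lowercase, tokenize, weight unigrams and bigrams.
vectorizer = TfidfVectorizer(lowercase=True, ngram_range=(1, 2))
features = vectorizer.fit_transform(texts)

# Logistic regression over the TF-IDF features.
model = LogisticRegression(max_iter=1000)
model.fit(features, labels)


def predict(text):
    """Return the most probable category and its probability as the confidence."""
    probabilities = model.predict_proba(vectorizer.transform([text]))[0]
    best = probabilities.argmax()
    return {
        "category": str(model.classes_[best]),
        "confidence": round(float(probabilities[best]), 2),
    }


print(predict("Blue Dart courier charges"))
```

With only six training rows the confidence values will be low; the point is the shape of the pipeline, not the scores.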
Invoice-Expense-Classifier/
│
├── app/
│   ├── main.py
│   ├── model.py
│   ├── preprocessing.py
│   ├── schemas.py
│   └── utils.py
│
├── data/
│   └── train.csv
│
├── models/
│   ├── model.pkl
│   └── vectorizer.pkl
│
├── tests/
│   └── test_api.py
│
├── train.py
├── requirements.txt
├── Dockerfile
├── README.md
└── .gitignore

Clone the repository:
git clone https://github.com/Bhavana998/Invoice-Expense-Classifier.git
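cd Invoice-Expense-Classifier

Create and activate a virtual environment (Windows):
python -m venv venv
venv\Scripts\activate

Create and activate a virtual environment (Linux/macOS):
python3 -m venv venv
source venv/bin/activate

Install the dependencies:
pip install -r requirements.txt

Run the training pipeline:
python train.py

Generated model files:
models/model.pkl
models/vectorizer.pkl

Start the API server:
uvicorn app.main:app --reload

Server URL:
http://127.0.0.1:8000

Swagger API Documentation:
http://127.0.0.1:8000/docs

Training data format (CSV with text and category columns):
text,category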
AWS monthly hosting charges,Cloud/Software
Blue Dart courier delivery,Logistics
Printer paper purchase,Office Supplies
Electricity bill payment,Utilities
Flight booking for meeting,Travel
Warehouse inventory materials,Inventory

POST /predict: predict the expense category from invoice text.

Request:
{
  "text": "Delta flight ticket JFK to LAX for business conference"
}

Response:
{
  "category": "Travel",
  "confidence": 0.77
}

Example predictions:

Request:
{
  "text": "DHL express courier for warehouse delivery"
}

Response:
{
  "category": "Logistics",
  "confidence": 0.95
}

Request:
{
  "text": "Staples printer paper ream 500 sheets"
}

Response:
{
  "category": "Office Supplies",
  "confidence": 0.94
}

Request:
{
  "text": "Electricity bill for March with demand charges"
}

Response:
{
  "category": "Utilities",
  "confidence": 0.96
}

Request:
{
  "text": "Delta flight ticket JFK to LAX for business conference"
}

Response:
{
  "category": "Travel",
  "confidence": 0.93
}

Example request with curl:
curl -X POST "http://127.0.0.1:8000/predict" \
  -H "Content-Type: application/json" \
  -d "{\"text\":\"Blue Dart courier charges\"}"

Example request with Python:
import requests
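url = "http://127.0.0.1:8000/predict"
payload = {
    "text": "Electricity bill for warehouse"
}
# Send the invoice text as JSON to the local API server.
response = requests.post(url, json=payload)
response.raise_for_status()  # fail loudly on HTTP errors
# Print the predicted category and confidence.
print(response.json())

Build the Docker image:
docker build -t invoice-classifier .

Run the container:
docker run -p 8000:8000 invoice-classifier

Run the tests:
pytest

| Component | Technology |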
|---|---|
| NLP Technique | TF-IDF |
| ML Algorithm | Logistic Regression |
| Backend Framework | FastAPI |
| Serialization | Pickle |
| Testing Framework | Pytest |
The project can be deployed on:
- Render
- Railway
- Docker
- AWS EC2
- Azure
- Google Cloud Platform
- Kubernetes
If you found this project useful:
- Star the repository
- Fork the project
- Contribute improvements
This project is licensed under the MIT License.

