💸 Invoice Expense Classifier AI

AI-Powered Invoice & Expense Classification System

Automatically classify invoice descriptions into business expense categories using Machine Learning and NLP.


📌 Overview

Invoice Expense Classifier AI is a production-ready Machine Learning API that automatically predicts expense categories from invoice text descriptions.

The system uses:

  • Natural Language Processing (NLP)
  • TF-IDF Vectorization
  • Logistic Regression
  • FastAPI REST API

This project helps businesses automate financial workflows and reduce the manual effort of categorizing expenses.


🚀 Features

✅ Core Features

  • AI-powered invoice classification
  • FastAPI REST API
  • Confidence score prediction
  • TF-IDF text vectorization
  • Logistic Regression classifier
  • Real-time predictions
  • JSON API responses

⚡ Production Features

  • Docker support
  • Clean architecture
  • Modular codebase
  • Swagger API documentation
  • Training pipeline included
  • Unit testing support
  • Lightweight deployment
  • Scalable backend design

🧠 Supported Categories

The model predicts the following expense categories:

  • Logistics
  • Office Supplies
  • Cloud/Software
  • Utilities
  • Travel
  • Inventory


Output Screens

[Screenshots: Input, Swagger, Output]

🧠 Machine Learning Workflow

flowchart TD

    A[Invoice Text]
    --> B[Text Preprocessing]

    B --> C[TF-IDF Vectorization]

    C --> D[Logistic Regression Model]

    D --> E[Expense Category Prediction]

    E --> F[Confidence Score]
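
The workflow above can be sketched end to end with scikit-learn. This is a minimal illustration under stated assumptions, not the repository's actual code: the `preprocess` and `classify` helpers and the pipeline layout are hypothetical, and the six training rows are borrowed from the sample dataset shown later in this README.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def preprocess(text: str) -> str:
    """Hypothetical cleaning step: lowercase, drop punctuation, collapse spaces."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


# Tiny illustrative training set (one row per category); a real model needs far more.
TRAIN = [
    ("AWS monthly hosting charges", "Cloud/Software"),
    ("Blue Dart courier delivery", "Logistics"),
    ("Printer paper purchase", "Office Supplies"),
    ("Electricity bill payment", "Utilities"),
    ("Flight booking for meeting", "Travel"),
    ("Warehouse inventory materials", "Inventory"),
]

texts = [preprocess(text) for text, _ in TRAIN]
labels = [category for _, category in TRAIN]

# TF-IDF vectorization followed by a logistic regression classifier.
pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
pipeline.fit(texts, labels)


def classify(text: str) -> dict:
    """Return the most likely category and its probability as the confidence."""
    probabilities = pipeline.predict_proba([preprocess(text)])[0]
    best = probabilities.argmax()
    return {
        "category": str(pipeline.classes_[best]),
        "confidence": round(float(probabilities[best]), 2),
    }


print(classify("Electricity bill for March"))
```

The confidence here is simply the highest class probability from `predict_proba`. With only six training rows the probabilities stay close together, so the value is not comparable to the sample responses in this README.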

📂 Project Structure

Invoice-Expense-Classifier/
│
├── app/
│   ├── main.py
│   ├── model.py
│   ├── preprocessing.py
│   ├── schemas.py
│   └── utils.py
│
├── data/
│   └── train.csv
│
├── models/
│   ├── model.pkl
│   └── vectorizer.pkl
│
├── tests/
│   └── test_api.py
│
├── train.py
├── requirements.txt
├── Dockerfile
├── README.md
└── .gitignore

⚙️ Installation & Setup

1️⃣ Clone Repository

git clone https://github.com/Bhavana998/Invoice-Expense-Classifier.git

cd Invoice-Expense-Classifier

2️⃣ Create Virtual Environment

Windows

python -m venv venv

venv\Scripts\activate

Linux / Mac

python3 -m venv venv

source venv/bin/activate

3️⃣ Install Dependencies

pip install -r requirements.txt

🏋️ Train Machine Learning Model

Run the training pipeline:

python train.py

Generated model files:

models/model.pkl
models/vectorizer.pkl
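
Because the classifier and the vectorizer are stored as two separate pickle files, anything that serves predictions has to load both. The sketch below shows one way this might be done. `save_artifacts` and `load_artifacts` are hypothetical helpers, not functions from this repository, and the demo pickles plain dictionaries as stand-ins for the real scikit-learn objects.

```python
import pickle
import tempfile
from pathlib import Path

ARTIFACT_NAMES = ("model.pkl", "vectorizer.pkl")


def save_artifacts(model, vectorizer, model_dir: Path) -> None:
    """Write both objects to model_dir using the file names listed above."""
    model_dir.mkdir(parents=True, exist_ok=True)
    for name, obj in zip(ARTIFACT_NAMES, (model, vectorizer)):
        with open(model_dir / name, "wb") as handle:
            pickle.dump(obj, handle)


def load_artifacts(model_dir: Path):
    """Return (model, vectorizer); fail clearly if training has not run yet."""
    loaded = []
    for name in ARTIFACT_NAMES:
        path = model_dir / name
        if not path.exists():
            raise FileNotFoundError(f"{path} not found; run `python train.py` first")
        with open(path, "rb") as handle:
            loaded.append(pickle.load(handle))
    return tuple(loaded)


# Demo with stand-in objects written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    models_dir = Path(tmp) / "models"
    save_artifacts({"kind": "model"}, {"kind": "vectorizer"}, models_dir)
    model, vectorizer = load_artifacts(models_dir)
    print(model, vectorizer)
```

Unpickling can execute arbitrary code, so only load `.pkl` files that you produced yourself or that come from a trusted source.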

🚀 Run FastAPI Server

uvicorn app.main:app --reload

Server URL:

http://127.0.0.1:8000

Swagger API Documentation:

http://127.0.0.1:8000/docs

📊 Sample Training Dataset

data/train.csv

text,category
AWS monthly hosting charges,Cloud/Software
Blue Dart courier delivery,Logistics
Printer paper purchase,Office Supplies
Electricity bill payment,Utilities
Flight booking for meeting,Travel
Warehouse inventory materials,Inventory
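
Before training, it can help to check that every row carries a non-empty text and one of the six supported categories. The sketch below is an assumption about how such a check might look; `load_rows` and `SUPPORTED_CATEGORIES` are hypothetical names, not taken from `train.py`. It parses the sample rows above from an in-memory string.

```python
import csv
import io
from collections import Counter

SUPPORTED_CATEGORIES = {
    "Logistics", "Office Supplies", "Cloud/Software",
    "Utilities", "Travel", "Inventory",
}

SAMPLE_CSV = """\
text,category
AWS monthly hosting charges,Cloud/Software
Blue Dart courier delivery,Logistics
Printer paper purchase,Office Supplies
Electricity bill payment,Utilities
Flight booking for meeting,Travel
Warehouse inventory materials,Inventory
"""


def load_rows(handle):
    """Parse (text, category) pairs and reject empty or unknown values."""
    rows = []
    # Data starts on line 2 because line 1 is the header.
    for line_no, row in enumerate(csv.DictReader(handle), start=2):
        text = (row.get("text") or "").strip()
        category = (row.get("category") or "").strip()
        if not text:
            raise ValueError(f"line {line_no}: empty text")
        if category not in SUPPORTED_CATEGORIES:
            raise ValueError(f"line {line_no}: unknown category {category!r}")
        rows.append((text, category))
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))
# Count rows per category to spot class imbalance early.
print(Counter(category for _, category in rows))
```

To read the real file instead, pass `open("data/train.csv", newline="", encoding="utf-8")` to `load_rows`. Invoice descriptions that contain commas must be quoted in the CSV.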

🔌 API Usage

POST /predict

Predict expense category from invoice text.


📥 Request

📌 Sample API Requests

Example 1: Travel

Request

{
  "text": "Delta flight ticket JFK to LAX for business conference"
}

Response

{
  "category": "Travel",
  "confidence": 0.77
}

Example 2: Logistics

Request

{
  "text": "DHL express courier for warehouse delivery"
}

Response

{
  "category": "Logistics",
  "confidence": 0.95
}

Example 3: Office Supplies

Request

{
  "text": "Staples printer paper ream 500 sheets"
}

Response

{
  "category": "Office Supplies",
  "confidence": 0.94
}

Example 4: Utilities

Request

{
  "text": "Electricity bill for March with demand charges"
}

Response

{
  "category": "Utilities",
  "confidence": 0.96
}

Example 5: Travel

Request

{
  "text": "Delta flight ticket JFK to LAX for business conference"
}

Response

{
  "category": "Travel",
  "confidence": 0.93
}

🧪 API Testing Examples

cURL Example

curl -X POST "http://127.0.0.1:8000/predict" \
-H "Content-Type: application/json" \
-d "{\"text\":\"Blue Dart courier charges\"}"

Python Request Example

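import requests

# Local server started with `uvicorn app.main:app --reload`
url = "http://127.0.0.1:8000/predict"

payload = {
    "text": "Electricity bill for warehouse"
}

# Send the invoice text as JSON and fail loudly on HTTP errors
response = requests.post(url, json=payload, timeout=10)
response.raise_for_status()

print(response.json())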

🐳 Docker Deployment

Build Docker Image

docker build -t invoice-classifier .

Run Docker Container

docker run -p 8000:8000 invoice-classifier

🧪 Running Tests

pytest

📈 Model Details

Component          Technology
NLP Technique      TF-IDF
ML Algorithm       Logistic Regression
Backend Framework  FastAPI
Serialization      Pickle
Testing Framework  Pytest

🌍 Deployment Options

The project can be deployed on:

  • Render
  • Railway
  • Docker
  • AWS EC2
  • Azure
  • Google Cloud Platform
  • Kubernetes

👩‍💻 Author

Setty Bhavana

GitHub Profile

https://github.com/Bhavana998


⭐ Support

If you found this project useful:

  • ⭐ Star the repository
  • 🍴 Fork the project
  • 🚀 Contribute improvements

📜 License

This project is licensed under the MIT License.


🚀 Built with FastAPI + Machine Learning + NLP
