PayFlow Hallucination-Controlled Billing Assistant

Table of Contents

  1. Project Overview
  2. Problem Statement
  3. Why Hallucinations Are Dangerous in Billing Systems
  4. Project Goal
  5. Model & Training Strategy
  6. Dataset Design
  7. Policy-Driven Training Approach
  8. Training Process (Step-by-Step)
  9. Evaluation Methodology
  10. Results Summary
  11. Challenges Faced
  12. Key Learnings
  13. Limitations & Future Improvements
  14. Training Configuration Notes
  15. Conclusion

1. Project Overview

This project focuses on reducing hallucinations in a Large Language Model (LLM) used for SaaS billing customer support.
The assistant is designed for a fictional company called PayFlow and must answer only using official billing policies.

The model is fine-tuned using LoRA (Low-Rank Adaptation) on top of a base instruction-tuned LLM.


2. Problem Statement

Large Language Models often produce confident but incorrect answers, known as hallucinations.
In billing and payments systems, hallucinations can cause:

  • Financial loss
  • Legal issues
  • Loss of customer trust

This project addresses the question:

How can we constrain an LLM to answer strictly from approved billing policies and safely refuse when information is unavailable?


3. Why Hallucinations Are Dangerous in Billing Systems

Examples of real-world risks:

  • Inventing refund policies
  • Claiming unsupported payment methods
  • Fabricating discounts or currencies

In enterprise systems, safe refusal is better than a wrong answer.


4. Project Goal

The goal of this project is to:

  • Reduce hallucinations in billing-related questions
  • Ensure strict policy adherence
  • Teach the model when to refuse instead of guessing
  • Quantify hallucination reduction using before/after evaluation

5. Model & Training Strategy

Base Model

  • Instruction-tuned large language model (Mistral-style architecture)

Fine-Tuning Method

  • LoRA (Low-Rank Adaptation)
  • Only a small percentage of parameters are trained
  • Base model weights remain unchanged

Why LoRA?

  • Memory efficient
  • Faster training
  • Prevents catastrophic forgetting
  • Industry-standard for alignment tasks (see the configuration sketch below)
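
To illustrate how little of the network this touches, here is a minimal sketch using the Hugging Face peft library. The checkpoint name, rank, and target modules are placeholders, not the project's published configuration:

```python
# Illustrative LoRA attachment; rank, alpha, and target modules are assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2"  # placeholder for the Mistral-style base model
)

lora_config = LoraConfig(
    r=16,                                  # low-rank dimension (assumed)
    lora_alpha=32,                         # scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections commonly adapted
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # confirms only a small fraction of weights train
```

Because the adapters are separate low-rank matrices, the base weights stay frozen and only the adapter weights are updated and saved.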

6. Dataset Design

The model is not trained directly on policy documents.

Instead:

  • billing.md acts as the human-curated source of truth
  • Policies are manually converted into instruction–response pairs (an example record is sketched after this list)
  • Only information present in the policy is allowed

Dataset Rules

  • One concept → one behavior
  • Explicit refusals for unknown information
  • No assumptions or industry defaults
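
For illustration, a record in train.json might look roughly like the following. The field names and answers here are hypothetical; the project's actual schema and policy content may differ:

```python
# Hypothetical structure for train.json; the real schema may differ.
import json

records = [
    {
        "instruction": "What payment methods does PayFlow accept?",
        # Illustrative answer; the real text is copied verbatim from billing.md.
        "response": "PayFlow accepts the payment methods listed in its billing policy.",
    },
    {
        # Trap question: the policy is silent, so the target is the standard refusal.
        "instruction": "Does PayFlow offer student discounts?",
        "response": "This information is not available in PayFlow's billing policy.",
    },
]

with open("train.json", "w") as f:
    json.dump(records, f, indent=2)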

7. Policy-Driven Training Approach

The model is trained to follow this rule:

Answer only what is explicitly stated in PayFlow’s billing policy.
If information is missing, respond with a standard refusal.

Standard refusal phrase:

This information is not available in PayFlow’s billing policy.

This consistency is critical for hallucination control.
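
One way to keep the rule and the refusal wording consistent is to hard-code them as constants in the prompt-building code. The template below is an assumption for illustration, not the project's exact prompt:

```python
# Hypothetical constants wiring the policy rule and standard refusal into prompts.
SYSTEM_RULE = (
    "Answer only what is explicitly stated in PayFlow's billing policy. "
    "If the information is missing, respond with the standard refusal."
)
STANDARD_REFUSAL = "This information is not available in PayFlow's billing policy."

def build_prompt(question: str) -> str:
    """Format a single prompt; the template actually used for training may differ."""
    return f"{SYSTEM_RULE}\n\nQuestion: {question}\nAnswer:"
```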


8. Training Process (Step-by-Step)

  1. Environment setup in Google Colab
  2. Load base model in 4-bit precision
  3. Attach LoRA adapters
  4. Load curated dataset (train.json)
  5. Fine-tune LoRA adapters for 2–3 epochs
  6. Save LoRA adapter weights only

No full model retraining is performed.
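
A condensed sketch of these steps, assuming the Hugging Face transformers, peft, datasets, and bitsandbytes stack typically used in Colab; the model name and hyperparameters are placeholders:

```python
# Sketch of steps 2-6 above, assuming bitsandbytes, peft, and datasets in Colab.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.2"  # placeholder Mistral-style model

# Step 2: load the base model in 4-bit precision.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, quantization_config=bnb_config, device_map="auto"
)

# Step 3: attach LoRA adapters (configuration as sketched in section 5).
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))

# Step 4: load the curated instruction-response pairs.
dataset = load_dataset("json", data_files="train.json", split="train")

# Steps 5-6: fine-tune for 2-3 epochs (trainer loop omitted), then save adapters only.
# model.save_pretrained("payflow-lora-adapter")  # writes adapter weights, not the base model
```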


9. Evaluation Methodology

Evaluation is performed using:

  • Known policy questions
  • Edge cases
  • Trap questions (questions not covered in policy)

Each response is classified as (see the scoring sketch after this list):

  • ✅ Correct
  • ⚠️ Safe but imperfect (over-refusal / verbosity)
  • ❌ Hallucination (fabricated information)
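
A rough way to approximate this labelling automatically is simple string matching against the expected answer and the standard refusal phrase. The project's grading was done by inspection, so treat this as a sketch rather than the actual harness:

```python
# Naive string-matching approximation of the manual correct / safe / hallucination labels.
STANDARD_REFUSAL = "This information is not available in PayFlow's billing policy."

def classify(answer: str, expected: str, in_policy: bool) -> str:
    """Label one response for a question that is (or is not) covered by the policy."""
    if not in_policy:
        # Trap question: the only safe behaviour is the standard refusal.
        return "correct" if STANDARD_REFUSAL in answer else "hallucination"
    if expected.lower() in answer.lower():
        return "correct"
    if STANDARD_REFUSAL in answer:
        return "safe_but_imperfect"  # over-refusal on a covered question
    return "hallucination"
```

The hallucination rate reported in the next section is then simply the share of responses labelled as hallucinations.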

10. Results Summary

  • Hallucination rate before training: ~60–70%
  • Hallucination rate after training: ~8–10%
  • Approximate relative reduction: 75–90%

Detailed before/after comparisons are documented separately in RESULTS.md.


11. Challenges Faced

1. Partial Alignment Overconfidence

The model initially hallucinated more confidently after partial fine-tuning.

2. Over-Refusal

Excessive refusal occurred when refusal samples outweighed valid answers.

3. Dataset Contradictions

Similar questions with different expected behaviors caused instability.

Each issue was resolved through dataset normalization and retraining.


12. Key Learnings

  • Hallucination reduction is a data problem, not a model problem
  • Models hallucinate where policies are silent
  • Over-refusal is safer than hallucination but must be balanced
  • Alignment is iterative, not one-shot

13. Limitations & Future Improvements

  • Some verbosity and response blending remain
  • Reducing the hallucination rate below 5% would likely require a larger dataset
  • Automated policy-to-dataset generation could improve scalability

14. Training Configuration Notes

LoRA adapters were trained for 3 epochs, with training loss decreasing steadily to a final value of ≈ 1.14, indicating effective policy alignment.
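
For reference, a hypothetical transformers TrainingArguments block consistent with a 3-epoch run might look like this; apart from the epoch count, none of these values are documented in the repository:

```python
# Hypothetical hyperparameters consistent with the notes above; actual values are not published.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="payflow-lora",
    num_train_epochs=3,               # matches the run described above
    per_device_train_batch_size=4,    # assumed
    gradient_accumulation_steps=4,    # assumed
    learning_rate=2e-4,               # common LoRA starting point (assumed)
    logging_steps=10,                 # watch training loss trend toward ~1.14
    save_strategy="epoch",
)
```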

15. Conclusion

This project demonstrates a practical, enterprise-grade approach to hallucination control using LoRA.

Rather than chasing perfect answers, the model is trained to:

  • Respect policy boundaries
  • Avoid guessing
  • Fail safely

This approach mirrors how real-world AI systems are deployed in billing and finance domains.

