🧪 Polymath-1.5B: Cross-Lingual Model Merging Experiment

License Hugging Face Python Colab

Research Question: Can we transfer mathematical reasoning capabilities to a low-resource language (Bengali) using Model Merging on small-scale LLMs (1.5B), without any GPU fine-tuning?


📖 Project Overview

This project explores the potential and limits of Model Merging techniques. By fusing a Base model with an Instruct-tuned model using the SLERP (Spherical Linear Interpolation) algorithm, I attempted to create a hybrid model capable of solving arithmetic problems in Bengali.

The experiment was conducted entirely on a CPU-only environment (Google Colab), demonstrating cost-effective AI engineering.

🛠️ The Tech Stack

  • Algorithm: SLERP (Spherical Linear Interpolation)
  • Tool: mergekit
  • Base Models:
    • Source A: Qwen/Qwen2.5-1.5B (Base)
    • Source B: Qwen/Qwen2.5-1.5B-Instruct (Math Logic)
  • Hardware: CPU (No GPU used)
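For intuition, SLERP interpolates along the arc between two weight tensors rather than along the straight line, which better preserves their norms. A minimal NumPy sketch of the idea (mergekit applies this per-tensor with its own handling of edge cases; the function below is illustrative, not mergekit's code):

```python
import numpy as np

def slerp(w_a, w_b, t, eps=1e-8):
    """Spherical linear interpolation between two flattened weight tensors.

    t=0 returns w_a, t=1 returns w_b; intermediate t follows the arc
    between the two directions instead of the straight chord.
    """
    a = w_a / (np.linalg.norm(w_a) + eps)
    b = w_b / (np.linalg.norm(w_b) + eps)
    dot = np.clip(np.dot(a, b), -1.0, 1.0)
    omega = np.arccos(dot)          # angle between the two weight directions
    if omega < eps:                  # nearly parallel: fall back to LERP
        return (1.0 - t) * w_a + t * w_b
    so = np.sin(omega)
    return (np.sin((1.0 - t) * omega) / so) * w_a + (np.sin(t * omega) / so) * w_b
```

For two orthogonal unit vectors at t=0.5, this returns the normalized midpoint, whereas plain linear interpolation would shrink the norm to about 0.71.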

📊 Experimental Results & Analysis

I developed an automated benchmarking script to evaluate the model's logic in both English and Bengali.
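The benchmarking script itself is in the notebook; the core of such a harness can be sketched as below. The answer-extraction heuristic (take the last number in the completion and compare it to the gold answer) is an assumption about how scoring was done, not the repo's exact code:

```python
import re

def extract_final_number(text):
    """Pull the last integer or decimal out of a model completion (heuristic)."""
    nums = re.findall(r"-?\d+(?:\.\d+)?", text)
    return float(nums[-1]) if nums else None

def accuracy(examples, generate_fn):
    """examples: list of (question, gold_answer) pairs.
    generate_fn: callable mapping a question string to a model completion."""
    correct = 0
    for question, gold in examples:
        pred = extract_final_number(generate_fn(question))
        if pred is not None and abs(pred - gold) < 1e-6:
            correct += 1
    return correct / len(examples)
```

The same harness can score both languages by swapping in English or Bengali question sets, which is what makes the 60% vs. 0% comparison direct.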

Figure 1: Comparative analysis of mathematical reasoning accuracy (performance_graph.png)

🔍 Key Findings

| Metric        | Accuracy | Observation                                                                                                              |
|---------------|----------|--------------------------------------------------------------------------------------------------------------------------|
| English Logic | 60%      | The merge retained the Instruct model's reasoning capabilities; the model solves multi-step arithmetic problems correctly. |
| Bengali Logic | 0%       | ⚠️ The model failed to generate correct Bengali output while maintaining the math logic.                                   |

🧠 The "Capacity Gap" Hypothesis

The experiment reveals a critical insight into LLM scaling:

  1. Logic Retention: SLERP preserved the weights responsible for reasoning, as the English results show.
  2. Capacity Collapse: A 1.5-billion-parameter model lacks sufficient capacity to handle cross-lingual reasoning (translation plus arithmetic) simultaneously.
  3. Conclusion: To succeed at Bengali math tasks via merging, a model of at least 7B-8B parameters is likely needed to overcome this trade-off.

🚀 How to Use the Model

You can load and run this model directly from Hugging Face using the transformers library.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Noushad999/Polymath-1.5B-Bengali-Math"

# Load the tokenizer and model on CPU
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float32, device_map="cpu"
)

# English test (works well)
question = "Jason bought 20 lollipops. Then he bought 5 more. How many total?"
inputs = tokenizer([question], return_tensors="pt")

# Generate and decode the answer
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

📂 Project Structure

├── config.yaml           # MergeKit configuration file (SLERP parameters)
├── merge_script.ipynb    # Complete Colab Notebook (Code)
├── performance_graph.png # Benchmark visualization
└── README.md             # Project documentation
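The actual SLERP parameters live in the repo's config.yaml. For orientation, a mergekit SLERP configuration for these two Qwen models typically looks like the sketch below; the layer range and interpolation weights here are illustrative placeholders, not the values used in this experiment:

```yaml
slices:
  - sources:
      - model: Qwen/Qwen2.5-1.5B
        layer_range: [0, 28]
      - model: Qwen/Qwen2.5-1.5B-Instruct
        layer_range: [0, 28]
merge_method: slerp
base_model: Qwen/Qwen2.5-1.5B
parameters:
  t:
    - filter: self_attn   # per-layer interpolation weights for attention
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp         # per-layer interpolation weights for MLP blocks
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5          # default for all remaining tensors
dtype: bfloat16
```

Running `mergekit-yaml config.yaml ./output-model` produces the merged checkpoint, and works on CPU, which is what made this experiment feasible on free Colab.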

👨‍💻 Author

Md Noushad Jahan Ramim, AI Researcher and Developer


This project serves as an educational case study on the limitations and capabilities of low-resource LLM engineering.
