Research Question: Can we transfer mathematical reasoning capabilities to a low-resource language (Bengali) using Model Merging on small-scale LLMs (1.5B), without any GPU fine-tuning?
This project explores the potential and limits of Model Merging techniques. By fusing a Base model with an Instruct-tuned model using the SLERP (Spherical Linear Interpolation) algorithm, I attempted to create a hybrid model capable of solving arithmetic problems in Bengali.
The experiment was conducted entirely on a CPU-only environment (Google Colab), demonstrating cost-effective AI engineering.
- Algorithm: SLERP (Spherical Linear Interpolation)
- Tool: `mergekit`
- Source A: Qwen/Qwen2.5-1.5B (Base)
- Source B: Qwen/Qwen2.5-1.5B-Instruct (Math Logic)
- Hardware: CPU only (no GPU used)
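A SLERP merge like this is declared in a `mergekit` YAML file. Below is a minimal sketch of what the `config.yaml` in this repo could look like; the `t` interpolation value and layer range are illustrative assumptions, not the exact parameters used in the experiment:

```yaml
slices:
  - sources:
      - model: Qwen/Qwen2.5-1.5B
        layer_range: [0, 28]
      - model: Qwen/Qwen2.5-1.5B-Instruct
        layer_range: [0, 28]
merge_method: slerp
base_model: Qwen/Qwen2.5-1.5B
parameters:
  t: 0.5        # interpolation factor (illustrative value)
dtype: bfloat16
```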
I developed an automated benchmarking script to evaluate the model's logic in both English and Bengali.
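The core of such a benchmark is extracting the model's final numeric answer and comparing it against a reference. A minimal sketch of that scoring step (the helper names and regex here are my own, not the actual script):

```python
import re

def extract_final_number(text: str):
    """Return the last number mentioned in a model response, or None."""
    # Normalize Bengali digits so English and Bengali outputs are scored alike
    for i, d in enumerate("০১২৩৪৫৬৭৮৯"):
        text = text.replace(d, str(i))
    matches = re.findall(r"-?\d+(?:\.\d+)?", text)
    return float(matches[-1]) if matches else None

def score(responses, answers):
    """Fraction of responses whose final number matches the expected answer."""
    correct = sum(
        extract_final_number(r) == a for r, a in zip(responses, answers)
    )
    return correct / len(answers)
```

For example, `score(["The total is 25 lollipops."], [25.0])` returns `1.0`, and the Bengali digit normalization lets the same scorer handle answers like "মোট ২৫".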
(Figure 1: Comparative analysis of mathematical reasoning accuracy)
| Metric | Accuracy | Observation |
|---|---|---|
| English Logic | 60% ✅ | The merging process successfully retained the reasoning capabilities of the Instruct model. The model solves multi-step arithmetic problems correctly. |
| Bengali Logic | 0% | The model failed to generate correct Bengali syntax while maintaining math logic. |
The experiment reveals a critical insight into LLM scaling:
- Logic Retention: The SLERP algorithm effectively preserves the weights related to logic, as shown by the English results.
- Capacity Collapse: A 1.5-billion-parameter model lacks sufficient capacity to handle Cross-Lingual Reasoning (translation plus arithmetic) simultaneously.
- Conclusion: To achieve success in Bengali math tasks via merging, a minimum model size of 7B or 8B parameters is recommended to overcome this trade-off.
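For reference, SLERP interpolates between two weight vectors along the arc of a sphere rather than the straight line used by plain averaging. A minimal pure-Python sketch of the standard formula (not mergekit's exact implementation, which applies it per tensor with additional edge-case handling):

```python
import math

def slerp(a, b, t, eps=1e-8):
    """Spherical linear interpolation between two weight vectors (as lists)."""
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    # Cosine of the angle between the two weight directions
    dot = sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b) + eps)
    theta = math.acos(max(-1.0, min(1.0, dot)))
    if theta < eps:
        # Nearly parallel vectors: fall back to linear interpolation
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    s = math.sin(theta)
    wa = math.sin((1 - t) * theta) / s
    wb = math.sin(t * theta) / s
    return [wa * x + wb * y for x, y in zip(a, b)]
```

At `t=0` this returns the first model's weights, at `t=1` the second's, and intermediate values blend the two while preserving the magnitude of the rotation between them.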
You can load and run this model directly from Hugging Face using the transformers library.
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Noushad999/Polymath-1.5B-Bengali-Math"

# Load the tokenizer and model (CPU inference is sufficient)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cpu")

# English test (works well)
question = "Jason bought 20 lollipops. Then he bought 5 more. How many total?"
inputs = tokenizer([question], return_tensors="pt")

# Generate an answer
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

```
├── config.yaml            # MergeKit configuration file (SLERP parameters)
├── merge_script.ipynb     # Complete Colab notebook (code)
├── performance_graph.png  # Benchmark visualization
└── README.md              # Project documentation
```

Md Noushad Jahan Ramim, AI Researcher and Developer
This project serves as an educational case study on the limitations and capabilities of low-resource LLM engineering.