Multilingual Robustness Analysis of Sentiment Models in Indian Ride-Hailing Reviews & Robustness Study of IndicBERT for Code-Mixed Indian User Text

Overview

This project investigates the robustness of sentiment classification systems in the context of Indian ride-hailing reviews, with particular emphasis on multilingual and code-mixed text — a common characteristic of real-world user-generated content in India.

Two complementary modeling paradigms are studied:

Classical baseline: TF-IDF + Logistic Regression
Transformer model: IndicBERT fine-tuning

The primary objective is to examine how sentiment models trained largely on English data behave when exposed to linguistically diverse and noisy inputs.

Motivation

Most sentiment analysis systems are developed and evaluated on clean English datasets. However, Indian user reviews frequently exhibit:

Multilingual usage
Code-mixing (e.g., Hinglish)
Informal spelling and noise
Transliteration of Indic languages

These characteristics often lead to performance degradation in standard NLP pipelines.

This project aims to systematically probe these weaknesses in the context of ride-hailing platforms.

Datasets

1. Primary Dataset

Indian Ride Hailing Driver Reviews
https://www.kaggle.com/datasets/abubakkar01/indian-ride-hailing-driver-reviews

Usage:

Main supervised training
Real-world noisy review distribution

Note: The training data is predominantly English. Multilingual evaluation is therefore used as a robustness probe rather than full multilingual training.

2. Code-Mixed Evaluation Set (Included)

A small synthetic Hinglish-style dataset is provided in: code_mixed_test.csv

Purpose:

Stress-test model robustness
Evaluate behavior on mixed-language inputs
Identify failure patterns
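
A minimal sketch of how such a stress test could be scored is shown below. The column names `text` and `label`, the sample rows, and the `toy_predict` stand-in are all assumptions made for illustration; they are not taken from the included CSV or from the project's scripts.

```python
import csv
import io


def evaluate_code_mixed(rows, predict):
    """Return (accuracy, failures) for rows that have 'text' and 'label' keys."""
    failures = []
    correct = 0
    total = 0
    for row in rows:
        total += 1
        predicted = predict(row["text"])
        if predicted == row["label"]:
            correct += 1
        else:
            # Keep the misclassified example so failure patterns can be inspected.
            failures.append((row["text"], row["label"], predicted))
    return (correct / total if total else 0.0), failures


# Hypothetical sample in the assumed layout; not copied from the real file.
SAMPLE = """text,label
driver bahut accha tha,positive
gaadi bilkul gandi thi,negative
ride theek thi,neutral
"""


def toy_predict(text):
    """Stand-in for a trained model: it only reacts to an English keyword."""
    return "positive" if "good" in text else "neutral"


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
acc, failures = evaluate_code_mixed(rows, toy_predict)
print(f"code-mixed accuracy: {acc:.2f}")
for text, gold, predicted in failures:
    print(f"{text!r}: expected {gold}, got {predicted}")
```

Swapping `toy_predict` for a real model's prediction function and `io.StringIO(SAMPLE)` for an open file handle would give the same report on an actual evaluation set.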

Methodology

Baseline Model

Unicode-aware preprocessing
Word + character TF-IDF
Logistic Regression classifier
Error analysis pipeline

Character n-grams are included to improve robustness to spelling variation and transliteration.
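
The effect of character n-grams on spelling variation can be illustrated with a small standard-library sketch. It is not the project's feature extractor: the normalization steps, the n-gram range (2 to 4), and the example words are assumptions chosen only to show why two transliterations of the same word still overlap at the character level.

```python
import unicodedata


def normalize(text):
    """Unicode-aware cleanup: NFKC-normalize, casefold, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return " ".join(text.casefold().split())


def char_ngrams(text, n_min=2, n_max=4):
    """Return the set of character n-grams of the normalized text."""
    text = normalize(text)
    return {
        text[i : i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(text) - n + 1)
    }


def jaccard(a, b):
    """Jaccard overlap between two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Two spellings of the same transliterated word share no word-level feature...
word_overlap = jaccard({normalize("accha")}, {normalize("acha")})
# ...but they still share several character n-grams.
char_overlap = jaccard(char_ngrams("accha"), char_ngrams("acha"))
print(f"word overlap: {word_overlap:.2f}, char n-gram overlap: {char_overlap:.2f}")
```

A word-level vocabulary treats the two spellings as unrelated tokens, while the character-level features give the classifier partial evidence that they are the same word.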

Advanced Model

Pretrained IndicBERT
Fine-tuned for 3-class sentiment
Early stopping with validation monitoring
Multilingual stress evaluation
Confusion matrix analysis

This enables comparison between classical sparse methods and pretrained multilingual transformers.
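
Early stopping with validation monitoring can be sketched independently of any training framework. The version below assumes the monitored quantity is validation loss and uses a patience of 2 epochs; both are illustrative choices, and the loss values are made up, since the source does not state the actual settings.

```python
class EarlyStopping:
    """Signal a stop once validation loss fails to improve for `patience` epochs."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Hypothetical validation losses, one per epoch.
val_losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.60]
stopper = EarlyStopping(patience=2)
stopped_at = None
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch
        break
print(f"stopped at epoch {stopped_at}, best validation loss {stopper.best}")
```

In a real fine-tuning loop, the checkpoint saved at the best epoch would typically be the one restored for evaluation.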

Evaluation

The following metrics are reported:

Accuracy
Weighted F1-score
Confusion matrix
Qualitative error analysis
Quantitative code-mixed accuracy

The code-mixed evaluation provides a direct measure of cross-lingual robustness.
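
For reference, the three quantitative metrics can be computed as in the following standard-library sketch. The label names and the example predictions are assumptions for illustration; the project's scripts may compute these values differently, for example through a library.

```python
from collections import Counter

LABELS = ["negative", "neutral", "positive"]  # assumed names for the 3 classes


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(y_true, y_pred):
    """Fraction of examples whose prediction matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred, labels=LABELS):
    """Per-class F1, averaged with weights equal to each class's support."""
    total = 0.0
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * (tp + fn)  # tp + fn is the class support
    return total / len(y_true)


# Hypothetical predictions on four reviews.
y_true = ["positive", "positive", "negative", "neutral"]
y_pred = ["positive", "negative", "negative", "neutral"]
print(accuracy(y_true, y_pred))
print(weighted_f1(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))
```

Weighted F1 is useful alongside accuracy when the classes are imbalanced, because each class's F1 contributes in proportion to how often that class actually occurs.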

Results

TF-IDF Baseline

Accuracy & Classification Report (multilingual_sentiment):

Confusion Matrix (multilingual_sentiment):

Error Analysis & Multilingual Stress Test (multilingual_sentiment):

IndicBERT Model

Confusion Matrix (indicbert_sentiment):

Final Metrics & Multilingual Test (indicbert_sentiment):

Code-Mixed Quantitative Evaluation

Key Observations

Strong performance is observed on in-domain English reviews.
Performance degrades on code-mixed and transliterated inputs.
Character n-grams improve classical model robustness.
IndicBERT shows better multilingual handling but still struggles with noisy Hinglish text.
Quantitative evaluation confirms a measurable performance drop on code-mixed inputs.

These findings highlight the gap between benchmark NLP performance and real Indian user data.

How to Run

Step 1 - Install dependencies: run "pip install -r requirements.txt" in a terminal.

Step 2 - Prepare the dataset: download it from "https://www.kaggle.com/datasets/abubakkar01/indian-ride-hailing-driver-reviews", then run "python prepare_dataset.py" in a terminal (or run prepare_dataset.py directly).

Step 3 - Run the baseline model: run "python multilingual_sentiment.py" in a terminal (or run multilingual_sentiment.py directly).

Step 4 - Run the IndicBERT model: run "python indicbert_sentiment.py" in a terminal (or run indicbert_sentiment.py directly).

Project Structure

Multilingual_Sentiment_Project/
│
├── .vscode/
│   └── settings.json
├── Dataset/
│   ├── code_mixed_text.csv
│   └── indian_ride_hailing_services_analysis.csv
├── Images
├── .gitignore
├── prepare_dataset.py
├── multilingual_sentiment.py
├── indicbert_sentiment.py
├── requirements.txt
├── README.md
└── Report.md
