Skip to content

vneelima44/Sentiment-Analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 

Repository files navigation

Sentiment-Analysis

Sentiment Analysis using NLP | FinTech

This project focuses on performing sentiment analysis on customer reviews in the FinTech domain using Natural Language Processing (NLP). The goal is to classify reviews as positive or negative based on the sentiment conveyed. This project leverages several machine learning models and NLP techniques to preprocess text data and build a predictive sentiment analysis model.

Project Overview

Sentiment analysis helps businesses understand customer opinions and feedback, which can be crucial in improving products and services. In this project, I analyzed customer reviews to classify the sentiment (positive/negative) using multiple machine learning algorithms. The project includes data cleaning, feature extraction, model building, and evaluation.

Table of Contents

Technologies Used Data Preprocessing Model Building Model Evaluation Visualizations Results

Technologies Used

Python: Programming language Pandas: Data manipulation and analysis NumPy: Numerical operations Scikit-learn: Machine learning algorithms (Random Forest, XGBoost, Decision Tree) NLTK / SpaCy: Text preprocessing (Tokenization, Lemmatization, Stemming) Matplotlib / Seaborn: Data visualization WordCloud: Visualization of frequently used words CountVectorizer / TfidfVectorizer: Feature extraction from text data

Data Preprocessing

The raw text data was preprocessed in the following steps:

Text Cleaning: Removal of special characters, numbers, and punctuation. Tokenization: Splitting the text into individual words (tokens). Stopword Removal: Eliminating common words that don’t add value to sentiment analysis. Stemming: Reducing words to their root form (e.g., "running" to "run"). Feature Extraction: Used CountVectorizer for bag-of-words representation. Employed TF-IDF to capture important words based on frequency.

Model Building

The following machine learning models were used to perform sentiment classification:

Random Forest Classifier: An ensemble method combining multiple decision trees to improve accuracy. XGBoost Classifier: A gradient boosting algorithm known for speed and performance. Decision Tree Classifier: A simple, interpretable model that splits data based on feature values.

Model Evaluation

Model performance was evaluated using the following metrics:

Accuracy: Proportion of correctly classified samples. Confusion Matrix: To visualize the true positive, true negative, false positive, and false negative values.

Visualizations

Various visualizations were created to explore and understand the data:

WordCloud: Displayed the most frequent words in the dataset. Sentiment Distribution: Visualized the number of positive and negative reviews. Model Performance Metrics: Plotted Confusion Matrix and other evaluation metrics.

Results

Achieved competitive model accuracy using Random Forest and XGBoost classifiers.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors