# Fake Review Detection using NLP & Deep Learning
---
## 1. Introduction
In today's digital world, online reviews play an important role in influencing customer decisions. However, many platforms contain **fake or spam reviews** that mislead users.
This project aims to build a system that can automatically classify reviews as **Real (Genuine)** or **Fake (Spam)** using **Natural Language Processing (NLP)** and **Machine Learning / Deep Learning techniques**.
---
## 2. Objectives
- Detect fake and genuine reviews automatically
- Apply NLP techniques to process text data
- Convert text into numerical features using TF-IDF
- Train ML/DL models for classification
- Evaluate model performance using standard metrics
---
## 3. Dataset
### 🔹 Source:
Amazon Reviews Dataset
### 🔹 Features:
- Review Text
- Rating (1–5 stars)
- Reviewer ID
- Product ID
- Verified Purchase
### 🔹 Labels:
- **Real Review (0)**
- **Fake Review (1)**
---
## 4. System Architecture / Pipeline
Dataset → Preprocessing → Feature Extraction → Model Training → Testing → Prediction
---
## 5. Preprocessing (Text Cleaning)
Raw reviews are cleaned before processing.
### Steps:
- Convert text to lowercase
- Remove stopwords (is, the, and, etc.)
- Tokenization (split into words)
- Stemming / Lemmatization
- Remove punctuation and special characters
### Example:
Input:
"This product is AMAZING!!!"
Output:
product amazing
---
## 6. Feature Extraction (TF-IDF)
TF-IDF converts text into numerical vectors.
### Concept:
- TF (Term Frequency): frequency of word in document
- IDF (Inverse Document Frequency): importance of word
### Formula:
TF-IDF = TF × IDF
### Benefit:
- Important words get higher weight
- Common words get lower weight
---
## 7. Model Training
### Machine Learning Models:
- Logistic Regression
- Naive Bayes
- Support Vector Machine (SVM)
### Deep Learning Models:
- Recurrent Neural Network (RNN)
- Long Short-Term Memory (LSTM)
- BERT (Bidirectional Encoder Representations from Transformers)
### Training Process:
- Input: TF-IDF vectors
- Output: Real / Fake label
- Data split: 80% training, 20% testing
---
## 8. Model Evaluation
### Metrics Used:
- Accuracy
- Precision
- Recall
- F1-score
### Example:
| Actual | Predicted | Result |
|--------|----------|---------|
| Fake | Fake | Correct|
| Real | Fake | Wrong |
---
## 9. Prediction Phase
The trained model predicts whether a new review is real or fake.
### Example:
Input:
"Excellent product!!! Must buy!!!"
Output:
Fake (Probability: 0.87)
Input:
"I used this product for 2 weeks, battery is good but camera is average"
Output:
Real (Probability: 0.91)
---
## 10. Detection Logic
### Fake Reviews:
- Repetitive words
- Too many exclamation marks
- Generic statements
- No real experience
### Real Reviews:
- Detailed explanation
- Balanced opinion (pros + cons)
- Natural writing style
---
## 11. Technologies Used
- Python
- Scikit-learn
- TensorFlow / Keras
- NLTK / SpaCy
- Pandas
- NumPy
---
## 12. Project Structure
---
## 13. How to Run the Project
### Step 1: Clone Repository
git clone https://github.com/your-username/fake-review-detection.git
### Step 2: Install Dependencies
pip install -r requirements.txt
### Step 3: Run the Project
python main.py
---
## 14. Future Improvements
- Use BERT for higher accuracy
- Add web interface (React / MERN stack)
- Real-time review detection
- Deploy on cloud (AWS / Heroku)
- Add multilingual support
---
## 15. Limitations
- Dataset quality affects accuracy
- Fake reviews becoming more realistic
- Model may misclassify borderline cases
---
## 16. Conclusion
This project demonstrates how NLP and machine learning can be used to detect fake reviews effectively. It improves reliability in e-commerce platforms by filtering out spam reviews and helping users make better decisions.
---