Skip to content

komalkhatod1105/Fake_Review_Detection_Project

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

#  Fake Review Detection using NLP & Deep Learning

---

##  1. Introduction

In today's digital world, online reviews play an important role in influencing customer decisions. However, many platforms contain **fake or spam reviews** that mislead users.  

This project aims to build a system that can automatically classify reviews as **Real (Genuine)** or **Fake (Spam)** using **Natural Language Processing (NLP)** and **Machine Learning / Deep Learning techniques**.

---

##  2. Objectives

- Detect fake and genuine reviews automatically  
- Apply NLP techniques to process text data  
- Convert text into numerical features using TF-IDF  
- Train ML/DL models for classification  
- Evaluate model performance using standard metrics  

---

##  3. Dataset

### 🔹 Source:
Amazon Reviews Dataset  

### 🔹 Features:
- Review Text  
- Rating (1–5 stars)  
- Reviewer ID  
- Product ID  
- Verified Purchase  

### 🔹 Labels:
- **Real Review (0)**  
- **Fake Review (1)**  

---

##  4. System Architecture / Pipeline

Dataset → Preprocessing → Feature Extraction → Model Training → Testing → Prediction


---

##  5. Preprocessing (Text Cleaning)

Raw reviews are cleaned before processing.

### Steps:
- Convert text to lowercase  
- Remove stopwords (is, the, and, etc.)  
- Tokenization (split into words)  
- Stemming / Lemmatization  
- Remove punctuation and special characters  

### Example:
Input:

"This product is AMAZING!!!"


Output:

product amazing


---

##  6. Feature Extraction (TF-IDF)

TF-IDF converts text into numerical vectors.

### Concept:
- TF (Term Frequency): frequency of word in document  
- IDF (Inverse Document Frequency): importance of word  

### Formula:
TF-IDF = TF × IDF  

### Benefit:
- Important words get higher weight  
- Common words get lower weight  

---

##  7. Model Training

### Machine Learning Models:
- Logistic Regression  
- Naive Bayes  
- Support Vector Machine (SVM)  

### Deep Learning Models:
- Recurrent Neural Network (RNN)  
- Long Short-Term Memory (LSTM)  
- BERT (Bidirectional Encoder Representations from Transformers)  

### Training Process:
- Input: TF-IDF vectors  
- Output: Real / Fake label  
- Data split: 80% training, 20% testing  

---

##  8. Model Evaluation

### Metrics Used:
- Accuracy  
- Precision  
- Recall  
- F1-score  

### Example:
| Actual | Predicted | Result |
|--------|----------|---------|
| Fake   | Fake     |  Correct|
| Real   | Fake     |   Wrong |

---

## 9. Prediction Phase

The trained model predicts whether a new review is real or fake.

### Example:

Input:

"Excellent product!!! Must buy!!!"


Output:

Fake (Probability: 0.87)


Input:

"I used this product for 2 weeks, battery is good but camera is average"


Output:

Real (Probability: 0.91)


---

##  10. Detection Logic

### Fake Reviews:
- Repetitive words  
- Too many exclamation marks  
- Generic statements  
- No real experience  

### Real Reviews:
- Detailed explanation  
- Balanced opinion (pros + cons)  
- Natural writing style  

---

##  11. Technologies Used

- Python  
- Scikit-learn  
- TensorFlow / Keras  
- NLTK / SpaCy  
- Pandas  
- NumPy  

---

##  12. Project Structure

{8CB86153-51A2-4306-A959-861FB53A84EC}

---

##  13. How to Run the Project

### Step 1: Clone Repository

git clone https://github.com/your-username/fake-review-detection.git


### Step 2: Install Dependencies

pip install -r requirements.txt


### Step 3: Run the Project

python main.py


---

##  14. Future Improvements

- Use BERT for higher accuracy  
- Add web interface (React / MERN stack)  
- Real-time review detection  
- Deploy on cloud (AWS / Heroku)  
- Add multilingual support  

---

##  15. Limitations

- Dataset quality affects accuracy  
- Fake reviews becoming more realistic  
- Model may misclassify borderline cases  

---

##  16. Conclusion

This project demonstrates how NLP and machine learning can be used to detect fake reviews effectively. It improves reliability in e-commerce platforms by filtering out spam reviews and helping users make better decisions.

---


Releases

No releases published

Packages

 
 
 

Contributors