ShubhamRaj03/NeoStats-Data-Engineering-Project

NeoStats Retail Analytics Dashboard

Project Overview

NeoStats Retail Analytics Dashboard is an end-to-end Data Engineering and Business Intelligence project developed to analyze retail sales data and generate actionable business insights.

The project follows a complete ETL (Extract, Transform, Load) workflow built with Python and Pandas. Three Excel source files (two retail transaction files and one product details file) are extracted, cleaned, transformed, validated, and consolidated into analytical CSV datasets. The processed data is then visualized in an interactive Power BI dashboard with three business-focused pages:

  1. Sales Overview Dashboard
  2. Product Performance Dashboard
  3. Customer Insights Dashboard

The solution helps stakeholders monitor revenue trends, product performance, customer behavior, and payment success metrics for better business decision-making.


Project Objectives

  • Build a complete ETL pipeline using Python.
  • Integrate multiple retail datasets into a unified dataset.
  • Perform data cleaning and validation.
  • Generate analytical output files.
  • Create interactive Power BI dashboards.
  • Analyze sales, products, customers, and payment performance.
  • Deliver business insights through visual analytics.

Dataset Information

The project uses the following datasets:

  • retail_data1.xlsx
  • retail_data2.xlsx
  • product_details.xlsx

Key attributes include:

  • Transaction ID
  • Customer ID
  • Customer Name
  • Product ID
  • Product Name
  • Category
  • Quantity
  • Price
  • Revenue
  • City
  • Payment Method
  • Payment Status
  • Transaction Date

Technology Stack

Programming & Data Processing

  • Python
  • Pandas
  • NumPy
  • Jupyter Notebook

Visualization

  • Power BI

Development Tools

  • VS Code
  • GitHub

Project Structure

NeoStats_Data_Engineering_Project
│
├── Code/
│   └── retail_pipeline.py
│
├── Data/
│   ├── retail_data1.xlsx
│   ├── retail_data2.xlsx
│   └── product_details.xlsx
│
├── Notebook/
│   └── retail_pipeline.ipynb
│
├── Output/
│   ├── cleaned_retail_data.csv
│   ├── monthly_revenue.csv
│   └── data_quality_report.csv
│
├── PowerBI/
│   └── NeoStats_Retail_Dashboard.pbix
│
├── Screenshots/
│   ├── retail1_record.png
│   ├── retail2_record.png
│   ├── product_detail.png
│   ├── Sales_Overview.png
│   ├── Product_Performance.png
│   └── Customer_Insights.png
│
└── README.md

ETL Pipeline

Extract

  • Loaded retail_data1.xlsx
  • Loaded retail_data2.xlsx
  • Loaded product_details.xlsx
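
As an illustration of the extract step, the three workbooks could be read into DataFrames roughly as follows. This is a minimal sketch, not the code in `Code/retail_pipeline.py`: the function name `extract`, the `reader` parameter, and the assumption that each workbook keeps its data on the first sheet are all hypothetical.

```python
from pathlib import Path

import pandas as pd

# File names come from the Dataset Information section; the folder name
# follows the Project Structure section.
SOURCE_FILES = ["retail_data1.xlsx", "retail_data2.xlsx", "product_details.xlsx"]


def extract(data_dir="Data", reader=pd.read_excel):
    """Read every source workbook and return {file stem: DataFrame}.

    `reader` is a hypothetical hook so the function can be exercised
    without real Excel files; by default it is pandas.read_excel, which
    reads the first sheet of each workbook.
    """
    frames = {}
    for name in SOURCE_FILES:
        path = Path(data_dir) / name
        frames[Path(name).stem] = reader(path)
    return frames
```

Called as `extract()` from the repository root, this would return a dictionary keyed by `retail_data1`, `retail_data2`, and `product_details`. Reading `.xlsx` files with Pandas requires the `openpyxl` package.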

Transform

  • Merged retail datasets
  • Joined product information
  • Removed duplicates
  • Standardized column names
  • Converted date fields
  • Created revenue metrics
  • Generated monthly revenue summary
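
The transform steps above can be sketched in Pandas as shown below. The snake_case column names (`product_id`, `quantity`, `price`, `transaction_date`), the assumption that `price` comes from the product details file, and the order of the steps are illustrative guesses; the actual pipeline may differ.

```python
import pandas as pd


def tidy_columns(df):
    """Standardize column names: 'Product ID' -> 'product_id'."""
    return df.rename(columns=lambda c: str(c).strip().lower().replace(" ", "_"))


def transform(retail1, retail2, products):
    # Merge the two retail files by stacking their rows.
    sales = pd.concat([retail1, retail2], ignore_index=True)
    sales, products = tidy_columns(sales), tidy_columns(products)
    # Drop rows that are exact copies of an earlier row.
    sales = sales.drop_duplicates()
    # Left join keeps every transaction even if its product is unknown.
    # Assumes one row per product_id in the product details.
    sales = sales.merge(products, on="product_id", how="left")
    # Unparseable dates become NaT instead of raising an error.
    sales["transaction_date"] = pd.to_datetime(
        sales["transaction_date"], errors="coerce"
    )
    # Assumed revenue definition: quantity times unit price.
    sales["revenue"] = sales["quantity"] * sales["price"]
    return sales


def monthly_revenue(sales):
    """Sum revenue per calendar month, ignoring rows without a valid date."""
    dated = sales.dropna(subset=["transaction_date"])
    month = dated["transaction_date"].dt.to_period("M").astype(str).rename("month")
    return dated.groupby(month)["revenue"].sum().reset_index()
```

Column names are standardized before the join here so that the join key has the same name in both frames.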

Data Validation

  • Checked missing values
  • Verified duplicates
  • Generated quality report
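
A quality report of this kind could be assembled as in the sketch below. The report layout (one row per column, plus a repeated duplicate-row count) and the function name `quality_report` are assumptions; the real `data_quality_report.csv` may be organized differently.

```python
import pandas as pd


def quality_report(df):
    """Summarize missing values per column and duplicate rows overall."""
    report = pd.DataFrame({
        "missing_values": df.isna().sum(),
        "missing_pct": (df.isna().mean() * 100).round(2),
    })
    report.index.name = "column"
    report = report.reset_index()
    # The duplicate count describes the whole table, so it is repeated
    # on every row to keep the report a single flat CSV.
    report["duplicate_rows"] = int(df.duplicated().sum())
    return report
```

Running the report before and after de-duplication gives a simple check that the cleaning step did what was intended.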

Load

Generated analytical output files:

  • cleaned_retail_data.csv
  • monthly_revenue.csv
  • data_quality_report.csv
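
The load step amounts to writing each DataFrame to a CSV file in the Output folder. A minimal sketch, assuming a hypothetical `load` helper that takes a mapping of file name to DataFrame:

```python
from pathlib import Path

import pandas as pd


def load(outputs, output_dir="Output"):
    """Write each DataFrame in `outputs` to <output_dir>/<file name>."""
    out = Path(output_dir)
    # Create the folder on first run; do nothing if it already exists.
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, frame in outputs.items():
        path = out / filename
        # index=False keeps the pandas row index out of the CSV,
        # which avoids a stray unnamed column when Power BI imports it.
        frame.to_csv(path, index=False)
        written.append(path)
    return written
```

For example, `load({"cleaned_retail_data.csv": sales, "monthly_revenue.csv": monthly})` would write two of the three files listed above, where `sales` and `monthly` stand for DataFrames produced earlier in the pipeline.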

Output Files

cleaned_retail_data.csv

Final cleaned dataset used for dashboard development.

monthly_revenue.csv

Monthly revenue summary used for trend analysis.

data_quality_report.csv

Data quality statistics including missing values and duplicate counts.


Power BI Dashboard

Page 1: Sales Overview Dashboard

Key Metrics:

  • Total Revenue
  • Total Customers
  • Total Transactions
  • Average Order Value

Visualizations:

  • Revenue by Category
  • Revenue by City
  • Payment Status Distribution
  • Monthly Revenue Trend

Filters:

  • City
  • Category

Page 2: Product Performance Dashboard

Key Insights:

  • Top Products by Revenue
  • Revenue Contribution by Category
  • Quantity Sold by Product
  • Revenue Distribution by Product

Visualizations:

  • Bar Chart
  • Donut Chart
  • Treemap
  • Product Details Table

Filters:

  • Product
  • Category
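
The product rankings and category contributions behind these visuals can be reproduced from the cleaned data. This is a rough sketch assuming columns named `product_name`, `category`, `quantity`, and `revenue`; the helper names are hypothetical.

```python
import pandas as pd


def top_products(sales, n=5):
    """Return the n products with the highest total revenue."""
    return (
        sales.groupby("product_name", as_index=False)
        .agg(revenue=("revenue", "sum"), quantity=("quantity", "sum"))
        .sort_values("revenue", ascending=False)
        .head(n)
        .reset_index(drop=True)
    )


def category_share(sales):
    """Return each category's share of total revenue, in percent."""
    by_category = sales.groupby("category")["revenue"].sum()
    return (by_category / by_category.sum() * 100).round(2).sort_values(
        ascending=False
    )
```

`top_products` corresponds to the bar chart and product table, and `category_share` to the donut chart.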

Page 3: Customer Insights Dashboard

Key Metrics:

  • Total Customers
  • Revenue per Customer
  • Success Rate
  • Failure Rate

Visualizations:

  • Customers by City
  • Revenue by City
  • Payment Status Distribution
  • Customer Details Table

Filters:

  • City
  • Payment Status
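
Success Rate, Failure Rate, and Revenue per Customer can be checked the same way. The sketch assumes a `payment_status` column whose successful rows carry the label "Success", treats every other status as a failure, and divides all revenue by the number of distinct customers. Each of these is an assumption about the data and the dashboard measures, not something the README specifies.

```python
import pandas as pd


def payment_rates(sales, success_label="Success"):
    """Return success and failure rates as percentages of all rows."""
    # Normalize case and whitespace so 'success ' matches 'Success'.
    status = sales["payment_status"].astype(str).str.strip().str.lower()
    success = float((status == success_label.lower()).mean() * 100)
    return {
        "success_rate": round(success, 2),
        "failure_rate": round(100 - success, 2),
    }


def revenue_per_customer(sales):
    """Total revenue divided by the number of distinct customers."""
    customers = sales["customer_id"].nunique()
    return float(sales["revenue"].sum()) / customers if customers else 0.0
```

If the data contains a third status such as "Pending", counting it as a failure would overstate the failure rate, so the labels in the actual files should be checked first.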

Dashboard Screenshots

Sales Overview Dashboard

[Screenshot: Screenshots/Sales_Overview.png]

Product Performance Dashboard

[Screenshot: Screenshots/Product_Performance.png]

Customer Insights Dashboard

[Screenshot: Screenshots/Customer_Insights.png]

Key Business Insights

  • Electronics category generates the highest revenue contribution.
  • Laptop is the highest revenue-generating product.
  • Payment success rate exceeds 90%.
  • Chennai and Delhi contribute significantly to total revenue.
  • Monthly revenue remains relatively stable with seasonal fluctuations.
  • Customer distribution is concentrated across major metropolitan cities.

Future Improvements

  • Real-time dashboard integration.
  • Automated ETL scheduling using Apache Airflow.
  • Cloud deployment using AWS or Azure.
  • Advanced customer segmentation.
  • Sales forecasting using Machine Learning.
  • Automated report generation.

How to Run the Project

1. Clone the Repository

git clone https://github.com/ShubhamRaj03/NeoStats-Data-Engineering-Project.git

2. Navigate to the Project Directory

cd NeoStats-Data-Engineering-Project

3. Create and Activate Virtual Environment

Create a virtual environment:

python -m venv venv

Activate the virtual environment (Windows):

venv\Scripts\activate

On macOS or Linux, the equivalent command is:

source venv/bin/activate

4. Install Required Dependencies

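pip install -r requirements.txt

The Project Structure section does not list a requirements.txt file. If it is missing from your clone, the libraries named in the Technology Stack can be installed directly (openpyxl is included because Pandas needs it to read .xlsx files):

pip install pandas numpy openpyxl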

5. Run the ETL Pipeline

python Code/retail_pipeline.py

6. Generated Output Files

After successful execution, the following files will be generated inside the Output folder:

Output/
├── cleaned_retail_data.csv
├── monthly_revenue.csv
└── data_quality_report.csv

7. Open the Power BI Dashboard

Open the following file using Microsoft Power BI Desktop:

PowerBI/NeoStats_Retail_Dashboard.pbix

Author

Shubham Raj

B.Tech in Computer Science and Engineering (Data Science)

NeoStats Retail Analytics Dashboard Project
