NeoStats Retail Analytics Dashboard is an end-to-end Data Engineering and Business Intelligence project developed to analyze retail sales data and generate actionable business insights.
The project follows a complete ETL (Extract, Transform, Load) workflow using Python and Pandas. Multiple retail datasets are extracted, cleaned, transformed, validated, and consolidated into analytical datasets. The processed data is then visualized through an interactive Power BI dashboard consisting of three business-focused pages:
- Sales Overview Dashboard
- Product Performance Dashboard
- Customer Insights Dashboard
The solution helps stakeholders monitor revenue trends, product performance, customer behavior, and payment success metrics for better business decision-making.
Project objectives:
- Build a complete ETL pipeline using Python.
- Integrate multiple retail datasets into a unified dataset.
- Perform data cleaning and validation.
- Generate analytical output files.
- Create interactive Power BI dashboards.
- Analyze sales, products, customers, and payment performance.
- Deliver business insights through visual analytics.
The project uses the following datasets:
- retail_data1.xlsx
- retail_data2.xlsx
- product_details.xlsx
Key attributes include:
- Transaction ID
- Customer ID
- Customer Name
- Product ID
- Product Name
- Category
- Quantity
- Price
- Revenue
- City
- Payment Method
- Payment Status
- Transaction Date
Tools and technologies:
- Python
- Pandas
- NumPy
- Jupyter Notebook
- Power BI
- VS Code
- GitHub
NeoStats_Data_Engineering_Project
│
├── Code/
│ └── retail_pipeline.py
│
├── Data/
│ ├── retail_data1.xlsx
│ ├── retail_data2.xlsx
│ └── product_details.xlsx
│
├── Notebook/
│ └── retail_pipeline.ipynb
│
├── Output/
│ ├── cleaned_retail_data.csv
│ ├── monthly_revenue.csv
│ └── data_quality_report.csv
│
├── PowerBI/
│ └── NeoStats_Retail_Dashboard.pbix
│
├── Screenshots/
│ ├── retail1_record.png
│ ├── retail2_record.png
│ ├── product_detail.png
│ ├── Sales_Overview.png
│ ├── Product_Performance.png
│ └── Customer_Insights.png
│
└── README.md
Extract:
- Loaded retail_data1.xlsx
- Loaded retail_data2.xlsx
- Loaded product_details.xlsx
Transform:
- Merged retail datasets
- Joined product information
- Removed duplicates
- Standardized column names
- Converted date fields
- Created revenue metrics
- Generated monthly revenue summary
Validate:
- Checked missing values
- Verified duplicates
- Generated quality report
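The merge, join, cleaning, and aggregation steps listed above could be sketched in Pandas roughly as follows. This is a minimal illustration, not the contents of Code/retail_pipeline.py: the small in-memory DataFrames stand in for the three Excel files, and the column names, the helper names (`standardize`, `transform`, `monthly_revenue`), and the revenue formula (quantity times price) are all assumptions.

```python
import pandas as pd

# Tiny in-memory stand-ins for retail_data1.xlsx, retail_data2.xlsx and
# product_details.xlsx (assumed columns; a real run would use pd.read_excel).
retail1 = pd.DataFrame({
    "Transaction ID": [1, 2, 2],
    "Product ID": ["P1", "P2", "P2"],
    "Quantity": [2, 1, 1],
    "Transaction Date": ["2024-01-05", "2024-01-20", "2024-01-20"],
})
retail2 = pd.DataFrame({
    "Transaction ID": [3],
    "Product ID": ["P1"],
    "Quantity": [1],
    "Transaction Date": ["2024-02-10"],
})
products = pd.DataFrame({
    "Product ID": ["P1", "P2"],
    "Product Name": ["Laptop", "Mouse"],
    "Price": [50000.0, 500.0],
})


def standardize(df):
    """Standardize column names, e.g. 'Transaction ID' -> 'transaction_id'."""
    return df.rename(columns=lambda c: c.strip().lower().replace(" ", "_"))


def transform(retail_frames, product_df):
    """Merge the retail files, drop duplicates, join products, add revenue."""
    sales = pd.concat([standardize(f) for f in retail_frames], ignore_index=True)
    sales = sales.drop_duplicates()
    sales = sales.merge(standardize(product_df), on="product_id", how="left")
    sales["transaction_date"] = pd.to_datetime(sales["transaction_date"])
    # Assumption: revenue is derived as quantity * unit price.
    sales["revenue"] = sales["quantity"] * sales["price"]
    return sales


def monthly_revenue(sales):
    """Sum revenue per calendar month (month rendered as 'YYYY-MM')."""
    month = sales["transaction_date"].dt.to_period("M").astype(str)
    return sales.assign(month=month).groupby("month", as_index=False)["revenue"].sum()


cleaned = transform([retail1, retail2], products)
monthly = monthly_revenue(cleaned)
print(monthly)
```

In a real run, `cleaned` and `monthly` would then be written out with `DataFrame.to_csv(..., index=False)` to produce the files in the Output folder.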
Generated analytical output files:
- cleaned_retail_data.csv
- monthly_revenue.csv
- data_quality_report.csv
- cleaned_retail_data.csv: final cleaned dataset used for dashboard development.
- monthly_revenue.csv: monthly revenue summary used for trend analysis.
- data_quality_report.csv: data quality statistics, including missing-value and duplicate counts.
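A data quality report of the kind described above might be assembled as in the sketch below. The report layout (one row per column, plus dataset-level row and duplicate counts) and the function name `quality_report` are assumptions for illustration; the actual data_quality_report.csv may be organized differently.

```python
import pandas as pd


def quality_report(df):
    """Return one row per column with its missing-value count,
    alongside the dataset-level row count and duplicate-row count."""
    report = pd.DataFrame({
        "column": list(df.columns),
        "missing_values": df.isna().sum().to_numpy(),
    })
    report["total_rows"] = len(df)
    report["duplicate_rows"] = int(df.duplicated().sum())
    return report


# Hypothetical sample: one missing city and one fully duplicated row.
sample = pd.DataFrame({
    "transaction_id": [1, 2, 2, 3],
    "city": ["Chennai", "Delhi", "Delhi", None],
})
report = quality_report(sample)
print(report)
```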
Sales Overview Dashboard
Key Metrics:
- Total Revenue
- Total Customers
- Total Transactions
- Average Order Value
Visualizations:
- Revenue by Category
- Revenue by City
- Payment Status Distribution
- Monthly Revenue Trend
Filters:
- City
- Category
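The headline sales metrics above are computed inside Power BI, but they can be cross-checked against the cleaned dataset in Pandas. The sketch below assumes column names such as `transaction_id`, `customer_id`, and `revenue`, and assumes Average Order Value means total revenue divided by the number of distinct transactions; the dashboard's own measure definitions may differ.

```python
import pandas as pd

# Hypothetical rows shaped like the cleaned retail dataset.
sales = pd.DataFrame({
    "transaction_id": [1, 2, 3, 4],
    "customer_id": ["C1", "C2", "C1", "C3"],
    "category": ["Electronics", "Electronics", "Clothing", "Clothing"],
    "revenue": [1000.0, 500.0, 300.0, 200.0],
})


def sales_kpis(df):
    """Compute the four headline metrics from transaction-level rows."""
    total_revenue = float(df["revenue"].sum())
    total_transactions = int(df["transaction_id"].nunique())
    return {
        "total_revenue": total_revenue,
        "total_customers": int(df["customer_id"].nunique()),
        "total_transactions": total_transactions,
        # Assumption: AOV = total revenue / distinct transactions.
        "average_order_value": (
            total_revenue / total_transactions if total_transactions else 0.0
        ),
    }


kpis = sales_kpis(sales)
# Revenue by Category, largest first, as plotted on the dashboard.
revenue_by_category = (
    sales.groupby("category")["revenue"].sum().sort_values(ascending=False)
)
print(kpis)
print(revenue_by_category)
```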
Product Performance Dashboard
Key Insights:
- Top Products by Revenue
- Revenue Contribution by Category
- Quantity Sold by Product
- Revenue Distribution by Product
Visualizations:
- Bar Chart
- Donut Chart
- Treemap
- Product Details Table
Filters:
- Product
- Category
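As a rough cross-check of the product-level views, the ranking by revenue and the quantity sold per product could be reproduced in Pandas. The column names, the helper `top_products`, and the sample figures below are hypothetical and not taken from the project data.

```python
import pandas as pd

# Hypothetical product-level sales rows.
product_sales = pd.DataFrame({
    "product_name": ["Laptop", "Mouse", "Laptop", "Keyboard"],
    "quantity": [1, 4, 2, 3],
    "revenue": [50000.0, 2000.0, 100000.0, 4500.0],
})


def top_products(df, n=3):
    """Aggregate revenue and quantity per product and return the top n by revenue."""
    summary = df.groupby("product_name", as_index=False).agg(
        total_revenue=("revenue", "sum"),
        quantity_sold=("quantity", "sum"),
    )
    # Share of overall revenue, in percent, for a contribution-style chart.
    summary["revenue_share_pct"] = (
        summary["total_revenue"] / summary["total_revenue"].sum() * 100
    )
    return summary.nlargest(n, "total_revenue").reset_index(drop=True)


top = top_products(product_sales, n=2)
print(top)
```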
Customer Insights Dashboard
Key Metrics:
- Total Customers
- Revenue per Customer
- Success Rate
- Failure Rate
Visualizations:
- Customers by City
- Revenue by City
- Payment Status Distribution
- Customer Details Table
Filters:
- City
- Payment Status
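The customer-level metrics above can also be sanity-checked outside Power BI. The sketch below assumes payment status values of "Success" and "Failed", and assumes Revenue per Customer is total revenue divided by distinct customers; whether failed payments should count toward revenue is a modelling choice this example does not settle.

```python
import pandas as pd

# Hypothetical payment rows; the status labels are assumptions.
payments = pd.DataFrame({
    "customer_id": ["C1", "C2", "C3", "C1"],
    "payment_status": ["Success", "Success", "Failed", "Success"],
    "revenue": [400.0, 300.0, 200.0, 100.0],
})

# Share of transactions per payment status, as fractions summing to 1.
status_share = payments["payment_status"].value_counts(normalize=True)
success_rate = float(status_share.get("Success", 0.0)) * 100
failure_rate = float(status_share.get("Failed", 0.0)) * 100

# Assumption: revenue per customer = total revenue / distinct customers.
revenue_per_customer = float(payments["revenue"].sum()) / payments["customer_id"].nunique()

print(f"Success rate: {success_rate:.1f}%")
print(f"Failure rate: {failure_rate:.1f}%")
print(f"Revenue per customer: {revenue_per_customer:.2f}")
```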
Key business insights:
- The Electronics category generates the highest revenue contribution.
- Laptop is the highest revenue-generating product.
- Payment success rate exceeds 90%.
- Chennai and Delhi contribute significantly to total revenue.
- Monthly revenue remains relatively stable with seasonal fluctuations.
- Customer distribution is concentrated across major metropolitan cities.
Future enhancements:
- Real-time dashboard integration.
- Automated ETL scheduling using Apache Airflow.
- Cloud deployment using AWS or Azure.
- Advanced customer segmentation.
- Sales forecasting using Machine Learning.
- Automated report generation.
Clone the repository:
git clone https://github.com/ShubhamRaj03/NeoStats-Data-Engineering-Project.git
cd NeoStats-Data-Engineering-Project
Create a virtual environment:
python -m venv venv
Activate the virtual environment (Windows):
venv\Scripts\activate
Install the dependencies:
pip install -r requirements.txt
Run the pipeline:
python Code/retail_pipeline.py
After successful execution, the following files will be generated inside the Output folder:
Output/
├── cleaned_retail_data.csv
├── monthly_revenue.csv
└── data_quality_report.csv
Open the following file using Microsoft Power BI Desktop:
PowerBI/NeoStats_Retail_Dashboard.pbix
Shubham Raj
B.Tech in Computer Science and Engineering (Data Science)
NeoStats Retail Analytics Dashboard Project


