Skip to content

pallukurhade/data_analysis_ai

Repository files navigation

E-Commerce Sales Analysis

Exploratory data analysis and interactive business dashboard for an e-commerce platform dataset.

Project Structure

data_analysis/
├── dashboard.py                # Streamlit interactive dashboard
├── EDA_Refactored.ipynb        # Static analysis notebook
├── business_metrics.py         # Metric calculation functions
├── data_loader.py              # Data loading, cleaning, and date filtering
├── requirements.txt            # Python dependencies
└── ecommerce_data/             # Data directory (CSV files)
    ├── orders_dataset.csv
    ├── order_items_dataset.csv
    ├── products_dataset.csv
    ├── customers_dataset.csv
    ├── order_reviews_dataset.csv
    └── order_payments_dataset.csv

Setup

python -m venv .venv
source .venv/bin/activate        # Windows: .venv\Scripts\activate
pip install -r requirements.txt

Running the Dashboard

streamlit run dashboard.py

The dashboard opens at http://localhost:8501 and includes:

  • Header: Title and a global date-range filter (defaults to full year 2023).
  • KPI Row: Total Revenue, Monthly Growth, Average Order Value, Total Orders — each with a trend indicator vs the prior period of equal length.
  • Revenue Trend: Solid line (current period) vs dashed line (prior period), Y-axis in abbreviated dollar format.
  • Top 10 Categories: Horizontal bar chart with blue gradient, values formatted as $300K / $2M.
  • Revenue by State: US choropleth map with blue gradient colour scale.
  • Satisfaction vs Delivery Time: Average review score per delivery-speed bucket.
  • Bottom Row: Average delivery time (with trend) and average review score (with star rating).

Changing the date range filter updates every chart and card simultaneously.

Running the Notebook

jupyter notebook EDA_Refactored.ipynb

Changing the Notebook Analysis Period

Open EDA_Refactored.ipynb and edit the first configuration cell:

ANALYSIS_YEAR  = 2023   # Year to analyse
ANALYSIS_MONTH = None   # None = full year; set 1-12 for a single month
COMPARISON_YEAR = 2022  # Year used in the year-over-year comparison chart

All downstream cells read from these variables. No other cell needs to be changed.

Module Reference

data_loader.py

Function Description
load_all(clean=True) Load all six CSVs; optionally apply standard cleaning
filter_by_date_range(orders, start_date, end_date) Filter orders to an inclusive date range
filter_by_period(orders, year, month=None) Filter orders to a year, and optionally a month
clean_orders(orders) Deduplicate and drop null-timestamp rows
clean_order_items(order_items) Deduplicate and remove negative-price rows

business_metrics.py

Function Returns Description
revenue_summary(orders, payments) dict Total revenue, order count, AOV
monthly_revenue(orders, payments) DataFrame Revenue and order count by month
yoy_monthly_comparison(...) DataFrame Side-by-side monthly revenue for two years
payment_type_breakdown(orders, payments) DataFrame Revenue and count by payment method
order_status_distribution(orders) DataFrame Count and % by order status
top_categories_by_revenue(items, products, n) DataFrame Top N categories by revenue
avg_item_value_by_category(items, products, n) DataFrame Average item price per category
revenue_by_state(orders, customers, payments, n) DataFrame Top N states by revenue
order_volume_by_state(orders, customers, n) DataFrame Top N states by order count
delivery_performance(orders) DataFrame Per-order delivery days and on-time flag
delivery_summary(orders) dict Avg days, median days, on-time rate
review_score_distribution(reviews) DataFrame Count and % for each score 1-5
avg_review_score(reviews) float Mean review score

Data

Place the six CSV files in ecommerce_data/ before running the notebook. The expected schema matches the Olist Brazilian E-Commerce dataset (available on Kaggle: olistbr/brazilian-ecommerce). CSV files are gitignored to avoid committing large data files.

About

AI-powered data analysis pipeline in Python. Automates exploration, summarization, and insight generation from structured datasets.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors