E-Commerce Sales Analysis

Exploratory data analysis and interactive business dashboard for an e-commerce platform dataset.

Project Structure

data_analysis/
├── dashboard.py                # Streamlit interactive dashboard
├── EDA_Refactored.ipynb        # Static analysis notebook
├── business_metrics.py         # Metric calculation functions
├── data_loader.py              # Data loading, cleaning, and date filtering
├── requirements.txt            # Python dependencies
└── ecommerce_data/             # Data directory (CSV files)
    ├── orders_dataset.csv
    ├── order_items_dataset.csv
    ├── products_dataset.csv
    ├── customers_dataset.csv
    ├── order_reviews_dataset.csv
    └── order_payments_dataset.csv

Setup

python -m venv .venv
source .venv/bin/activate        # Windows: .venv\Scripts\activate
pip install -r requirements.txt

Running the Dashboard

streamlit run dashboard.py

The dashboard opens at http://localhost:8501 and includes:

Header: Title and a global date-range filter (defaults to full year 2023).
KPI Row: Total Revenue, Monthly Growth, Average Order Value, Total Orders — each with a trend indicator vs the prior period of equal length.
Revenue Trend: Solid line (current period) vs dashed line (prior period), Y-axis in abbreviated dollar format.
Top 10 Categories: Horizontal bar chart with blue gradient, values formatted as $300K / $2M.
Revenue by State: US choropleth map with blue gradient colour scale.
Satisfaction vs Delivery Time: Average review score per delivery-speed bucket.
Bottom Row: Average delivery time (with trend) and average review score (with star rating).

Changing the date range filter updates every chart and card simultaneously.

Running the Notebook

jupyter notebook EDA_Refactored.ipynb

Changing the Notebook Analysis Period

Open EDA_Refactored.ipynb and edit the first configuration cell:

ANALYSIS_YEAR  = 2023   # Year to analyse
ANALYSIS_MONTH = None   # None = full year; set 1-12 for a single month
COMPARISON_YEAR = 2022  # Year used in the year-over-year comparison chart

All downstream cells read from these variables. No other cell needs to be changed.

Module Reference

data_loader.py

Function	Description
`load_all(clean=True)`	Load all six CSVs; optionally apply standard cleaning
`filter_by_date_range(orders, start_date, end_date)`	Filter orders to an inclusive date range
`filter_by_period(orders, year, month=None)`	Filter orders to a year, and optionally a month
`clean_orders(orders)`	Deduplicate and drop null-timestamp rows
`clean_order_items(order_items)`	Deduplicate and remove negative-price rows

business_metrics.py

Function	Returns	Description
`revenue_summary(orders, payments)`	dict	Total revenue, order count, AOV
`monthly_revenue(orders, payments)`	DataFrame	Revenue and order count by month
`yoy_monthly_comparison(...)`	DataFrame	Side-by-side monthly revenue for two years
`payment_type_breakdown(orders, payments)`	DataFrame	Revenue and count by payment method
`order_status_distribution(orders)`	DataFrame	Count and % by order status
`top_categories_by_revenue(items, products, n)`	DataFrame	Top N categories by revenue
`avg_item_value_by_category(items, products, n)`	DataFrame	Average item price per category
`revenue_by_state(orders, customers, payments, n)`	DataFrame	Top N states by revenue
`order_volume_by_state(orders, customers, n)`	DataFrame	Top N states by order count
`delivery_performance(orders)`	DataFrame	Per-order delivery days and on-time flag
`delivery_summary(orders)`	dict	Avg days, median days, on-time rate
`review_score_distribution(reviews)`	DataFrame	Count and % for each score 1-5
`avg_review_score(reviews)`	float	Mean review score

Data

Place the six CSV files in ecommerce_data/ before running the notebook. The expected schema matches the Olist Brazilian E-Commerce dataset (available on Kaggle: olistbr/brazilian-ecommerce). CSV files are gitignored to avoid committing large data files.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

E-Commerce Sales Analysis

Project Structure

Setup

Running the Dashboard

Running the Notebook

Changing the Notebook Analysis Period

Module Reference

data_loader.py

business_metrics.py

Data

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
ecommerce_data		ecommerce_data
.gitignore		.gitignore
EDA_Refactored.ipynb		EDA_Refactored.ipynb
README.md		README.md
business_metrics.py		business_metrics.py
dashboard.py		dashboard.py
data_loader.py		data_loader.py
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

E-Commerce Sales Analysis

Project Structure

Setup

Running the Dashboard

Running the Notebook

Changing the Notebook Analysis Period

Module Reference

data_loader.py

business_metrics.py

Data

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages