A prototype e-commerce analytics system that provides product insights including historical sales trends, demand forecasts, customer segmentation, and actionable business recommendations.
- Python 3.8+
- OpenAI API key (for LLM features, optional but recommended)
# Clone the repository
git clone git@github.com:stevezkw1998/shop-sight-prototype.git
cd shop-sight-prototype
# Install dependencies
pip install -r requirements.txt

# Basic usage - will prompt for product search
python examples/shop_insight.py

For LLM features, set your API key:
export OPENAI_API_KEY="your-api-key-here"

- Database (use case) Learning (Prerequisites)
  - Database schema + sample data → database information & insights md file
  - Reason: This helps the LLM understand the use case better
- End-to-End Core Flow (Must Have)
  - Product search → Historical sales visualization + Forecasted demand + Likely customer segments
- LLM Integration (Must Have)
  - Natural language insights generation
  - Enhanced forecasting with LLM data analytics and business insights
  - Makes insights accessible to non-technical users
- Real Data Over Mocking (Should Have)
  - The H&M dataset is rich enough to support real analysis
  - Real data builds credibility and shows actual capability
- Comprehensive Analytics (Nice to Have)
  - Actionable Suggestions
  - All computable from existing data, so why not include them?
- Polished Frontend (Should Have)
- Speed: Terminal UI is fastest to build, focuses on functionality over polish
- Credibility: Real data demonstrates actual capability, not just mockups
- Simplicity: Statistical forecasting is fast, explainable, and sufficient for a prototype
- Extensibility: LLM integration shows how to enhance with context-aware intelligence
Data Access: S3 bucket s3://kumo-public-datasets/hm_with_images/ is publicly accessible (anonymous read)
Terminal UI: Terminal-based interface is acceptable for prototype demonstration
Forecasting: Simple statistical methods are sufficient for a prototype; the optional LLM-enhanced forecast adds product context (type, department, seasonality) on top of the same historical data.
Time Horizon: 4-week forecast horizon is reasonable for demonstration
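
The data-access assumption above can be illustrated with a short sketch of a product search against the public bucket. This is not the project's actual client code: the file name `articles.parquet` and the column `prod_name` are assumptions about the dataset layout, and `build_product_search` / `search_products` are hypothetical helper names. Only the bucket path and `article_id` come from this README. Depending on the bucket's region, DuckDB may also need an `s3_region` setting.

```python
from typing import List, Tuple

# Bucket path is taken from this README; the file name below is an assumption.
BUCKET = "s3://kumo-public-datasets/hm_with_images"
ARTICLES_PATH = f"{BUCKET}/articles.parquet"  # hypothetical file name


def build_product_search(term: str, limit: int = 10) -> Tuple[str, List[str]]:
    """Return a parameterized SQL query and its parameters for a name search.

    `prod_name` is an assumed column name; `article_id` appears in this README.
    """
    sql = (
        "SELECT article_id, prod_name "
        f"FROM read_parquet('{ARTICLES_PATH}') "
        "WHERE lower(prod_name) LIKE ? "
        "ORDER BY prod_name "
        f"LIMIT {int(limit)}"
    )
    return sql, [f"%{term.strip().lower()}%"]


def search_products(term: str, limit: int = 10):
    """Run the search with DuckDB (needs `pip install duckdb` and network access)."""
    import duckdb  # imported lazily so the query builder works without it

    con = duckdb.connect()
    con.execute("INSTALL httpfs")  # extension for reading from S3/HTTP
    con.execute("LOAD httpfs")
    sql, params = build_product_search(term, limit)
    return con.execute(sql, params).fetchall()
```

Keeping the query builder separate from the connection makes the SQL easy to inspect and test without touching the network.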
| Feature | Implementation | Data Source |
|---|---|---|
| Product Search | Real SQL queries via DuckDB | S3 Parquet files |
| Historical Sales | Real transaction aggregation | transactions table |
| Customer Segments | Real customer data joins | customers + transactions tables |
| Price Trends | Real price analysis | transactions.price field |
| Sales Channels | Real channel distribution | transactions.sales_channel_id |
| Customer Loyalty | Real repeat purchase analysis | transactions.customer_id |
| LLM Insights | Real API calls | OpenAI/LiteLLM |
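
As an illustration of the repeat purchase analysis listed in the table, the sketch below counts how many customers bought a product more than once. It assumes the input is one `customer_id` per transaction row for a single product; the function name `loyalty_metrics` and the sample IDs are made up for the example.

```python
from collections import Counter
from typing import Dict, Iterable


def loyalty_metrics(customer_ids: Iterable[str]) -> Dict[str, float]:
    """Summarize repeat purchasing from one customer_id per transaction row."""
    purchases = Counter(customer_ids)  # customer_id -> number of purchases
    total = len(purchases)
    repeat = sum(1 for count in purchases.values() if count > 1)
    return {
        "customers": total,
        "repeat_customers": repeat,
        "first_time_customers": total - repeat,
        "repeat_rate": repeat / total if total else 0.0,
    }


# Made-up sample: c1 bought three times, c2 twice, c3 once.
rows = ["c1", "c2", "c1", "c3", "c2", "c1"]
metrics = loyalty_metrics(rows)
```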
| Feature | Implementation | Why Simplified |
|---|---|---|
| Forecasting | Weighted average + trend (default); optional LLM-enhanced forecast using product context | Fast, explainable, sufficient for demo. Production would use Predictive AI. |
| UI | Terminal-based charts (plotext) | Fast to build. Production would be web dashboard. |
| Customer Segmentation | Basic demographics (age, membership, preferences) | Real data, but production would use RFM analysis or clustering. |
Key Insight: The dataset is rich enough that we didn't need to mock anything. All insights are based on real data, just using simpler methods than production systems.
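
To make "simpler methods" concrete, here is a minimal sketch of demographic segmentation by age band. The band boundaries and the names `AGE_BANDS`, `age_band`, and `age_distribution` are assumptions chosen for illustration, not the values used by the prototype.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# Hypothetical age bands: (inclusive lower bound, exclusive upper bound, label).
AGE_BANDS = [
    (0, 25, "under 25"),
    (25, 35, "25-34"),
    (35, 50, "35-49"),
    (50, float("inf"), "50+"),
]


def age_band(age: Optional[float]) -> str:
    """Map an age to its band label; missing ages become 'unknown'."""
    if age is None:
        return "unknown"
    for low, high, label in AGE_BANDS:
        if low <= age < high:
            return label
    return "unknown"


def age_distribution(ages: Iterable[Optional[float]]) -> Dict[str, float]:
    """Return the share of customers in each age band."""
    counts = Counter(age_band(age) for age in ages)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()} if total else {}
```

A production system would replace these fixed bands with RFM scores or clusters, as noted in the table.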
┌─────────────────────────────────────────────────────────────┐
│                         User Query                          │
│                    "Nike running shoes"                     │
└────────────────────┬────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────────┐
│                    Product Search (Real)                    │
│               Query articles table via DuckDB               │
└────────────────────┬────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────────┐
│              Historical Sales Analysis (Real)               │
│  • Load transactions                                        │
│  • Aggregate by week                                        │
│  • Generate charts                                          │
└────────────────────┬────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────────┐
│             Enhanced Analytics (Real + Simple)              │
│  ┌──────────────────────────────────────────┐               │
│  │  Forecast (Time Series / LLM)            │               │
│  │  • Statistical: weighted avg + trend     │               │
│  │  • Optional: LLM with product context    │               │
│  └──────────────────────────────────────────┘               │
│  ┌──────────────────────────────────────────┐               │
│  │  Customer Segments (Real Data)           │               │
│  │  • Join transactions + customers         │               │
│  │  • Age, membership, preferences          │               │
│  └──────────────────────────────────────────┘               │
│  ┌──────────────────────────────────────────┐               │
│  │  Additional Insights (Real Data)         │               │
│  │  • Price trends, channels, loyalty       │               │
│  └──────────────────────────────────────────┘               │
└────────────────────┬────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────────┐
│                    LLM Synthesis (Real)                     │
│  • Schema context + product attributes                      │
│    (from Database Learning: schema + sample data analysis)  │
│  • Combine all insights into natural language               │
│  • Actionable recommendations                               │
└─────────────────────────────────────────────────────────────┘
                     ▲
                     │
┌────────────────────┴────────────────────────────────────────┐
│              Database Learning (Prerequisites)              │
│  • Analyze schema + sample data                             │
│  • Generate: docs/database_schema.md                        │
│  • Provides: field meanings, relationships & insights       │
└─────────────────────────────────────────────────────────────┘
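
The "Aggregate by week" step in the flow above can be sketched in plain Python. This is an illustration rather than the project's DuckDB query: it assumes each transaction row represents one unit sold and is given as a `(date, price)` pair, and the helper names and sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, Iterable, Tuple


def week_start(day: date) -> date:
    """Return the Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def aggregate_weekly(
    transactions: Iterable[Tuple[date, float]],
) -> Dict[date, Dict[str, float]]:
    """Group (date, price) rows into weekly units sold and revenue."""
    weekly = defaultdict(lambda: {"units": 0, "revenue": 0.0})
    for day, price in transactions:
        bucket = weekly[week_start(day)]
        bucket["units"] += 1        # assumption: one row equals one unit
        bucket["revenue"] += price
    return dict(sorted(weekly.items()))  # chronological order for charting


# Made-up sample: two sales in the week of 2020-09-07, one the week after.
sample = [
    (date(2020, 9, 7), 10.0),
    (date(2020, 9, 9), 12.5),
    (date(2020, 9, 14), 8.0),
]
weekly = aggregate_weekly(sample)
```

The sorted result maps directly onto the two terminal charts (units sold and revenue per week).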
- ✅ Product search by name
- ✅ Historical sales visualization (weekly aggregation)
- ✅ Terminal charts (units sold, revenue trends)
- ✅ Demand forecast (next 4 weeks)
  - Statistical method (default): weighted average + trend
  - LLM method (optional): context-aware with product attributes
- ✅ Customer segmentation
  - Age distribution, membership status, preferences
  - Active vs. inactive customers
  - First-time vs. repeat buyers
- ✅ Additional insights
  - Price trends over time
  - Sales channel distribution (online vs. in-store)
  - Customer loyalty metrics
  - Product lifecycle stage
- ✅ Natural language insights generation
  - Combines all analytics into readable summary
  - Includes database schema context for better understanding
  - Provides actionable business recommendations
- ✅ LLM-enhanced forecasting (optional)
  - Considers product type, department, seasonality
  - Uses product attributes for context-aware predictions
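
One way to read "weighted average + trend" is sketched below. The README does not specify the weights or the window, so the 4-week window, the linearly increasing weights, and the function name `forecast_demand` are all assumptions made for illustration.

```python
from typing import List, Sequence


def forecast_demand(
    weekly_units: Sequence[float], horizon: int = 4, window: int = 4
) -> List[float]:
    """Forecast the next `horizon` weeks from recent weekly unit sales."""
    recent = list(weekly_units[-window:])
    if not recent:
        return [0.0] * horizon

    # Level: weighted average where the most recent week weighs the most.
    weights = range(1, len(recent) + 1)
    level = sum(w * u for w, u in zip(weights, recent)) / sum(weights)

    # Trend: average week-over-week change across the window.
    trend = (recent[-1] - recent[0]) / (len(recent) - 1) if len(recent) > 1 else 0.0

    # Project forward and clamp at zero, since demand cannot be negative.
    return [max(0.0, level + trend * step) for step in range(1, horizon + 1)]


# Steadily growing sales: level is 14.0 and trend is +2.0 units per week.
forecast = forecast_demand([10, 12, 14, 16])
```

The method is easy to explain to non-technical users, which is the main reason it fits a prototype; it has no notion of seasonality or uncertainty.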
Gap: Currently terminal-based UI
Approach:
- Build React/Next.js frontend
- Use Plotly or D3.js for interactive charts
- Create REST API wrapper around existing Python logic
- Add real-time updates via WebSockets
Gap: Using simple statistical methods
Approach:
- Use Kumo AI's Predictive AI for production-grade forecasting
- Implement LLM-based AI judge to evaluate forecast accuracy
- Apply self-improving prompts to iteratively enhance prediction quality
- Add seasonal decomposition and confidence intervals
- Consider external factors (promotions, holidays)
Gap: Basic demographics only
Approach:
- Implement RFM (Recency, Frequency, Monetary) analysis
- Use clustering algorithms (K-means, DBSCAN)
- Build predictive scoring models
- Create customer personas
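
As a starting point for the RFM item above, the following sketch computes raw recency, frequency, and monetary values per customer. It assumes transactions arrive as `(customer_id, date, price)` tuples; the function name `rfm_table` and the sample rows are hypothetical, and scoring or clustering on top of these values is left out.

```python
from datetime import date
from typing import Dict, Iterable, Tuple


def rfm_table(
    transactions: Iterable[Tuple[str, date, float]], as_of: date
) -> Dict[str, Dict[str, float]]:
    """Compute recency (days), frequency (count), and monetary (sum) per customer."""
    table: Dict[str, Dict[str, float]] = {}
    for customer_id, day, price in transactions:
        row = table.setdefault(
            customer_id, {"recency_days": None, "frequency": 0, "monetary": 0.0}
        )
        days_ago = (as_of - day).days
        # Recency is the gap to the customer's most recent purchase.
        if row["recency_days"] is None or days_ago < row["recency_days"]:
            row["recency_days"] = days_ago
        row["frequency"] += 1
        row["monetary"] += price
    return table


# Made-up sample rows for two customers.
rfm = rfm_table(
    [
        ("c1", date(2020, 9, 1), 20.0),
        ("c1", date(2020, 9, 15), 30.0),
        ("c2", date(2020, 8, 1), 5.0),
    ],
    as_of=date(2020, 9, 22),
)
```

These three values per customer are the usual input to quantile scoring or to the clustering algorithms listed above.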
Gap: Single product analysis only
Approach:
- Extend search to support multiple products
- Create side-by-side comparison views
- Add relative performance metrics
- Enable "similar products" recommendations
Gap: Keyword-based search only
Approach:
- Use LLM to parse natural language queries
- Convert to structured SQL queries
- Support complex queries ("products popular with young customers")
- Add query suggestions and autocomplete
Gap: A prototype function exists (database/client.py::text_to_sql) but is not yet integrated into the main workflow
Approach: Integrate Text to SQL into the product search and analytics pipeline. This lets users ask ad hoc questions without a hand-written query for each one, which reduces repetitive SQL design work.
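
The sketch below shows one possible shape for the prompt behind such an integration. It is not the existing `text_to_sql` function: `build_text_to_sql_prompt` is a hypothetical helper, and the prompt wording is an assumption. The idea is only that the schema documentation (for example the contents of docs/database_schema.md) is passed in as context alongside the user's question.

```python
def build_text_to_sql_prompt(question: str, schema_doc: str) -> str:
    """Assemble an LLM prompt asking for a single read-only SQL query."""
    return (
        "You translate questions about a retail dataset into one DuckDB SQL query.\n"
        "Use only the tables and columns described in the schema below.\n"
        "Return a single SELECT statement and nothing else.\n\n"
        f"Schema:\n{schema_doc.strip()}\n\n"
        f"Question: {question.strip()}\n"
        "SQL:"
    )


# Example with a tiny, made-up schema excerpt.
prompt = build_text_to_sql_prompt(
    "Which products are popular with young customers?",
    "transactions(customer_id, article_id, price, sales_channel_id)",
)
```

The returned string would then be sent through the LLM wrapper, and the generated SQL should be reviewed or validated before it is executed.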
Gap: Static analysis based on historical data
Approach:
- Set up data pipeline (Kafka, Airflow)
- Implement incremental data loading
- Add caching layer (Redis)
- Create scheduled refresh jobs
shop-sight-prototype/
├── core/
│   └── llm.py                    # LLM service wrapper
├── database/
│   └── client.py                 # DuckDB + S3 client
├── examples/
│   ├── shop_insight.py           # Main demo script (full features)
│   └── product_sales_trend.py    # Simpler version
├── docs/
│   └── database_schema.md        # Database documentation
├── prompts/
│   └── base.py                   # LLM prompt templates
└── requirements.txt              # Dependencies
- Python: Fast development, rich data science ecosystem
- DuckDB: Queries Parquet files on S3 directly, so no local copy of the data is needed
- LiteLLM: Unified LLM interface, easy to switch providers
- Terminal UI: Fastest to build, focuses on functionality
- Statistical Forecasting: Simple, explainable, sufficient for demo
- Try different products: Search for "dress", "jacket", "shoes" to see varied results
- Compare methods: Run with `--llm-forecast` vs. default to see the difference
- Check logs: All sessions are logged to `history/shop_insight/` by default
- Use known IDs: If you know an `article_id`, use `--article-id` to skip search
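
For readers curious how the two flags mentioned above could be wired up, here is a minimal `argparse` sketch. Only the flag names `--llm-forecast` and `--article-id` come from this README; the help texts, the `build_parser` function, and the sample ID are assumptions, and the real script may parse its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two flags mentioned in the tips above."""
    parser = argparse.ArgumentParser(description="Product insight demo (sketch)")
    parser.add_argument(
        "--article-id",
        default=None,
        help="Analyze this article directly and skip the interactive search",
    )
    parser.add_argument(
        "--llm-forecast",
        action="store_true",
        help="Use the LLM-enhanced forecast instead of the statistical default",
    )
    return parser


# "0123456789" is a placeholder, not a real article_id from the dataset.
args = build_parser().parse_args(["--article-id", "0123456789", "--llm-forecast"])
```

With no arguments, `article_id` stays `None` and `llm_forecast` is `False`, which corresponds to the default interactive search with statistical forecasting.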
This is a prototype for a take-home exercise.