
Network Analysis of Amazon's Product Co-Purchasing Ecosystem

Overview

This project analyzes Amazon's e-commerce ecosystem using network science techniques to understand consumer behavior, product relationships, and marketplace dynamics. The study leverages Amazon's product metadata from Stanford's SNAP project to construct and analyze multiple network types that reveal how products connect through customer interactions.

Dataset

  • Source: Amazon Product Co-Purchasing Network Metadata (Stanford SNAP Project) https://snap.stanford.edu/data/amazon-meta.html
  • Collection Period: Summer 2006
  • Size: 548,552 products with ~7.8 million customer reviews
  • Categories: Books (71.7%), Music CDs (18.8%), Videos (4.8%), DVDs (3.6%)
  • Data Types: Product metadata, co-purchase relationships, category hierarchies, customer reviews
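The raw SNAP metadata file stores one text block per product. The parser below is a minimal sketch. It assumes the layout shown on the SNAP dataset page: `Id:`, `ASIN:`, `group:`, `salesrank:` and `similar:` lines, followed by review lines. In the raw file the review lines are spelled `cutomer:`, and the regex also accepts `customer:`. The helper name `parse_amazon_meta` is hypothetical and does not come from SNA_project.ipynb.

```python
import re
from typing import Iterable, Iterator

# Matches review lines such as "2000-7-28  cutomer: A2JW67OY8U6HHK  rating: 5 ..."
# (assumption: the raw file spells it "cutomer"; "customer" is accepted too).
REVIEW_RE = re.compile(r"cus?tomer:\s*(\S+)\s+rating:\s*(\d+)")


def parse_amazon_meta(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per product block (hypothetical helper, sketch only)."""
    product = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("Id:"):
            if product is not None:
                yield product
            product = {"id": int(line.split()[1]), "asin": None, "group": None,
                       "salesrank": None, "similar": [], "reviews": []}
        elif product is None:
            continue  # file header lines before the first product
        elif line.startswith("ASIN:"):
            product["asin"] = line.split()[1]
        elif line.startswith("group:"):
            product["group"] = line.split(":", 1)[1].strip()
        elif line.startswith("salesrank:"):
            product["salesrank"] = int(line.split()[1])
        elif line.startswith("similar:"):
            # "similar: <count> <asin> <asin> ..." -> keep only the ASINs
            product["similar"] = line.split()[2:]
        else:
            match = REVIEW_RE.search(line)
            if match:
                product["reviews"].append((match.group(1), int(match.group(2))))
    if product is not None:
        yield product
```

To read the full dump this way, open the downloaded file and pass the file handle to `parse_amazon_meta`. The function is a generator, so products are produced one at a time and the roughly 550k records never have to sit in memory as raw text.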

Network Types Analyzed

1. Direct Co-Purchase Network

  • Structure: Product-to-product connections via Amazon's "similar" field
  • Purpose: Captures Amazon's algorithmic co-purchase relationships
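The sketch below shows one way to turn the "similar" field into a product graph. It assumes a dict that maps each sampled ASIN to its "similar" ASINs; the name `build_copurchase_graph` is hypothetical. Treating the links as undirected is also an assumption, and the notebook may keep them directed. Links that point outside the sampled products are dropped, which can add to the fragmentation reported below.

```python
import networkx as nx


def build_copurchase_graph(products: dict[str, list[str]]) -> nx.Graph:
    """Undirected product graph from each ASIN's "similar" list (hypothetical helper)."""
    G = nx.Graph()
    G.add_nodes_from(products)
    for asin, similar in products.items():
        for other in similar:
            # Keep only links between products that are both in the sample.
            if other in products and other != asin:
                G.add_edge(asin, other)
    return G


products = {
    "A": ["B", "Z"],  # "Z" is not in the sample, so this link is ignored
    "B": ["A"],
    "C": ["D"],
    "D": [],
    "E": [],
}
G = build_copurchase_graph(products)
print(G.number_of_edges(), nx.number_connected_components(G))  # 2 3
```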

2. Product-Customer Bipartite Network

  • Structure: Bipartite graph connecting customers to reviewed products
  • Purpose: Foundation for authentic customer-product interaction analysis
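Below is a small illustration of how such a bipartite graph can be built with networkx. The `(customer_id, asin)` review pairs are made-up toy data. The `bipartite` node attribute (0 for customers, 1 for products) is the convention used by networkx's bipartite module; whether the notebook follows it is an assumption.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Toy (customer_id, asin) review pairs, e.g. taken from the parsed metadata.
reviews = [
    ("c1", "p1"), ("c1", "p2"),
    ("c2", "p2"), ("c2", "p3"),
    ("c3", "p3"),
]

customers = {c for c, _ in reviews}
products = {p for _, p in reviews}

B = nx.Graph()
# Mark each side so networkx's bipartite helpers can tell the node sets apart.
B.add_nodes_from(customers, bipartite=0)
B.add_nodes_from(products, bipartite=1)
B.add_edges_from(reviews)

print(nx.is_bipartite(B), bipartite.is_bipartite_node_set(B, customers))  # True True
```

Real customer IDs and ASINs are unlikely to collide. If they could, prefixing the node IDs (for example `"cust:"` and `"prod:"`) keeps the two sides separate.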

3. Co-Review Network (Product-Product Projection)

  • Structure: Products connected by shared reviewers
  • Purpose: Customer-driven product similarity independent of Amazon's algorithms
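The product-product projection can be written with networkx's `weighted_projected_graph`. In that function, an edge weight counts how many reviewers two products share. The sketch uses toy data, and using this particular weighting scheme is an assumption about how the co-review network was built.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Toy (customer_id, asin) review pairs.
reviews = [("c1", "p1"), ("c1", "p2"), ("c2", "p1"), ("c2", "p2"), ("c2", "p3")]
products = {p for _, p in reviews}

B = nx.Graph()
B.add_nodes_from({c for c, _ in reviews}, bipartite=0)
B.add_nodes_from(products, bipartite=1)
B.add_edges_from(reviews)

# Product-product projection: edge weight = number of shared reviewers.
P = bipartite.weighted_projected_graph(B, products)
edges = sorted((min(u, v), max(u, v), d["weight"]) for u, v, d in P.edges(data=True))
print(edges)  # [('p1', 'p2', 2), ('p1', 'p3', 1), ('p2', 'p3', 1)]
```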

4. Customer Similarity Network (Customer-Customer Projection)

  • Structure: Customers connected by common product reviews
  • Purpose: Customer segmentation and behavioral analysis
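The customer-side projection follows the same pattern. The sketch below weights each customer pair by the Jaccard overlap of the products they reviewed, using networkx's `overlap_weighted_projected_graph`. It then removes weak ties. Both the Jaccard weighting and the `MIN_JACCARD` threshold are illustrative assumptions, not parameters stated in the report.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Toy (customer_id, asin) review pairs.
reviews = [("c1", "p1"), ("c1", "p2"), ("c2", "p2"), ("c2", "p3"), ("c3", "p4")]
customers = {c for c, _ in reviews}

B = nx.Graph()
B.add_nodes_from(customers, bipartite=0)
B.add_nodes_from({p for _, p in reviews}, bipartite=1)
B.add_edges_from(reviews)

# Customer-customer projection; weight = |shared products| / |union of products|.
C = bipartite.overlap_weighted_projected_graph(B, customers, jaccard=True)

# Optional pruning of weak ties (hypothetical threshold, for illustration only).
MIN_JACCARD = 0.2
weak = [(u, v) for u, v, w in C.edges(data="weight") if w < MIN_JACCARD]
C.remove_edges_from(weak)

print(round(C["c1"]["c2"]["weight"], 3), C.degree("c3"))  # 0.333 0
```

Customer projections grow quickly, because every pair of reviewers of a popular product becomes an edge. Some form of pruning or sampling is usually needed on the full dataset.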

Key Findings

Network Structure Comparison

  • Customer-driven networks are highly connected and cohesive (>92% of nodes in the largest component)
  • The direct co-purchase network is extremely fragmented (5,390 components; the largest holds only 0.007% of nodes)
  • The co-review network shows the highest density (0.5550) and clustering coefficient (0.8892)
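The structural figures above can be computed with standard networkx calls, as in the sketch below. The function name `structure_summary` is hypothetical. Using `average_clustering` is an assumption: the report does not say whether its clustering figure is the average local coefficient or global transitivity (`nx.transitivity`).

```python
import networkx as nx


def structure_summary(G: nx.Graph) -> dict:
    """Component count, largest-component share, density and clustering (sketch)."""
    components = list(nx.connected_components(G))
    largest = max(components, key=len)
    return {
        "components": len(components),
        "largest_component_share": len(largest) / G.number_of_nodes(),
        "density": nx.density(G),
        "avg_clustering": nx.average_clustering(G),
    }


# Toy graph: a triangle plus one isolated node.
G = nx.Graph([("a", "b"), ("b", "c"), ("a", "c")])
G.add_node("d")
s = structure_summary(G)
print(s)  # 2 components, share 0.75, density 0.5, avg clustering 0.75
```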

Small-World Properties

  • Co-review and customer similarity networks exhibit strong small-world characteristics
  • Average path lengths: Co-review (1.38), Customer similarity (2.25)
  • More efficient navigation than Amazon's algorithmic co-purchase links
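A common way to quantify small-world structure is the coefficient sigma = (C / C_rand) / (L / L_rand). Here C is the clustering coefficient, L is the average shortest path length, and the `_rand` values come from a random graph with the same number of nodes and edges. The sketch below computes sigma on the largest connected component against a single G(n, m) random graph. This is a simplification and not necessarily the report's procedure. networkx's own `nx.sigma` averages over several degree-preserving random references, which is more thorough but much slower.

```python
import networkx as nx


def small_world_sigma(G: nx.Graph, seed: int = 42) -> float:
    """sigma = (C / C_rand) / (L / L_rand) against one G(n, m) random graph (sketch)."""
    core = G.subgraph(max(nx.connected_components(G), key=len)).copy()
    n, m = core.number_of_nodes(), core.number_of_edges()
    R = nx.gnm_random_graph(n, m, seed=seed)
    R = R.subgraph(max(nx.connected_components(R), key=len)).copy()
    C, L = nx.average_clustering(core), nx.average_shortest_path_length(core)
    C_r, L_r = nx.average_clustering(R), nx.average_shortest_path_length(R)
    if C_r == 0:
        return float("inf")  # random reference has no triangles
    return (C / C_r) / (L / L_r)


# A Watts-Strogatz graph is small-world by construction, so sigma should exceed 1.
G = nx.connected_watts_strogatz_graph(200, 6, 0.1, seed=42)
sigma = small_world_sigma(G)
print(round(nx.average_shortest_path_length(G), 2), round(sigma, 2))
```

A sigma well above 1 means the graph has much higher clustering than a comparable random graph while keeping a similar path length. This is the pattern the co-review and customer similarity networks are described as having.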

Community Structure

  • Customer-driven networks form 39-74 meaningful communities (sizes 203-264)
  • Direct co-purchase network shows extreme fragmentation (5,390 tiny communities)
  • Large cliques identified: Customer similarity (max 568), Co-review (max 337)
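The sketch below shows community detection and clique search on a toy graph made of two 5-cliques joined by a bridge. It uses networkx's Louvain implementation (available in networkx 2.8 and later) with the fixed seed 42 from the Methodology section, and `find_cliques` for maximal cliques. Applying these functions to the project's graphs with default resolution is an assumption. Clique enumeration is exponential in the worst case, which matters on graphs as dense as the co-review projection.

```python
import networkx as nx

# Toy graph: two 5-cliques joined by a single bridge edge.
G = nx.disjoint_union(nx.complete_graph(5), nx.complete_graph(5))
G.add_edge(0, 5)

# Louvain community detection, seeded for reproducibility.
communities = nx.community.louvain_communities(G, seed=42)
sizes = sorted(len(c) for c in communities)
modularity = nx.community.modularity(G, communities)

# Largest maximal clique via Bron-Kerbosch enumeration.
max_clique = max(nx.find_cliques(G), key=len)
print(sizes, round(modularity, 3), len(max_clique))  # [5, 5] 0.452 5
```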

Methodology

Sampling Strategy

  • Stratified sampling: Percentile-based across product performance levels
  • Subgraph construction: 4,000-node strategic sampling using BFS from anchor nodes
  • Reproducibility: Fixed random seed (42) for all stochastic procedures
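The sampling steps above might look roughly like the sketch below. Anchors are drawn from percentile bands of a performance measure, and a breadth-first search from those anchors grows a subgraph up to a node budget. Several details are assumptions, not the notebook's confirmed settings: using salesrank as the performance measure, four equal-width bands, and the helper names `stratified_anchors` and `bfs_sample`. The 4,000-node budget and seed 42 come from this section.

```python
import random
from collections import deque

import networkx as nx


def stratified_anchors(salesrank: dict, per_stratum: int, n_strata: int = 4,
                       seed: int = 42) -> list:
    """Sample anchors evenly across salesrank percentile bands (hypothetical helper)."""
    rng = random.Random(seed)
    ranked = sorted(salesrank, key=salesrank.get)  # best-selling first
    band = len(ranked) / n_strata
    anchors = []
    for i in range(n_strata):
        stratum = ranked[int(i * band):int((i + 1) * band)]
        anchors.extend(rng.sample(stratum, min(per_stratum, len(stratum))))
    return anchors


def bfs_sample(G: nx.Graph, anchors: list, max_nodes: int = 4000) -> nx.Graph:
    """Grow a node set breadth-first from the anchors until max_nodes is reached."""
    seen, queue = set(), deque()
    for a in anchors:
        if a in G and a not in seen and len(seen) < max_nodes:
            seen.add(a)
            queue.append(a)
    while queue and len(seen) < max_nodes:
        node = queue.popleft()
        for nbr in G.neighbors(node):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
                if len(seen) >= max_nodes:
                    break
    return G.subgraph(seen).copy()


# Toy usage: node IDs double as a stand-in salesrank.
G = nx.connected_watts_strogatz_graph(1000, 4, 0.1, seed=42)
salesrank = {n: n for n in G}
anchors = stratified_anchors(salesrank, per_stratum=2)
S = bfs_sample(G, anchors, max_nodes=100)
print(len(anchors), S.number_of_nodes())  # 8 100
```

Expanding from anchors, rather than sampling nodes uniformly, keeps local neighborhoods intact. Uniform node sampling on a sparse graph tends to leave mostly isolated nodes.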

Analysis Tools

  • Python Libraries: pandas, networkx, scikit-learn
  • Metrics: Centrality, clustering, community detection (Louvain), clique analysis, k-core decomposition
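The sketch below computes centrality and k-core metrics with networkx on Zachary's karate club graph, which ships with the library and stands in for the project's networks. Sampling k = 20 source nodes for betweenness is an illustrative shortcut for large graphs. The report does not state how betweenness was computed.

```python
import networkx as nx

G = nx.karate_club_graph()

degree = nx.degree_centrality(G)
# Approximate betweenness from 20 seeded source nodes; exact betweenness is costly on large graphs.
betweenness = nx.betweenness_centrality(G, k=20, seed=42)

# k-core decomposition: core_number gives each node's coreness; k_core extracts the innermost core.
core = nx.core_number(G)
k_max = max(core.values())
main_core = nx.k_core(G, k=k_max)

top = sorted(degree, key=degree.get, reverse=True)[:3]
print(top, k_max, main_core.number_of_nodes())
```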
  • Validation: Multiple network types for cross-validation of structural patterns

Files

  • report.tex - Complete LaTeX research report
  • SNA_project.ipynb - Jupyter notebook with analysis code
  • SNA_project.pdf - Compiled PDF report
  • README.md - This file

Key Insights for E-commerce

  1. Recommendation Systems: Customer-driven projections provide superior structures for product recommendations compared to algorithmic approaches
  2. Marketing Strategy: Natural community detection reveals authentic customer segments and product clusters
  3. Navigation Efficiency: Small-world properties enable rapid information flow and product discovery
  4. Long-tail Support: Distributed connectivity patterns support diverse product visibility beyond bestsellers
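The report does not specify a recommendation algorithm. As one illustration of insight 1, the sketch below ranks a product's neighbors in a weighted co-review graph by the number of shared reviewers. The helper name `recommend` and the tie-breaking rule are hypothetical.

```python
import networkx as nx


def recommend(P: nx.Graph, asin: str, top_n: int = 3) -> list:
    """Rank a product's co-review neighbours by shared-reviewer weight (hypothetical helper)."""
    if asin not in P:
        return []
    # Highest weight first; ties broken alphabetically for stable output.
    ranked = sorted(P[asin].items(), key=lambda item: (-item[1].get("weight", 1), item[0]))
    return [nbr for nbr, _ in ranked[:top_n]]


P = nx.Graph()
P.add_weighted_edges_from([("p1", "p2", 5), ("p1", "p3", 2), ("p1", "p4", 9), ("p2", "p3", 1)])
print(recommend(P, "p1", top_n=2))  # ['p4', 'p2']
```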

Limitations & Future Work

  • Dataset from 2006 may not reflect current marketplace dynamics
  • Static analysis - temporal evolution not captured
  • Amazon-only data - external marketplace influences not considered
  • Structural focus - product attributes like pricing/quality not integrated

Dependencies

pandas
networkx
matplotlib
seaborn
scikit-learn
numpy

Usage

  1. Open the Jupyter notebook SNA_project.ipynb
  2. Install the Python libraries listed under Dependencies
  3. Run cells sequentially to reproduce the analysis
  4. Modify sampling parameters or network types as needed

Citation

If you use this work, please cite:

Odarbashi, N., & Shahri, R. (2025). Network Analysis of Amazon's Product Co-Purchasing Ecosystem. 
Social Network Analysis Project.

License

This project is for academic purposes. Dataset provided by Stanford SNAP Project.
