LLM used in this application:
- OpenAI Text Embedding 3 Small
Before running the application, follow these steps:
- For this repository, create a GitHub Codespace (cloud) OR clone it locally and open it with your preferred code editor (e.g. Visual Studio Code).
- Install Python (if not already installed):
  - Windows: Download the latest installer from python.org or use:
    winget install Python.Python.3.12
  - macOS: Use Homebrew:
    brew install python
  - Linux (Ubuntu/Debian):
    sudo apt update && sudo apt install python3 python3-venv python3-pip
  - Cloud Workspaces (Codespaces, etc.): Python is usually pre-installed. Run
    python3 --version
    to verify, then skip this step.
- Create and Activate a Virtual Environment:
  Important: From this point on, make sure that the present working directory in your terminal is the root directory of the application: ./Semantic-Search-Engine.
  - Create the environment:
    - Windows:
      python -m venv .venv
    - macOS/Linux:
      python3 -m venv .venv
  - Activate it:
    - Windows:
      .\.venv\Scripts\activate
    - macOS/Linux:
      source .venv/bin/activate
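To confirm that activation worked, a minimal sketch like the following can be run with the interpreter your terminal currently resolves. It relies only on the standard library; the helper name `in_virtual_environment` is illustrative and not part of this repository.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if in_virtual_environment():
    print(f"Virtual environment active: {sys.prefix}")
else:
    print("No virtual environment detected; activate .venv before continuing.")
```

If the second message appears, re-run the activation command for your platform in the same terminal session.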
- Install Dependencies:
  - Upgrade pip and install the required libraries:
    python -m pip install --upgrade pip
    python -m pip install requests openai python-dotenv
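A quick way to check that the three packages landed in the active environment is to look them up without importing them. This is a sketch using only the standard library; `REQUIRED_MODULES` and `missing_modules` are hypothetical names chosen for illustration.

```python
from importlib.util import find_spec

# Import names for the three packages installed above.
# Note that the python-dotenv package is imported as "dotenv".
REQUIRED_MODULES = ("requests", "openai", "dotenv")


def missing_modules(names=REQUIRED_MODULES):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

Any name reported as missing usually means the install command ran outside the virtual environment.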
- Environment Configuration:
  - Open the .env file located at the root directory. This file contains the API key required by the application. Read it carefully and set the key before running anything.
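The check below is a hedged sketch of how the key could be validated before running the scripts. It assumes the variable is named `OPENROUTER_API_KEY` (the name used in the troubleshooting notes of this README); `read_api_key` is a hypothetical helper, and the application's own loading code may differ.

```python
import os

try:
    # python-dotenv is installed in the dependencies step.
    from dotenv import load_dotenv
except ImportError:
    # Keeps this sketch usable even before dependencies are installed.
    load_dotenv = None

# Assumption: variable name taken from the troubleshooting notes.
API_KEY_NAME = "OPENROUTER_API_KEY"


def read_api_key(env=None):
    """Return the API key from the given mapping (default: os.environ), or raise."""
    env = os.environ if env is None else env
    key = (env.get(API_KEY_NAME) or "").strip()
    if not key:
        raise RuntimeError(f"{API_KEY_NAME} is missing or empty; check your .env file.")
    return key


if load_dotenv is not None:
    load_dotenv()  # loads variables from a .env file into os.environ, if one is found

try:
    read_api_key()
    print(f"{API_KEY_NAME} is set.")
except RuntimeError as err:
    print(err)
```

The sketch never prints the key itself, which avoids leaking it into terminal history or logs.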
- Directory Structure
  - src/: Contains the source code.
    - indexer.py: Script to generate product data and embeddings.
    - semantic_search.py: MAIN SCRIPT for searching products.
    - utils.py: Shared utility functions for serialization, database loading, and similarity calculation.
    - threshold.py: Calculates the minimum similarity score for candidate products based on the user's search.
  - data/: Local storage for indexed data.
    - products.json: Raw product data from the API.
    - vectors.tsv: Tab-separated embeddings for the products.
    - metadata.tsv: Metadata used for embeddings visualization.
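To illustrate how tab-separated embeddings and a similarity threshold fit together, here is a minimal standard-library sketch. It assumes cosine similarity and one vector per line; the README only says "similarity calculation", so the actual functions in utils.py and threshold.py may work differently. All function names and the `min_score` parameter are hypothetical, and the sample rows are toy values, not real embeddings.

```python
import math


def parse_vectors_tsv(text):
    """Parse tab-separated text into a list of float vectors, one per non-empty line."""
    return [
        [float(value) for value in line.split("\t")]
        for line in text.splitlines()
        if line.strip()
    ]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, vectors, min_score=0.0):
    """Return (row_index, score) pairs with score >= min_score, best match first."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    kept = [pair for pair in scored if pair[1] >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional rows standing in for real embeddings.
sample = "1.0\t0.0\t0.0\n0.0\t1.0\t0.0\n0.7\t0.7\t0.0\n"
vectors = parse_vectors_tsv(sample)
print(rank_by_similarity([1.0, 0.0, 0.0], vectors, min_score=0.5))
```

With the toy data, the first and third rows pass the 0.5 cutoff and the orthogonal second row is dropped, which is the role a minimum similarity score plays in filtering search results.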
- Missing API Key: Ensure OPENROUTER_API_KEY is correctly set in your .env file.
- Dependency Issues: If running in a new environment, ensure you have executed the commands in the Install Dependencies step.
- Virtual Environment Not Activated: If you receive "module not found" errors, ensure your virtual environment is activated (see the Create and Activate a Virtual Environment step).
- The first two phases should be run only for engineering purposes; go straight down to the Job Advisor section if you only care about getting that job!