Module 0: Setup & Environment

🎯 Learning Objectives

By the end of this module, you will:

  • ✅ Have a working Codespaces environment
  • ✅ Understand the project structure and architecture
  • ✅ Configure your OpenAI API key
  • ✅ Verify DocumentDB connection
  • ✅ Understand the dataset you'll be working with

📋 Prerequisites Checklist

Before starting, ensure you have:

  • GitHub account
  • OpenAI API key (Get one here)
  • Codespace created from this repository

Step 1: Configure Workshop Environment

This workshop is designed to run entirely in GitHub Codespaces, providing a consistent, pre-configured development environment for all participants.

💡 Why Codespaces? No local setup required, consistent environment for everyone, and automatic dependency installation.

Launch Your Codespace

  1. Navigate to the repository:

    • Go to: https://github.com/documentdb/booking-agents-sample
  2. Open in GitHub Codespaces:

    • Click the green "Code" button
    • Select the "Codespaces" tab
    • Click "Create codespace on workshop"

    Alternatively, click this badge:

    Open in GitHub Codespaces

  3. Wait for the environment to build (first launch takes 2-3 minutes):

    • Python 3.11 environment
    • Node.js 20
    • Docker-in-Docker
    • VS Code extensions (DocumentDB, Python, Docker)
    • All dependencies automatically installed
  4. Verify Codespace is ready:

    • You should see VS Code in your browser
    • Extensions should be installed (check the sidebar)
    • Terminal should be available at the bottom
  5. Open a terminal (Terminal → New Terminal) and proceed to Step 2


Step 2: Set Up DocumentDB Container

Now that your environment is ready, let's deploy DocumentDB locally using Docker.

Deploy DocumentDB Container

  1. Pull the DocumentDB Docker image:

    docker pull ghcr.io/documentdb/documentdb/documentdb-local:latest
  2. Tag the image for convenience:

    docker tag ghcr.io/documentdb/documentdb/documentdb-local:latest documentdb
  3. Run the DocumentDB container:

    docker run -dt -p 10260:10260 --name documentdb-container documentdb --username admin --password password123

Change Port Visibilities

The forwarded ports in Codespaces default to Private, which can block connections between services. You need to make them Public so the frontend, backend, and DocumentDB can communicate.

  1. Open the Ports panel:

    • In the terminal area at the bottom of VS Code, click the "Ports" tab (next to Terminal, Output, etc.)
  2. Update port visibility:

    • You should see port 10260 (DocumentDB) listed
    • Right-click on the port row
    • Select "Port Visibility" → "Public"
    • Repeat for ports 3000 (frontend) and 8000 (backend) if they are listed

💡 Why Public? In Codespaces, private ports require authentication tokens that automated service-to-service connections don't provide. Setting ports to Public allows the services to reach each other.

  3. Verify the container is running:

    docker ps

    You should see documentdb-container running on port 10260.

    Expected output:

    CONTAINER ID   IMAGE        COMMAND                  CREATED         STATUS         PORTS                      NAMES
    abc123def456   documentdb   "./entrypoint.sh --u…"   10 seconds ago  Up 9 seconds   0.0.0.0:10260->10260/tcp   documentdb-container
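
If you would rather confirm from Python that the published port accepts connections, the check can be sketched as below. This is a minimal sketch using only the standard library; the helper name `port_is_open` is hypothetical and not part of the workshop code. It only tests that the TCP port is reachable, not that DocumentDB accepts your credentials.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 10260 is the port published by the docker run command in this step.
    status = "reachable" if port_is_open("localhost", 10260) else "not reachable"
    print(f"DocumentDB port 10260 is {status}")
```

A "not reachable" result usually means the container is not running or the port was not published; `docker ps` shows which.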
    

Connect to DocumentDB with VS Code Extension

If the 'DocumentDB for VS Code' extension is not already installed in your codespace, install it from the VS Code Marketplace. Then follow these steps to connect your DocumentDB container to the extension:

  1. Open the DocumentDB extension:

    • Click the DocumentDB icon in the left sidebar (database icon)
    • Or press Ctrl+Shift+P and type "DocumentDB"
  2. Add a new connection:

    • Click the DocumentDB icon in the VS Code sidebar
    • Click "Add New Connection"
    • Select "Connection String"
    • Paste the connection string:
      mongodb://admin:password123@localhost:10260/?tls=true&tlsAllowInvalidCertificates=true&authMechanism=SCRAM-SHA-256
      
    • Confirm username and password authentication. The credentials should already be prefilled from the connection string.
  3. Verify the connection - You should see your connection in the DocumentDB explorer
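
To see what each part of the connection string means, the sketch below splits it with Python's standard library. The function name `describe_connection` is a hypothetical helper for illustration; the URI itself is the one from step 2.

```python
from urllib.parse import parse_qs, urlsplit

CONNECTION_STRING = (
    "mongodb://admin:password123@localhost:10260/"
    "?tls=true&tlsAllowInvalidCertificates=true&authMechanism=SCRAM-SHA-256"
)


def describe_connection(uri: str) -> dict:
    """Split a MongoDB-style URI into credentials, host, port, and options."""
    parts = urlsplit(uri)
    # parse_qs returns a list per key; each option appears once here.
    options = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {
        "scheme": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "options": options,
    }


info = describe_connection(CONNECTION_STRING)
print(info["host"], info["port"], info["options"])
```

The `tlsAllowInvalidCertificates=true` option is what lets the client accept the local container's self-signed certificate. If `pymongo` is installed in your environment (an assumption, check `requirements.txt`), the same string can be passed to `pymongo.MongoClient` for a live connection test.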


Step 3: Load Sample Data into DocumentDB

Now that DocumentDB is running and connected, let's load sample data to work with throughout the workshop. You'll use the DocumentDB VS Code extension to import JSON files directly into your database.

Understanding the Sample Data

The workshop includes a JSON file with sample data that already contains vector embeddings:

  • data/embedded_data.json - Combined Airbnb listings with pre-generated embeddings

Load Data Using DocumentDB Extension

  1. Open the DocumentDB extension:

    • Click the DocumentDB icon in the left sidebar
    • Expand your connection to see databases
    • Note: Feel free to delete the database "sampledb" from the extension if you see it.
  2. Create the database:

    • Right-click on your connection
    • Select "Create Database"
    • Enter database name: db
    • Press Enter
  3. Create the listings collection:

    • Expand the db database
    • Right-click on the database
    • Select "Create Collection"
    • Enter collection name: listings
    • Press Enter
  4. Import the listings data:

    • Right-click on the listings collection
    • Select "Import Documents"
    • Navigate to: data/embedded_data.json
    • Click "Open"
    • Wait for the import confirmation message
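
To get a feel for the data you just imported, a quick inspection can be sketched as follows. This assumes `data/embedded_data.json` is a single JSON array of documents; the actual layout and field names are not stated here, so the helper (a hypothetical `summarize_documents`) simply reports which fields contain lists of numbers.

```python
import json
from pathlib import Path


def summarize_documents(docs: list[dict]) -> dict:
    """Report the document count and which fields hold numeric vectors."""
    vector_fields: dict[str, int] = {}
    for doc in docs:
        for field, value in doc.items():
            is_vector = (
                isinstance(value, list)
                and len(value) > 0
                and all(
                    isinstance(x, (int, float)) and not isinstance(x, bool)
                    for x in value
                )
            )
            if is_vector:
                # Record the vector length (dimension) seen for this field.
                vector_fields[field] = len(value)
    return {"count": len(docs), "vector_fields": vector_fields}


if __name__ == "__main__":
    path = Path("data/embedded_data.json")
    if path.exists():
        # Assumption: the file is one JSON array of documents.
        docs = json.loads(path.read_text(encoding="utf-8"))
        print(summarize_documents(docs))
```

The reported vector length is the embedding dimension, which a vector index on that field would need to match.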

🔑 Step 4: Configure OpenAI API Key

You need an OpenAI API key to generate embeddings and use chat completions.

  1. Create a .env file in the project root if it doesn't already exist:

    cp .env.example .env
  2. Edit .env and add your key:

    OPENAI_API_KEY=sk-your-actual-key-here
  3. Save the file

Verify Your API Key

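Run this Python one-liner to confirm the key is loaded from .env (it checks that the variable is set, not that OpenAI accepts the key):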

python -c "import os; from dotenv import load_dotenv; load_dotenv(); print('✅ API key configured' if os.getenv('OPENAI_API_KEY') else '❌ API key missing')"

✅ Verification Checklist

Before moving to Module 1, verify:

  • ✅ Codespace is running without errors
  • ✅ OpenAI API key is configured
  • ✅ DocumentDB connection works
  • ✅ You understand the project structure
  • ✅ You've explored the dataset
  • ✅ You understand the architecture

🎓 Concepts to Remember

Vector Search

  • Converts text to numerical vectors (embeddings)
  • Finds similar items by comparing vector distances
  • Enables semantic search ("find me something cozy" vs exact keyword match)
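
As a rough illustration of "comparing vector distances", here is cosine similarity over toy vectors in plain Python. The three-dimensional vectors and listing names are made up for the example; real embeddings come from an embedding model and have far more dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Convention for this sketch: zero vectors match nothing.
    return dot / (norm_a * norm_b)


# Toy "embeddings" invented for illustration only.
listings = {
    "cozy cabin": [0.9, 0.1, 0.0],
    "downtown loft": [0.1, 0.9, 0.2],
    "snug cottage": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]  # Stand-in for the embedding of "something cozy".

ranked = sorted(
    listings,
    key=lambda name: cosine_similarity(query, listings[name]),
    reverse=True,
)
print(ranked)  # ['cozy cabin', 'snug cottage', 'downtown loft']
```

Note that "snug cottage" ranks above "downtown loft" even though neither name contains the word "cozy". That is the difference from exact keyword matching.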

DocumentDB cosmosSearch

  • Native vector search operator
  • Supports IVF (Inverted File Index) and HNSW algorithms
  • Allows combining vector similarity with other filters

RAG (Retrieval-Augmented Generation)

  • Retrieves relevant documents from a database
  • Augments LLM prompts with retrieved context
  • Generates accurate, grounded responses
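
The "augment" step is mostly string assembly: retrieved documents are placed into the prompt ahead of the user's question. A minimal sketch, assuming each retrieved document has `name` and `description` fields (hypothetical field names, as is the helper `build_rag_prompt`):

```python
def build_rag_prompt(question: str, documents: list[dict], max_docs: int = 3) -> str:
    """Combine retrieved documents and a question into one prompt string."""
    context_lines = [
        f"[{i}] {doc['name']}: {doc['description']}"
        for i, doc in enumerate(documents[:max_docs], start=1)
    ]
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the listings below.\n\n"
        f"Listings:\n{context}\n\n"
        f"Question: {question}"
    )


retrieved = [
    {"name": "Cozy cabin", "description": "Wood stove, forest view."},
    {"name": "Snug cottage", "description": "Small garden, near the lake."},
]
prompt = build_rag_prompt("Which place has a fireplace or stove?", retrieved)
print(prompt)
```

The resulting string would then be sent to a chat completion model. Instructing the model to use only the supplied listings is what keeps the answer grounded in the retrieved data.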

Multi-Agent Systems

  • Multiple specialized AI agents working together
  • Each agent has a specific role/expertise
  • Agents coordinate to solve complex tasks
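
The coordination idea can be sketched with plain functions: each "agent" handles one kind of request, and a coordinator decides which one to call. Everything here is hypothetical and simplified. Real agents would call an LLM and tools, and the routing below uses a keyword check as a stand-in for a model-driven decision.

```python
from typing import Callable

Agent = Callable[[str], str]


def search_agent(request: str) -> str:
    """Specialist for finding listings."""
    return f"search_agent: looking up listings for '{request}'"


def booking_agent(request: str) -> str:
    """Specialist for making reservations."""
    return f"booking_agent: preparing a reservation for '{request}'"


def coordinator(request: str) -> str:
    """Route the request to the agent whose role matches it."""
    # Keyword routing is a placeholder for an LLM-based routing decision.
    agent: Agent = booking_agent if "book" in request.lower() else search_agent
    return agent(request)


print(coordinator("Find a cozy place near the lake"))
print(coordinator("Book the snug cottage for two nights"))
```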

🐛 Troubleshooting

Codespace won't start

  • Wait a few minutes (initial build takes 2-3 min)
  • Check GitHub status page
  • Try rebuilding: Codespaces menu → Rebuild Container

DocumentDB not connecting

# Check if DocumentDB container is running
docker ps | grep documentdb

# Check logs
docker logs documentdb-container

OpenAI API errors

  • Verify your API key is correct
  • Check you have credits in your OpenAI account
  • Make sure the key has permission to use embeddings and chat APIs

Python import errors

# Reinstall dependencies
pip install -r requirements.txt

🚀 Next Steps

You're all set! Time to build your first vector search implementation.

Continue to: Module 1: Vector Search Fundamentals


Questions? Ask your instructor or check the troubleshooting guide.