kinobok · mfrszpiotro · Jul 2, 2026
diff --git a/.github/workflows/daily-scraper-go.yml b/.github/workflows/daily-scraper-go.yml
@@ -0,0 +1,34 @@
+name: Daily Go Scraper
+
+on:
+  schedule:
+    - cron: '30 4 * * *'  # 4:30 AM UTC, runs shortly after the Python version
+  workflow_dispatch:      # Allow manual runs
+
+jobs:
+  scrape:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Set up Go
+        uses: actions/setup-go@v5
+        with:
+          go-version: 'stable'
+          cache-dependency-path: scraper_go/go.sum
+
+      - name: Run Scraper
+        env:
+          TMDB_API_KEY: ${{ secrets.TMDB_API_KEY }}
+        run: |
+          cd scraper_go
+          go run ./cmd/scraper
+
+      - name: Commit and push changes
+        run: |
+          git config --global user.name "github-actions[bot]"
+          git config --global user.email "github-actions[bot]@users.noreply.github.com"
+          git add frontend/public/data_go.json
+          git diff --staged --quiet || git commit -m "chore: daily showtime update (Go)"
+          git push
diff --git a/.gitignore b/.gitignore
@@ -1,4 +1,4 @@
-letterboxd/
+/letterboxd/
 
 .env
 .worktrees/
diff --git a/conductor/tech-stack.md b/conductor/tech-stack.md
@@ -11,12 +11,13 @@ kinꚘbok uses a decoupled architecture with a statically hosted frontend that c
 - **Testing:** Vitest
 
 # Backend (Scraper)
-- **Language:** Python 3.11+
-- **HTTP Client:** HTTPX
-- **HTML Parsing:** BeautifulSoup4
-- **Data Validation & Modeling:** Pydantic
-- **String Matching:** RapidFuzz
-- **Testing:** Pytest
+- **Language:** Go 1.25+ & Python 3.11+ (Running in parallel during migration)
+- **Framework (Go):** Colly/v2 (for web scraping)
+- **HTTP Client:** HTTPX (Python) / net/http (Go)
+- **HTML Parsing:** BeautifulSoup4 (Python) / Goquery via Colly (Go)
+- **Data Validation & Modeling:** Pydantic (Python) / Custom Go schemas with validation
+- **String Matching & Normalization:** RapidFuzz (Python) / Custom slug-matching & GenerateSlug (Go)
+- **Testing:** Pytest (Python) / Go testing toolchain (Go)
 
 # CI/CD & Deployment
 - **Automation:** GitHub Actions (daily scraper runs, formatting checks, deployment)
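
The tech-stack entry above mentions custom slug matching on the Go side. As a rough illustration only, title normalization of that kind could look like the sketch below. `Slugify` and the `polishFold` table are hypothetical names made up for this example; this is not the project's `GenerateSlug`, whose behavior is not shown in this diff.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// polishFold maps Polish diacritics to ASCII.
// Assumption: an illustrative table, not taken from the project.
var polishFold = map[rune]rune{
	'ą': 'a', 'ć': 'c', 'ę': 'e', 'ł': 'l', 'ń': 'n',
	'ó': 'o', 'ś': 's', 'ź': 'z', 'ż': 'z',
}

// Slugify lowercases s, folds Polish diacritics, and joins runs of
// letters and digits with single hyphens. Hypothetical helper.
func Slugify(s string) string {
	var b strings.Builder
	pendingDash := false
	for _, r := range strings.ToLower(s) {
		if folded, ok := polishFold[r]; ok {
			r = folded
		}
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			// Emit one hyphen between runs, never at the start.
			if pendingDash && b.Len() > 0 {
				b.WriteByte('-')
			}
			pendingDash = false
			b.WriteRune(r)
		} else {
			pendingDash = true
		}
	}
	return b.String()
}

func main() {
	fmt.Println(Slugify("Zimna wojna (2018)"))
	fmt.Println(Slugify("Żółć: Powrót!"))
}
```

Comparing slugs instead of raw titles makes matching insensitive to case, punctuation, and diacritics, at the cost of collapsing titles that differ only in those respects.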

diff --git a/conductor/tracks.md b/conductor/tracks.md
@@ -11,3 +11,8 @@ This file tracks all major tracks for the project. Each track has its own detail
 
 - [x] **Track: UX revamp when user clicks cinema map points and typeaheads for search bar**
 *Link: [./tracks/map_search_ux_20260619/](./tracks/map_search_ux_20260619/)*
+
+---
+
+- [x] **Track: Implement the kinobok scraper in Golang using Colly with Goroutines/Channels for concurrency, running in parallel with the Python scraper.**
+*Link: [./tracks/golang_scraper_20260702/](./tracks/golang_scraper_20260702/)*
diff --git a/conductor/tracks/golang_scraper_20260702/index.md b/conductor/tracks/golang_scraper_20260702/index.md
@@ -0,0 +1,5 @@
+# Track golang_scraper_20260702 Context
+
+- [Specification](./spec.md)
+- [Implementation Plan](./plan.md)
+- [Metadata](./metadata.json)
diff --git a/conductor/tracks/golang_scraper_20260702/metadata.json b/conductor/tracks/golang_scraper_20260702/metadata.json
@@ -0,0 +1,8 @@
+{
+  "track_id": "golang_scraper_20260702",
+  "type": "feature",
+  "status": "new",
+  "created_at": "2026-07-02T00:00:00Z",
+  "updated_at": "2026-07-02T00:00:00Z",
+  "description": "Implement the kinobok scraper in Golang using Colly with Goroutines/Channels for concurrency, running in parallel with the Python scraper."
+}
diff --git a/conductor/tracks/golang_scraper_20260702/plan.md b/conductor/tracks/golang_scraper_20260702/plan.md
@@ -0,0 +1,34 @@
+# Implementation Plan: Golang Scraper
+
+## Phase 1: Filmweb Scraper (Concurrent)
+- [x] Task: Filmweb Models and Colly Setup
+    - [x] Write Tests (Red Phase): Define mock server responses and test basic Colly initialization.
+    - [x] Implement (Green Phase): Configure Colly collector and set up Goroutine/Channel architecture.
+- [x] Task: Filmweb Parsing Logic
+    - [x] Write Tests (Red Phase): Test parsing logic for extracting titles, cinemas, and showtimes from mock HTML.
+    - [x] Implement (Green Phase): Implement Colly callbacks, parse HTML, and feed results through channels.
+- [x] Task: Conductor - User Manual Verification 'Phase 1: Filmweb Scraper (Concurrent)' (Protocol in workflow.md)
+
+## Phase 2: Letterboxd and TMDB Integrations
+- [x] Task: TMDB API Integration
+    - [x] Write Tests (Red Phase): Test concurrent fetching of metadata and posters using a mock HTTP client.
+    - [x] Implement (Green Phase): Write concurrent HTTP requests to TMDB and merge with movie data.
+- [x] Task: Letterboxd Integration
+    - [x] Write Tests (Red Phase): Test extraction/parsing of Letterboxd watchlists.
+    - [x] Implement (Green Phase): Build Letterboxd scraping/parsing logic.
+- [x] Task: Conductor - User Manual Verification 'Phase 2: Letterboxd and TMDB Integrations' (Protocol in workflow.md)
+
+## Phase 3: Data Aggregation & Export
+- [x] Task: Orchestration in Main
+    - [x] Write Tests (Red Phase): Test the synchronization and merging of data from Filmweb, TMDB, and Letterboxd.
+    - [x] Implement (Green Phase): Coordinate channels and Goroutines in the main entrypoint (`cmd/scraper/main.go`).
+- [x] Task: Strict Parity JSON Export
+    - [x] Write Tests (Red Phase): Assert that the generated `data_go.json` strictly adheres to the existing Next.js frontend schema.
+    - [x] Implement (Green Phase): Write the final JSON export logic in `internal/export`.
+- [x] Task: Conductor - User Manual Verification 'Phase 3: Data Aggregation & Export' (Protocol in workflow.md)
+
+## Phase 4: CI/CD Integration
+- [x] Task: GitHub Actions Updates
+    - [x] Write Tests (Red Phase): (Skip logic tests, test via dry-run or local action simulator if possible).
+    - [x] Implement (Green Phase): Configure `daily-scraper-go.yml` to run the Go scraper alongside the Python scraper and commit `data_go.json` to the repository.
+- [x] Task: Conductor - User Manual Verification 'Phase 4: CI/CD Integration' (Protocol in workflow.md)
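
The plan above refers to a Goroutine/Channel architecture without showing its shape. One common way to structure it is a fan-out/fan-in worker pool, sketched below under stated assumptions: `Showtime`, `fetchCinema`, and `ScrapeAll` are hypothetical names, and `fetchCinema` is a pure stand-in for what would be a Colly-backed page visit in the real scraper.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// Showtime is a hypothetical result type; the real models live in scraper_go.
type Showtime struct {
	Cinema string
	Title  string
}

// fetchCinema stands in for a network fetch (assumption: no I/O here).
func fetchCinema(cinema string) Showtime {
	return Showtime{Cinema: cinema, Title: strings.ToUpper(cinema) + " feature"}
}

// ScrapeAll fans cinema names out to `workers` goroutines and collects
// their results from a single channel.
func ScrapeAll(cinemas []string, workers int) []Showtime {
	jobs := make(chan string)
	results := make(chan Showtime)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for c := range jobs {
				results <- fetchCinema(c)
			}
		}()
	}

	// Feed jobs, then close the channel so workers leave their range loops.
	go func() {
		for _, c := range cinemas {
			jobs <- c
		}
		close(jobs)
	}()

	// Close results only after every worker has finished sending.
	go func() {
		wg.Wait()
		close(results)
	}()

	var out []Showtime
	for r := range results {
		out = append(out, r)
	}
	// Completion order is not guaranteed, so sort for deterministic output.
	sort.Slice(out, func(i, j int) bool { return out[i].Cinema < out[j].Cinema })
	return out
}

func main() {
	for _, s := range ScrapeAll([]string{"muranow", "iluzjon", "atlantic"}, 2) {
		fmt.Println(s.Cinema, "->", s.Title)
	}
}
```

Capping the number of workers bounds how many requests are in flight at once, which matters when the target site rate-limits; sorting before export keeps the committed JSON stable between runs.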
diff --git a/conductor/tracks/golang_scraper_20260702/spec.md b/conductor/tracks/golang_scraper_20260702/spec.md
@@ -0,0 +1,29 @@
+# Specification: Golang Scraper Implementation
+
+## Overview
+This track focuses on implementing the backend scraper for kinobok in Golang using the Colly framework, intended to eventually replace the existing Python scraper. The initial deployment will run in parallel with the Python scraper to ensure data parity before a full transition.
+
+## Functional Requirements
+1. **Filmweb Scraper:** Implement the logic to scrape movie showtimes and cinema details from Filmweb using the Colly framework.
+2. **Letterboxd Scraper:** Implement the logic to process Letterboxd watchlists/data.
+3. **TMDB Scraper:** Implement integration with the TMDB API/scraper to fetch posters and metadata for movies.
+4. **Data Export:** Generate the final JSON file (`data_go.json`) containing all parsed and matched data.
+
+## Non-Functional Requirements
+1. **Strict Parity:** The exported JSON file (`data_go.json`) MUST perfectly match the schema of the current Python scraper's `data.json` to ensure Next.js frontend compatibility.
+2. **Parallel Execution:** The new Golang scraper must be integrated into the existing CI/CD (GitHub Actions) to run alongside the Python scraper, outputting to a separate file (`data_go.json`) without breaking the current production build.
+3. **Language/Framework:** Use Golang and the Colly web scraping framework as defined in the `scraper_go` directory.
+4. **Concurrency:** Use goroutines and channels throughout the scraping process to maximize throughput and reduce total run time.
+
+## Acceptance Criteria
+- [ ] `FilmwebScraper` correctly parses cinemas, times, and movie titles utilizing concurrent processing.
+- [ ] `Letterboxd` integration correctly extracts watchlist data.
+- [ ] `TMDB` integration accurately fetches required movie metadata concurrently.
+- [ ] `export` package correctly generates `data_go.json` with strict schema parity.
+- [ ] Concurrency patterns (Goroutines/Channels) are demonstrably used for performance.
+- [ ] GitHub Actions workflow is updated to run the Golang scraper and output `data_go.json` alongside `data.json`.
+- [ ] The Next.js frontend can consume `data_go.json` without code changes when it is swapped in for `data.json` (to be verified manually in a local build).
+
+## Out of Scope
+- Modifying the frontend application code to permanently switch to `data_go.json`.
+- Removing or disabling the Python scraper.
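
The strict-parity requirement in the spec above could be checked mechanically by comparing the structure of the two JSON files rather than their values. The sketch below is one possible approach, not the test shipped in this PR: `collectPaths` and `SchemaDiff` are hypothetical helpers, and the `movies`, `title`, and `year` fields in `main` are invented sample data, since the real schema is not part of this diff.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// collectPaths records "path:type" for every value in a decoded JSON document.
// Array elements share the segment "[]" so lists of different lengths compare equal.
func collectPaths(v any, path string, out map[string]bool) {
	switch t := v.(type) {
	case map[string]any:
		out[path+":object"] = true
		for k, child := range t {
			collectPaths(child, path+"."+k, out)
		}
	case []any:
		out[path+":array"] = true
		for _, child := range t {
			collectPaths(child, path+"[]", out)
		}
	case string:
		out[path+":string"] = true
	case float64:
		out[path+":number"] = true
	case bool:
		out[path+":bool"] = true
	case nil:
		out[path+":null"] = true
	}
}

// SchemaDiff returns the typed paths present in one document but not the other.
func SchemaDiff(a, b []byte) ([]string, error) {
	var va, vb any
	if err := json.Unmarshal(a, &va); err != nil {
		return nil, fmt.Errorf("first document: %w", err)
	}
	if err := json.Unmarshal(b, &vb); err != nil {
		return nil, fmt.Errorf("second document: %w", err)
	}
	pa, pb := map[string]bool{}, map[string]bool{}
	collectPaths(va, "$", pa)
	collectPaths(vb, "$", pb)

	var diff []string
	for p := range pa {
		if !pb[p] {
			diff = append(diff, "only in first: "+p)
		}
	}
	for p := range pb {
		if !pa[p] {
			diff = append(diff, "only in second: "+p)
		}
	}
	sort.Strings(diff)
	return diff, nil
}

func main() {
	// Invented sample payloads; the real files are data.json and data_go.json.
	pyJSON := []byte(`{"movies":[{"title":"A","year":2020}]}`)
	goJSON := []byte(`{"movies":[{"title":"B","year":"2021"}]}`)
	diff, err := SchemaDiff(pyJSON, goJSON)
	if err != nil {
		panic(err)
	}
	for _, d := range diff {
		fmt.Println(d)
	}
}
```

A check like this catches type drift (for example a year emitted as a string by one scraper and a number by the other) while tolerating differences in the scraped values themselves. It does not detect optional fields that happen to be absent from both samples.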