From 1b0fa67325b8e4e7fcd0cefd4ee2f03b0fb7d0cd Mon Sep 17 00:00:00 2001
From: Dima Timofeev <1127655+CuriousDima@users.noreply.github.com>
Date: Mon, 30 Mar 2026 14:11:56 -0700
Subject: [PATCH] Improve README clarity and links

---
 README.md | 101 ++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 80 insertions(+), 21 deletions(-)

diff --git a/README.md b/README.md
index 020312a..9b3537a 100644
--- a/README.md
+++ b/README.md
@@ -1,27 +1,45 @@
 # Caltrain Bot
 
-Caltrain Bot is a Python chatbot that answers Caltrain schedule questions from natural language. It currently ships as a Telegram bot, loads Caltrain GTFS data into SQLite, and uses DSPy plus an LLM provider to understand questions such as "When is the next train from Palo Alto to San Francisco around 8am?"
+Caltrain Bot is a Telegram bot that answers Caltrain schedule questions in plain English. It loads static GTFS data into SQLite and uses DSPy plus an LLM provider to turn questions like "When is the next train from Palo Alto to San Francisco around 8am?" into schedule lookups.
 
-## Project Layout
+Try the live bot on Telegram: [@CalTrain](https://t.me/CalTrain)
 
-- `src/caltrain_bot/telegram_bot.py`: Telegram bot handlers and message formatting.
-- `src/caltrain_bot/question_analysis.py`: DSPy-based question classification and entity extraction.
-- `src/caltrain_bot/schedule.py`: GTFS loading, preprocessing, and schedule queries.
-- `sql/train_stop_timeline.sql`: SQL used to preprocess imported GTFS data into query-friendly tables.
-- `src/caltrain_bot/config.py`: Environment-based configuration for Telegram and the LLM provider.
-- `data/caltrain-ca-us.zip`: Bundled Caltrain GTFS feed used by the app.
-- `tests/unit/`: Unit tests for schedule loading and configuration validation.
+Read the build story:
+[Part 1](https://dima.us.kg/learning-by-building-part-1-caltrain-bot-and-the-theory-practice-gap/),
+[Part 2](https://dima.us.kg/learning-by-building-part-2-caltrain-bot-sql-with-llms-and-dspy/),
+[Part 3](https://dima.us.kg/learning-by-building-part-3-caltrain-bot-local-llms-and-reality/)
+
+## What It Does
+
+- Answers Caltrain schedule questions from natural language.
+- Uses bundled Caltrain GTFS data, so schedule lookups come from a static transit feed rather than live scraping.
+- Supports both local LLMs through Ollama and hosted models through OpenRouter.
+- Runs as a Telegram bot today, with the schedule and question-analysis logic separated enough to support other chat frontends later.
 
-## Requirements
+Example questions:
+
+- "When is the next train from Mountain View to San Francisco?"
+- "Are there any trains from Palo Alto to San Jose after 6pm?"
+- "What trains leave Millbrae around 8 in the morning?"
+
+## Quick Start
+
+### Requirements
 
 - Python 3.11+
 - [`uv`](https://docs.astral.sh/uv/)
 - A Telegram bot token
-A supported LLM provider: Ollama or OpenRouter
+- One supported LLM provider: Ollama or OpenRouter
+
+### Install dependencies
+
+```shell
+uv sync
+```
 
-## Configuration
+### Configure environment variables
 
-The bot loads environment variables from your shell and from a `.env` file if present.
+The bot reads configuration from your shell and from a `.env` file if present.
 
 Required variables:
 
@@ -38,29 +56,59 @@ If `LLM_PROVIDER=openrouter`, also set:
 - `OPENROUTER_API_KEY`
 - `OPENROUTER_MODEL`
 
-## Run Locally
+Example `.env` for OpenRouter:
+
+```dotenv
+TELEGRAM_BOT_TOKEN=your-telegram-bot-token
+LLM_PROVIDER=openrouter
+OPENROUTER_API_KEY=your-openrouter-api-key
+OPENROUTER_MODEL=openai/gpt-4.1-mini
+```
 
-Install dependencies:
+Example `.env` for Ollama:
 
-```shell
-uv sync
+```dotenv
+TELEGRAM_BOT_TOKEN=your-telegram-bot-token
+LLM_PROVIDER=ollama
+OLLAMA_API_BASE=http://127.0.0.1:11434
+OLLAMA_MODEL=llama3.2
 ```
 
-Start the bot:
+### Run the bot
 
 ```shell
 uv run caltrain-bot
 ```
 
-## Run Tests
+## Development
+
+Run the test suite:
 
 ```shell
 uv run pytest
 ```
 
-## Docker
+Run linting:
+
+```shell
+uv run ruff check .
+```
+
+Format the codebase:
+
+```shell
+uv run ruff format .
+```
 
-Build and publish the Raspberry Pi image locally:
+Run type checks:
+
+```shell
+uv run ty check src
+```
+
+## Docker and Raspberry Pi
+
+Build and publish the ARM64 image:
 
 ```shell
 docker buildx build --platform linux/arm64 \
@@ -114,6 +162,17 @@ docker run -d \
   curiousdima/caltrain-bot:latest
 ```
 
+## Project Layout
+
+- `src/caltrain_bot/__init__.py`: CLI entrypoint that builds and runs the Telegram app.
+- `src/caltrain_bot/telegram_bot.py`: Telegram bot handlers and user-facing message formatting.
+- `src/caltrain_bot/question_analysis.py`: DSPy-based question classification and entity extraction.
+- `src/caltrain_bot/schedule.py`: GTFS loading, SQL preprocessing, station lookup, and train queries.
+- `src/caltrain_bot/config.py`: Environment validation and repo-relative asset paths.
+- `sql/train_stop_timeline.sql`: SQL used to preprocess imported GTFS data into query-friendly tables.
+- `data/caltrain-ca-us.zip`: Bundled Caltrain GTFS feed used by the app.
+- `tests/unit/`: Unit tests.
+
 ## Contributing
 
 Contributions are welcome. The current project focuses on Telegram, but the schedule and question-analysis logic are already separated enough to support additional chat frontends.