101 changes: 80 additions & 21 deletions README.md
@@ -1,27 +1,45 @@
# Caltrain Bot

Caltrain Bot is a Python chatbot that answers Caltrain schedule questions from natural language. It currently ships as a Telegram bot, loads Caltrain GTFS data into SQLite, and uses DSPy plus an LLM provider to understand questions such as "When is the next train from Palo Alto to San Francisco around 8am?"
Caltrain Bot is a Telegram bot that answers Caltrain schedule questions in plain English. It loads static GTFS data into SQLite and uses DSPy plus an LLM provider to turn questions like "When is the next train from Palo Alto to San Francisco around 8am?" into schedule lookups.

## Project Layout
Try the live bot on Telegram: [@CalTrain](https://t.me/CalTrain)

- `src/caltrain_bot/telegram_bot.py`: Telegram bot handlers and message formatting.
- `src/caltrain_bot/question_analysis.py`: DSPy-based question classification and entity extraction.
- `src/caltrain_bot/schedule.py`: GTFS loading, preprocessing, and schedule queries.
- `sql/train_stop_timeline.sql`: SQL used to preprocess imported GTFS data into query-friendly tables.
- `src/caltrain_bot/config.py`: Environment-based configuration for Telegram and the LLM provider.
- `data/caltrain-ca-us.zip`: Bundled Caltrain GTFS feed used by the app.
- `tests/unit/`: Unit tests for schedule loading and configuration validation.
Read the build story:
[Part 1](https://dima.us.kg/learning-by-building-part-1-caltrain-bot-and-the-theory-practice-gap/),
[Part 2](https://dima.us.kg/learning-by-building-part-2-caltrain-bot-sql-with-llms-and-dspy/),
[Part 3](https://dima.us.kg/learning-by-building-part-3-caltrain-bot-local-llms-and-reality/)

## What It Does

- Answers Caltrain schedule questions from natural language.
- Uses bundled Caltrain GTFS data, so schedule lookups come from a static transit feed rather than live scraping.
- Supports both local LLMs through Ollama and hosted models through OpenRouter.
- Runs as a Telegram bot today, with the schedule and question-analysis logic separated enough to support other chat frontends later.

## Requirements
Example questions:

- "When is the next train from Mountain View to San Francisco?"
- "Are there any trains from Palo Alto to San Jose after 6pm?"
- "What trains leave Millbrae around 8 in the morning?"
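The static-feed lookup described above depends on GTFS text files being loaded into SQLite. A minimal sketch of that step, assuming a standard GTFS zip with CSV members such as `stops.txt`, might look like the following. It is not the project's actual `schedule.py`: `load_gtfs_table` and `make_sample_feed` are hypothetical names, and the sample stops are invented for illustration.

```python
import csv
import io
import sqlite3
import zipfile


def load_gtfs_table(conn, feed, member):
    """Copy one GTFS text file (for example stops.txt) from a zip feed into SQLite.

    Hypothetical helper: the table is named after the file and every column
    is created untyped, which is enough for simple lookups.
    """
    table = member.removesuffix(".txt")
    with zipfile.ZipFile(feed) as archive, archive.open(member) as raw:
        # GTFS files are CSV and may start with a UTF-8 byte-order mark.
        text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
        reader = csv.reader(text)
        header = next(reader)
        columns = ", ".join(f'"{name}"' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


def make_sample_feed():
    """Build a tiny in-memory feed so the sketch runs without the real zip."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "stops.txt",
            "stop_id,stop_name\nS1,San Francisco\nS2,Palo Alto\n",
        )
    buffer.seek(0)
    return buffer


conn = sqlite3.connect(":memory:")
load_gtfs_table(conn, make_sample_feed(), "stops.txt")
rows = conn.execute("SELECT stop_id, stop_name FROM stops ORDER BY stop_id").fetchall()
```

Passing a path such as `data/caltrain-ca-us.zip` instead of the in-memory buffer should work the same way, since `zipfile.ZipFile` accepts either a path or a file-like object.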

## Quick Start

### Requirements

- Python 3.11+
- [`uv`](https://docs.astral.sh/uv/)
- A Telegram bot token
- A supported LLM provider: Ollama or OpenRouter
- One supported LLM provider: Ollama or OpenRouter

### Install dependencies

```shell
uv sync
```

## Configuration
### Configure environment variables

The bot loads environment variables from your shell and from a `.env` file if present.
The bot reads configuration from your shell and from a `.env` file if present.

Required variables:

@@ -38,29 +56,59 @@ If `LLM_PROVIDER=openrouter`, also set:
- `OPENROUTER_API_KEY`
- `OPENROUTER_MODEL`

## Run Locally
Example `.env` for OpenRouter:

```dotenv
TELEGRAM_BOT_TOKEN=your-telegram-bot-token
LLM_PROVIDER=openrouter
OPENROUTER_API_KEY=your-openrouter-api-key
OPENROUTER_MODEL=openai/gpt-4.1-mini
```

Install dependencies:
Example `.env` for Ollama:

```shell
uv sync
```dotenv
TELEGRAM_BOT_TOKEN=your-telegram-bot-token
LLM_PROVIDER=ollama
OLLAMA_API_BASE=http://127.0.0.1:11434
OLLAMA_MODEL=llama3.2
```
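The provider-specific variables in the two examples can be checked before the bot starts. The sketch below is one hedged way to do that, not the project's `config.py`: `missing_settings` is a hypothetical name, and treating both Ollama variables as required is an assumption drawn only from the example `.env` above.

```python
# Assumption: variable names come from the example .env files; whether the
# Ollama variables are strictly required is a guess made for illustration.
COMMON = ("TELEGRAM_BOT_TOKEN", "LLM_PROVIDER")
PROVIDER_SPECIFIC = {
    "ollama": ("OLLAMA_API_BASE", "OLLAMA_MODEL"),
    "openrouter": ("OPENROUTER_API_KEY", "OPENROUTER_MODEL"),
}


def missing_settings(env):
    """Return the names of expected variables that are unset or empty."""
    missing = [name for name in COMMON if not env.get(name)]
    provider = env.get("LLM_PROVIDER", "").lower()
    if provider and provider not in PROVIDER_SPECIFIC:
        raise ValueError(f"Unsupported LLM_PROVIDER: {provider!r}")
    # Only the selected provider's variables are checked.
    missing += [
        name for name in PROVIDER_SPECIFIC.get(provider, ()) if not env.get(name)
    ]
    return missing


example = {
    "TELEGRAM_BOT_TOKEN": "your-telegram-bot-token",
    "LLM_PROVIDER": "openrouter",
    "OPENROUTER_API_KEY": "your-openrouter-api-key",
}
print(missing_settings(example))
```

Calling the same function with `dict(os.environ)` at startup would give a single error message listing every missing variable, rather than failing on the first one.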

Start the bot:
### Run the bot

```shell
uv run caltrain-bot
```

## Run Tests
## Development

Run the test suite:

```shell
uv run pytest
```

## Docker
Run linting:

```shell
uv run ruff check .
```

Format the codebase:

```shell
uv run ruff format .
```

Build and publish the Raspberry Pi image locally:
Run type checks:

```shell
uv run ty check src
```

## Docker And Raspberry Pi

Build and publish the ARM64 image:

```shell
docker buildx build --platform linux/arm64 \
@@ -114,6 +162,17 @@ docker run -d \
curiousdima/caltrain-bot:latest
```

## Project Layout

- `src/caltrain_bot/__init__.py`: CLI entrypoint that builds and runs the Telegram app.
- `src/caltrain_bot/telegram_bot.py`: Telegram bot handlers and user-facing message formatting.
- `src/caltrain_bot/question_analysis.py`: DSPy-based question classification and entity extraction.
- `src/caltrain_bot/schedule.py`: GTFS loading, SQL preprocessing, station lookup, and train queries.
- `src/caltrain_bot/config.py`: Environment validation and repo-relative asset paths.
- `sql/train_stop_timeline.sql`: SQL used to preprocess imported GTFS data into query-friendly tables.
- `data/caltrain-ca-us.zip`: Bundled Caltrain GTFS feed used by the app.
- `tests/unit/`: Unit tests.

## Contributing

Contributions are welcome. The current project focuses on Telegram, but the schedule and question-analysis logic are already separated enough to support additional chat frontends.