Add reference Telegram bot integration #2

@logosc

Summary

Add a reference Telegram bot example (or library package) that demonstrates how to connect Engine[S] to a Telegram chat, similar to how the debug console bridges the agent loop to a web UI.

Motivation

  • Telegram's Bot API maps naturally to ChatInterface — text messages, photo uploads, and reply-based conversations all have direct equivalents.
  • A working Telegram integration would serve as both a reference implementation and a ready-to-use adapter for teams building Telegram-based agents.
  • Currently the SDK has debug/ (web console) and examples/todo-agent (CLI), but no reference for a real messaging platform.
  • Telegram also supports voice messages, which pair well with the voice input feature proposed in feat: voice message input (audio → tool calls) #1.

Findings (ordered by severity)

  1. Chat/session routing must be explicit. ChannelChat is single-channel; a Telegram bot must multiplex per chat ID into isolated ReplyCh streams. If the reference uses a single ChannelChat, concurrent chats will interleave and corrupt agent context. The design must choose: one bot per chat vs multi-chat manager.

  2. Blocking WaitForReply doesn't map 1:1 to Telegram updates. The SDK expects blocking waits; Telegram delivers updates via webhook or long polling. The integration needs to define how updates are buffered and drained per chat to satisfy WaitForReply semantics, or enforce "one active Run() per chat."

  3. Voice support depends on feat: voice message input (audio → tool calls) #1; boundary must be defined now. Unless the Telegram adapter owns transcription (Path A from feat: voice message input (audio → tool calls) #1) or forwards raw audio into Reply.AudioData (Path B), it will be incomplete. The reference should pick one path or show both with a build flag.

  4. "Minimal dependencies" conflicts with realistic Telegram media handling. Sticking to net/http means implementing download URLs, file size checks, and MIME handling manually. A small, widely-used Telegram library (e.g. telebot) may reduce complexity. The tradeoff should be explicit.

Open questions

  1. Single-chat reference or reusable multi-chat adapter? A single-chat bot is simpler for a reference example, but the issue implies a general adapter. These are different designs.

  2. Engine.Run() per message or long-lived session? Per-message loses conversational memory unless messages are persisted externally. Long-lived needs a per-chat goroutine and cancellation strategy.

Suggested scope

Concurrency model

Start with a per-chat session manager: a map of chatID → (Engine, ChannelChat, state) with one goroutine per active conversation. Each chat gets its own ChannelChat with its own ReplyCh. Telegram updates are routed by chat ID to the correct channel. Idle sessions are reaped after a timeout.

Voice handling

Start with Path A (STT preprocessing): transcribe via Whisper or Google STT before the message enters the agent loop. Because the agent only ever sees text, this works with all LLM providers. Path B (native audio to Gemini) can be added later, once #1 lands the AudioData fields in the SDK.

Package structure

  • A telegram/ package (or examples/telegram-bot/) implementing ChatInterface per chat.
  • Support for: text send/receive, image attachments (photos → Reply.ImageData), voice messages (transcribed → text).
  • A runnable example showing Engine.Run() connected to a Telegram bot token.

Integration tests

  • Send text → verify Reply.Text
  • Send photo → verify Reply.ImageData populated
  • Send voice → verify transcription arrives as Reply.Text
  • Concurrent chats → verify isolation (no message interleaving)
