Merged
28 commits
9862b44
Add Chatlas integration guide
rich-iannone Jun 14, 2026
91feca0
Lowercase 'chatlas' in guide title
rich-iannone Jun 15, 2026
8779cfe
Clarify closure capture and return format in guide
rich-iannone Jun 15, 2026
83d160b
Document .chat() streaming and tool calls
rich-iannone Jun 15, 2026
dcc0486
Clarify async usage in chatlas guide
rich-iannone Jun 15, 2026
07181d8
Add Cloudflare Browser Rendering Crawler guide
rich-iannone Jun 15, 2026
35377f0
Add CloudflareCrawler prerequisites
rich-iannone Jun 15, 2026
10d7c80
Add basic usage section to Cloudflare crawler docs
rich-iannone Jun 15, 2026
84717a4
Add Cloudflare crawler example to docs
rich-iannone Jun 15, 2026
652836f
Add IngestSummary output example
rich-iannone Jun 15, 2026
f2c1089
Add raghilda imports for Cloudflare crawler
rich-iannone Jun 15, 2026
29d9939
Explain Cloudflare crawler ingestion and indexing
rich-iannone Jun 15, 2026
1b8b419
Add CloudflareCrawler vs WebCrawler section
rich-iannone Jun 15, 2026
ac56bfd
Document CloudflareCrawler render parameter
rich-iannone Jun 15, 2026
465dde3
Add docs for CloudflareCrawler source parameter
rich-iannone Jun 15, 2026
4dad7f7
Document Cloudflare crawler filtering patterns
rich-iannone Jun 15, 2026
a2c6eaa
Add CloudflareCrawler caching documentation
rich-iannone Jun 15, 2026
2524c84
Document modified_since incremental updates
rich-iannone Jun 15, 2026
41c95ca
Document CloudflareCrawler polling and inspection
rich-iannone Jun 15, 2026
8dc3d54
Add full CloudflareCrawler example to guide
rich-iannone Jun 15, 2026
ec63edc
Add raghilda imports for Cloudflare crawler
rich-iannone Jun 15, 2026
58e3048
Add guidance for when to use CloudflareCrawler
rich-iannone Jun 15, 2026
214f35b
Add conclusion to CloudflareCrawler guide
rich-iannone Jun 15, 2026
94634fb
Explain refresh path and caching/deduplication
rich-iannone Jun 15, 2026
1f32f6f
Update user_guide/52-cloudflare-crawler.qmd
rich-iannone Jun 15, 2026
0bc44c8
Omit mention of `account_id` in cache invalidation
rich-iannone Jun 15, 2026
9a4f582
Revise opening of Cloudflare crawler guide page
rich-iannone Jun 15, 2026
3177765
Update user_guide/52-cloudflare-crawler.qmd
rich-iannone Jun 15, 2026
313 changes: 313 additions & 0 deletions user_guide/05-chatlas-integration.qmd
@@ -0,0 +1,313 @@
---
title: "Using RAG with chatlas"
guide-section: "Getting Started"
---

While raghilda builds the knowledge store, [chatlas](https://posit-dev.github.io/chatlas/) can handle the conversation part. The integration point between the two is a Python function that you register as a tool with chatlas. When the LLM decides it needs information from your store, it calls that function, receives the relevant chunks, and incorporates them into its answer.

This page will walk you through the pattern step by step. It assumes you already have a populated store (look over [Core Concepts](00-getting-started.qmd) or [Crawling and Ingestion](04-crawling-and-ingestion.qmd) if you need to build one first).

## Connecting to a store

Let's start by connecting to an existing store (a `DuckDBStore`). Using `.connect(read_only=True)` is recommended when the store is only used for retrieval:

```{python}
#| eval: false
from raghilda.store import DuckDBStore

store = DuckDBStore.connect("quarto_docs.db", read_only=True)
print(f"Store contains {store.size()} documents")
```

Any raghilda store backend works here: `DuckDBStore`, `ChromaDBStore`, or `OpenAIStore`. The rest of the code is identical regardless of the backend.

## Defining a search tool

chatlas discovers tools through plain Python functions. The function's docstring and type hints tell the model what the tool does and what arguments it accepts. A retrieval tool might look like this:

```{python}
#| eval: false
import json

def search_docs(query: str, num_results: int = 5) -> str:
    """
    Search the documentation for relevant information.

    Parameters
    ----------
    query
        A description of what to look for.
    num_results
        The number of relevant passages to return (default of `5`).
    """
    chunks = store.retrieve(query, top_k=num_results, deoverlap=True)
    return json.dumps(
        [{"text": chunk.text, "context": chunk.context} for chunk in chunks]
    )
```

There are a few things we should take note of:

- The function captures the `store` variable from the surrounding scope. This is a normal Python closure: as long as `store` is defined before the function is called, the reference works.
- The docstring is sent to the model as part of the tool description. Write it for the LLM: be specific about when the tool should be used and what `query=` should contain.
- The return value must be a string because LLM tool-calling APIs transmit results as text. JSON works well here because it preserves structure without requiring the model to parse anything unusual.
- `deoverlap=True` (the default) merges overlapping chunks from the same document so the model receives coherent passages rather than repetitive fragments.

The goal is a function that returns enough context for the model to answer accurately, but not so much that it drowns the prompt in noise. Start with a simple version like the one above and refine the docstring and return format once you can observe how the model uses the results.
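
One way to keep the payload from drowning the prompt is to cap the total amount of text returned. The sketch below is illustrative only: `format_chunks` and `max_chars` are hypothetical names, and the `SimpleNamespace` objects are stand-ins for retrieved chunks with `.text` and `.context` attributes.

```python
import json
from types import SimpleNamespace


def format_chunks(chunks, max_chars: int = 4000) -> str:
    """Serialize chunks to JSON, stopping once the text budget is spent."""
    payload = []
    used = 0
    for chunk in chunks:
        # Always keep the first chunk; stop as soon as the next one
        # would push the total past the budget.
        if payload and used + len(chunk.text) > max_chars:
            break
        payload.append({"text": chunk.text, "context": chunk.context})
        used += len(chunk.text)
    return json.dumps(payload)


# Stand-in chunks (not real raghilda objects) to show the behavior.
fake_chunks = [
    SimpleNamespace(text="a" * 3000, context="Guide > Citations"),
    SimpleNamespace(text="b" * 3000, context="Guide > Bibliographies"),
]
result = json.loads(format_chunks(fake_chunks, max_chars=4000))
print(len(result))  # prints 1: the second chunk would exceed the budget
```

Inside a tool function, the final `return json.dumps(...)` could be swapped for a call to a helper like this one. A character budget is a crude proxy for tokens, so treat the default as a starting point to tune.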

## Registering the tool and chatting

Pass the function to `chat.register_tool()`. After registration, the model can call it whenever it determines that retrieval would help answer a prompt:

```{python}
#| eval: false
from chatlas import ChatOpenAI

chat = ChatOpenAI(
    model="gpt-5.5",
    system_prompt=(
        "You are a helpful assistant that answers questions about Quarto. "
        "Use the search_docs tool to find relevant information before answering."
    ),
)
chat.register_tool(search_docs)

chat.chat("How do I add citations to a Quarto document?")
```

When you call `.chat()`, chatlas sends the prompt to the model, displays any tool calls the model makes (including the query it passes to your function), and then streams the final answer to the terminal. You see the full round trip without needing to wire up any display logic yourself.

The system prompt matters. Instructing the model to use the tool before answering reduces the chance that it falls back on its training data alone.

## Interactive and programmatic use

chatlas provides several ways to consume responses depending on context.

**Console mode** for interactive exploration:

```{python}
#| eval: false
chat.console()
```

This opens a REPL where you can ask questions and see tool calls in real time. Type `exit` or press `Ctrl+C` to quit.

**Streaming** for applications that display output incrementally:

```{python}
#| eval: false
for chunk in chat.stream("What formats does Quarto support?"):
    print(chunk, end="", flush=True)
```

**Async** for concurrent workloads (note that `await` requires an `async def` context, so this form is typically used inside an async framework like FastAPI or an `asyncio.run()` entrypoint):

```{python}
#| eval: false
response = await chat.chat_async("How do I create a Quarto presentation?")
print(response)
```

All three modes use the same registered tools and conversation history. The choice depends on where your code runs: `.console()` for quick experimentation in a terminal, `.stream()` for user-facing applications where perceived latency matters, and `.chat_async()` for server-side code that handles multiple requests concurrently.
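
To make the `async def` requirement concrete, here is a minimal sketch of an `asyncio.run()` entrypoint that issues several questions concurrently. `fake_chat_async` is a hypothetical stand-in for `chat.chat_async()` so that the example runs without an API key or a chat object.

```python
import asyncio


async def fake_chat_async(prompt: str) -> str:
    """Hypothetical stand-in for `chat.chat_async()`; no model is called."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"(answer to: {prompt})"


async def answer_all(prompts: list[str]) -> list[str]:
    # gather() runs the coroutines concurrently and preserves input order.
    return await asyncio.gather(*(fake_chat_async(p) for p in prompts))


async def main() -> list[str]:
    return await answer_all(
        [
            "How do I create a Quarto presentation?",
            "What formats does Quarto support?",
        ]
    )


answers = asyncio.run(main())
for answer in answers:
    print(answer)
```

Since a chat object keeps its conversation history, real concurrent requests would probably each want their own chat object rather than sharing one.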

## Tailoring retrieval to the tool's purpose

The tool function is where you control retrieval quality. Here are adjustments worth considering:

Every `RetrievedChunk` carries an `.origin` attribute that records where the chunk came from (typically a URL or file path). Including it in the JSON response lets the model cite its sources when answering:

```{python}
#| eval: false
def search_docs(query: str, num_results: int = 5) -> str:
    """Search the documentation for relevant information."""
    chunks = store.retrieve(query, top_k=num_results, deoverlap=True)
    return json.dumps([
        {
            "text": chunk.text,
            "context": chunk.context,
            "source": chunk.origin,
        }
        for chunk in chunks
    ])
```

Adding `"source": chunk.origin` to the returned dictionary is all it takes. Once the model sees URLs or paths alongside the text, it can reference them in its answer without any additional prompting.
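
If your application wants to show a reference list next to the answer, the same payload can be post-processed. The following sketch assumes the JSON shape returned by the function above (a list of objects with a `source` key). `unique_sources` is a hypothetical helper and the payload is made-up sample data.

```python
import json


def unique_sources(tool_result: str) -> list[str]:
    """Return the distinct `source` values from a tool result, in order."""
    seen: dict[str, None] = {}
    for item in json.loads(tool_result):
        source = item.get("source")
        if source:
            seen.setdefault(source, None)  # dicts preserve insertion order
    return list(seen)


# Made-up payload in the same shape as the tool's return value.
sample = json.dumps(
    [
        {"text": "...", "context": "Citations", "source": "https://example.com/citations"},
        {"text": "...", "context": "Footnotes", "source": "https://example.com/footnotes"},
        {"text": "...", "context": "Citations", "source": "https://example.com/citations"},
    ]
)
print(unique_sources(sample))
```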

When a store indexes content from multiple sources or sections, you can pass an `attributes_filter=` argument to `retrieve()` to restrict results to a subset. The filter uses a SQL-like expression (`"section = 'guide'"`) that matches against the attributes defined in your store's schema:

```{python}
#| eval: false
def search_guides(query: str) -> str:
    """Search only the user guide section of the documentation."""
    chunks = store.retrieve(
        query,
        top_k=5,
        attributes_filter="section = 'guide'",
    )
    return json.dumps([{"text": chunk.text} for chunk in chunks])
```

Here only chunks whose `section` attribute equals `'guide'` are considered. This keeps retrieval focused and avoids pulling in, for example, API reference text when the user asks a conceptual question. See [Attribute Filters](03-attribute-filters.qmd) for more on defining and using attribute schemas.
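
If the section name ever comes from the model or from user input rather than a hard-coded string, it is worth validating it before building the filter expression. A minimal sketch, assuming the `section = '...'` syntax shown above: `ALLOWED_SECTIONS` and `section_filter` are hypothetical names, and the section values are examples only.

```python
ALLOWED_SECTIONS = {"guide", "reference", "tutorial"}


def section_filter(section: str) -> str:
    """Build a filter expression for a known section, rejecting anything else."""
    if section not in ALLOWED_SECTIONS:
        raise ValueError(
            f"Unknown section {section!r}; expected one of {sorted(ALLOWED_SECTIONS)}"
        )
    # Safe to interpolate: the value came from the allow-list above.
    return f"section = '{section}'"


print(section_filter("guide"))  # prints: section = 'guide'
```

The result would then be passed along as `attributes_filter=section_filter(section)` in the `retrieve()` call.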

You can also register several tool functions on the same chat, each backed by a different filter or even a different store. The model decides which tool to invoke based on the docstrings, so give each function a clear description of what it covers:

```{python}
#| eval: false
def search_api_reference(query: str) -> str:
    """Search the API reference for function signatures and parameters."""
    chunks = store.retrieve(
        query,
        top_k=3,
        attributes_filter="section = 'reference'",
    )
    return json.dumps([{"text": chunk.text} for chunk in chunks])


def search_tutorials(query: str) -> str:
    """Search the tutorials for step-by-step instructions and examples."""
    chunks = store.retrieve(
        query,
        top_k=5,
        attributes_filter="section = 'tutorial'",
    )
    return json.dumps([{"text": chunk.text} for chunk in chunks])

chat.register_tool(search_api_reference)
chat.register_tool(search_tutorials)
```

With two tools registered, a question like "What arguments does `ChatOpenAI` accept?" should route to `search_api_reference`, while "How do I set up streaming in a Shiny app?" should route to `search_tutorials`. The model makes the choice on each turn, and you can observe which tool it selects by watching the tool-call display in `.chat()` or `.console()`.

None of these adjustments require any changes to chatlas itself. The retrieval logic lives entirely in your tool functions, which means you can iterate on what gets returned, how many results to include, and how to filter without touching the chat configuration. That separation is deliberate and it keeps the conversational layer stable while you tune retrieval independently.
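
When several tools differ only in their filter, a small factory can cut the repetition. This is a sketch, not a raghilda or chatlas feature: `make_search_tool` is a hypothetical helper and `FakeStore` is a stand-in so the example runs on its own. It also assumes that chatlas takes the tool's name and description from the function's `__name__` and `__doc__`.

```python
import json
from types import SimpleNamespace


def make_search_tool(store, name: str, description: str, section: str, top_k: int = 5):
    """Create a search function bound to one section of the store."""

    def search(query: str) -> str:
        chunks = store.retrieve(
            query,
            top_k=top_k,
            attributes_filter=f"section = '{section}'",
        )
        return json.dumps([{"text": chunk.text} for chunk in chunks])

    # Assumption: the tool's name and description come from these attributes.
    search.__name__ = name
    search.__doc__ = description
    return search


class FakeStore:
    """Stand-in for a raghilda store; records the filter it was given."""

    def retrieve(self, query, top_k, attributes_filter):
        self.last_filter = attributes_filter
        return [SimpleNamespace(text=f"result for {query!r}")][:top_k]


fake_store = FakeStore()
search_tutorials = make_search_tool(
    fake_store,
    name="search_tutorials",
    description="Search the tutorials for step-by-step instructions and examples.",
    section="tutorial",
)
print(search_tutorials("streaming"))
print(fake_store.last_filter)
```

Each generated function could then be passed to `chat.register_tool()` in the same way as the hand-written versions. Here `search` is a true closure: it captures `store`, `section`, and `top_k` from the enclosing call.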

## Choosing a model provider

Because the retrieval logic lives in a plain Python function, the choice of model provider is independent of raghilda. chatlas supports hosted APIs, cloud platforms, and local inference servers. The tool registration interface is the same in every case.

Anthropic's Claude models tend to follow tool-calling instructions closely and produce well-structured answers:

```{python}
#| eval: false
from chatlas import ChatAnthropic

chat = ChatAnthropic(model="claude-opus-4-8")
chat.register_tool(search_docs)
```

Google's Gemini models offer a generous free tier, which is useful for prototyping before committing to a paid API:

```{python}
#| eval: false
from chatlas import ChatGoogle

chat = ChatGoogle(model="gemini-3.5-flash")
chat.register_tool(search_docs)
```

Ollama runs models locally, so nothing leaves your machine. This matters when the store contains proprietary or sensitive material:

```{python}
#| eval: false
from chatlas import ChatOllama

chat = ChatOllama(model="Llama-3.3-8B-Instruct")
chat.register_tool(search_docs)
```

The [chatlas model choice documentation](https://posit-dev.github.io/chatlas/get-started/models.html) lists all available providers. Switching between them requires changing only the constructor call; the registered tools, system prompt, and conversation history carry over if you assign them to a new chat object.

## A full example

The following script builds a store from a documentation site and starts an interactive RAG chat session. It reuses an existing store if one is already present.

```{python}
#| eval: false
from pathlib import Path

from chatlas import ChatOpenAI

from raghilda.chunker import MarkdownChunker
from raghilda.crawl import CrawlScope, WebCrawler
from raghilda.embedding import EmbeddingOpenAI
from raghilda.store import DuckDBStore

DB_PATH = Path("chatlas_docs.db")


def build_store() -> DuckDBStore:
    store = DuckDBStore.create(
        location=str(DB_PATH),
        embed=EmbeddingOpenAI(),
        name="chatlas_docs",
        title="Chatlas Documentation",
        overwrite=True,
    )
    crawler = WebCrawler(cache_dir=True, max_workers=4)
    scope = CrawlScope(
        roots=["https://posit-dev.github.io/chatlas/"],
        depth=1,
        include_types=["html"],
    )
    chunker = MarkdownChunker()
    summary = store.ingest(
        crawler.markdown_documents(scope),
        prepare=chunker.chunk,
        max_workers=4,
    )
    store.build_index()
    print(f"Indexed {summary.inserted} documents")
    return store


def get_store() -> DuckDBStore:
    if DB_PATH.exists():
        return DuckDBStore.connect(str(DB_PATH), read_only=True)
    return build_store()


def main():
    import json

    store = get_store()

    def search_chatlas_docs(query: str, num_results: int = 5) -> str:
        """
        Search the chatlas documentation.

        Use this tool when the user asks about chatlas features,
        API usage, model providers, tool calling, or streaming.

        Parameters
        ----------
        query
            A description of what to look for.
        num_results
            Number of passages to return (default of 5).
        """
        chunks = store.retrieve(query, top_k=num_results, deoverlap=True)
        return json.dumps(
            [{"text": chunk.text, "context": chunk.context} for chunk in chunks]
        )

    chat = ChatOpenAI(
        model="gpt-5.5",
        system_prompt=(
            "You answer questions about the chatlas Python library. "
            "Always use the search tool before answering."
        ),
    )
    chat.register_tool(search_chatlas_docs)
    chat.console()


if __name__ == "__main__":
    main()
```

This script separates store construction from chat setup so the expensive indexing step only runs once. On subsequent runs it reconnects to the existing database and goes straight to the interactive session. The same structure works for any documentation site or local file collection: swap the `CrawlScope` roots and adjust the system prompt to match your domain.
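
Because `get_store()` reuses any database file it finds, the store never refreshes on its own. One possible extension is to rebuild when the file is older than some threshold. The sketch below is an assumption about how you might do that, not part of raghilda: `store_is_stale` and `max_age_days` are hypothetical names, and the check relies only on the file's modification time.

```python
import os
import tempfile
import time
from pathlib import Path


def store_is_stale(db_path: Path, max_age_days: float = 7.0) -> bool:
    """Return True if the database file is missing or older than the limit."""
    if not db_path.exists():
        return True
    age_seconds = time.time() - db_path.stat().st_mtime
    return age_seconds > max_age_days * 86_400


# Demonstrate with a temporary file instead of a real store.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.db"
    missing = store_is_stale(demo_path)  # file does not exist yet
    demo_path.write_bytes(b"")
    fresh = store_is_stale(demo_path)  # just created
    ten_days_ago = time.time() - 10 * 86_400
    os.utime(demo_path, (ten_days_ago, ten_days_ago))
    old = store_is_stale(demo_path)  # backdated by ten days

print(missing, fresh, old)  # prints: True False True
```

In `get_store()`, the `DB_PATH.exists()` check could then become `not store_is_stale(DB_PATH)`, so that an old database triggers `build_store()` again.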

## Next steps

- The [Core Concepts](00-getting-started.qmd) guide covers building a store from scratch.
- The [Chunking](02-chunking.qmd) guide explains how to tune chunk size and overlap for better retrieval quality.
- The [Attribute Filters](03-attribute-filters.qmd) guide shows how to scope retrieval by metadata.
- The [chatlas documentation](https://posit-dev.github.io/chatlas/get-started/tools.html) has more detail on tool calling, streaming, and structured output.