9 changes: 9 additions & 0 deletions src/data/nav/platform.ts
@@ -124,6 +124,15 @@ export default {
link: '/docs/platform/pricing/limits',
name: 'Limits',
},
{
name: 'Pricing examples',
pages: [
{
link: '/docs/platform/pricing/examples/ai-chatbot',
name: 'AI support chatbot',
},
],
},
{
link: '/docs/platform/pricing/faqs',
name: 'Pricing FAQs',
13 changes: 13 additions & 0 deletions src/pages/docs/ai-transport/index.mdx
@@ -145,3 +145,16 @@ Take a look at some example code running in-browser of the sorts of features you
},
]}
</Tiles>

## Pricing

AI Transport uses Ably's [usage-based billing model](/docs/platform/pricing) at your package rates. Your consumption costs depend on the number of inbound messages (published to Ably) and outbound messages (delivered to subscribers), and on how long channels and connections are active. [Contact Ably](https://ably.com/contact) to discuss options for Enterprise pricing and volume discounts.

The cost of streaming token responses over Ably depends on:

- the number of tokens in the LLM responses that you are streaming. For example, a simple support chatbot response might be around 300 tokens, a coding session can run 2,000-3,000 tokens, and a deep reasoning response could exceed 50,000 tokens.
- the rate at which your agent publishes tokens to Ably, and the number of messages it uses to do so. Some LLMs output every token as a single event, while others batch multiple tokens together. Similarly, your agent may publish tokens as they are received from the LLM, or perform its own processing and batching first (see the sketch after this list).
- the number of subscribers receiving the response.
- the [token streaming pattern](/docs/ai-transport/token-streaming#token-streaming-patterns) you choose.
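
A minimal agent-side sketch of the batching point above. It uses the standard Ably Pub/Sub `publish()` call; the channel name, event name, and batch size are illustrative assumptions rather than anything AI Transport prescribes:

```typescript
import * as Ably from 'ably';

// Illustrative only: buffer LLM tokens and publish them in small batches,
// rather than publishing one Ably message per token.
const realtime = new Ably.Realtime({ key: process.env.ABLY_API_KEY! });
const channel = realtime.channels.get('chat:support:123'); // assumed channel name

async function streamResponse(tokens: AsyncIterable<string>, batchSize = 4) {
  let buffer: string[] = [];
  for await (const token of tokens) {
    buffer.push(token);
    if (buffer.length >= batchSize) {
      await channel.publish('tokens', { text: buffer.join('') }); // assumed event name
      buffer = [];
    }
  }
  if (buffer.length > 0) {
    await channel.publish('tokens', { text: buffer.join('') }); // flush the remainder
  }
}
```

Fewer, larger messages mean lower inbound message counts, at the cost of coarser-grained updates for subscribers.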

For example, suppose an AI support chatbot sends a response of 300 tokens, each as a discrete update, using the [message-per-response](/docs/ai-transport/token-streaming/message-per-response) pattern, with a single client subscribed to the channel. With AI Transport's [append rollup](/docs/ai-transport/token-streaming/token-rate-limits#per-response), those 300 token events are conflated into 100 discrete inbound messages, resulting in 100 outbound messages and 100 persisted messages. See the [AI support chatbot pricing example](/docs/platform/pricing/examples/ai-chatbot) for a full breakdown of the costs in this scenario.
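
As a rough sketch of that conflation arithmetic (all figures are the illustrative ones from this example; the 25 messages/s figure is the effective rate after append rollup in the linked breakdown, not a guaranteed limit):

```typescript
// Estimate how append rollup conflates token events into billed messages.
function rolledUpMessages(
  tokenEvents: number,       // events published by the agent per response
  publishRatePerSec: number, // agent publish rate
  rollupRatePerSec: number   // effective message rate after append rollup
): number {
  const streamSeconds = tokenEvents / publishRatePerSec; // 300 ÷ 75 = 4s
  return Math.ceil(streamSeconds * rollupRatePerSec);    // 4 × 25 = 100
}

const inbound = rolledUpMessages(300, 75, 25);
const subscribers = 1;
console.log({ inbound, outbound: inbound * subscribers, persisted: inbound });
// => { inbound: 100, outbound: 100, persisted: 100 }
```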
52 changes: 52 additions & 0 deletions src/pages/docs/platform/pricing/examples/ai-chatbot.mdx
@@ -0,0 +1,52 @@
---
title: AI support chatbot pricing example
meta_description: "Calculate AI Transport pricing for conversations with an AI chatbot. Example shows how using the message-per-response pattern and modifying the append rollup window can generate cost savings."
meta_keywords: "chatbot, support chat, token streaming, token cost, AI Transport pricing, Ably AI Transport pricing, stream cost, Pub/Sub pricing, realtime data delivery, Ably Pub/Sub pricing"
intro: "This example uses consumption-based pricing for an AI support chatbot use case, where a single agent is publishing tokens to user over AI Transport."
---

### Assumptions

The scale and features used in this calculation.

| Scale | Features |
|-------|----------|
| 4 user prompts to get to resolution | ✓ Message-per-response |
| 300 token events per LLM response | |
| 75 appends per second from agent | |
| 3-minute average chat duration | |
| 1 million chats | |

### Cost summary

The high-level cost breakdown for this scenario is given in the table below. Messages are billed both inbound (published to Ably) and outbound (delivered to subscribers). Enabling the "Message updates, deletes and appends" [channel rule](/docs/ai-transport/token-streaming/message-per-response#enable) automatically enables message persistence.

| Item | Calculation | Cost |
|------|-------------|------|
| Messages | 1212M × $2.50/M | $3030 |
| Connection minutes | 6M × $1.00/M | $6 |
| Channel minutes | 3M × $1.00/M | $3 |
| Package fee | | [See plans](/pricing) |
| **Total** | | **~$3039/M chats** |

### Message usage breakdown

Several factors influence the total message usage. The message-per-response pattern includes [automatic rollup of append events](/docs/ai-transport/token-streaming/token-rate-limits#per-response) to reduce consumption costs and avoid rate limits.

- Agent stream time: 300 token events ÷ 75 appends per second = 4 seconds of streaming per response
- Messages published after rollup: 4 seconds × 25 messages/s = **100 messages per response**

| Type | Calculation | Inbound | Outbound | Total messages | Cost |
|------|-------------|---------|----------|----------------|------|
| User prompts | 1M chats × 4 prompts | 4M | 4M | 8M | $20 |
| Agent responses | 1M chats × 4 responses × 100 messages per response | 400M | 400M | 800M | $2000 |
| Persisted messages | Every inbound message is persisted | 404M | 0 | 404M | $1010 |
| **Total** | | **808M** | **404M** | **1212M** | **$3030** |
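
The following minimal sketch reproduces the arithmetic in the two tables above. The $2.50/M message rate and $1.00/M minute rates are this example's illustrative rates (substitute your own package rates), and the assumption that both the user and the agent hold a connection for the 3-minute chat is how the 6M connection minutes figure is derived:

```typescript
// Reproduce the cost arithmetic from the tables above (illustrative rates).
const chats = 1_000_000;
const promptsPerChat = 4;          // one agent response per prompt
const messagesPerResponse = 100;   // after append rollup (see breakdown above)
const subscribers = 1;
const chatMinutes = 3;

const MSG_RATE = 2.5 / 1_000_000;    // $2.50 per million messages
const MINUTE_RATE = 1.0 / 1_000_000; // $1.00 per million minutes

// Inbound: user prompts plus rolled-up agent response messages.
const inbound = chats * promptsPerChat * (1 + messagesPerResponse); // 404M
const outbound = inbound * subscribers;                             // 404M
const persisted = inbound;           // every inbound message is persisted
const totalMessages = inbound + outbound + persisted;               // 1,212M

const connectionMinutes = chats * chatMinutes * 2; // assumed: user + agent both connected
const channelMinutes = chats * chatMinutes;        // one channel per chat

const cost =
  totalMessages * MSG_RATE +
  (connectionMinutes + channelMinutes) * MINUTE_RATE;

console.log(`~$${cost.toFixed(0)} per million chats, before the package fee`); // ~$3039
```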

<Aside data-type='further-reading'>
Useful links for exploring this topic in more detail.
- [Talk with sales](https://ably.com/contact) to get a personalized quote.
- [Learn how HubSpot uses Ably to enable 128,000 businesses with live chat that just works](https://ably.com/case-studies/hubspot)
- [See how doxy.me turned realtime from a liability into a strategic asset](https://ably.com/case-studies/doxyme)
</Aside>