
Consider caching/deduplicating repeated Tinybird Stats queries #27905


@fmercurio

Summary

While evaluating the upcoming Tinybird Developer plan pricing change from QPS/active-minutes billing to vCPU-second-based billing, we noticed that Ghost's built-in web analytics integration can generate repeated Tinybird pipe requests for the same Stats/Admin views.

This does not appear to be causing a significant cost issue for our current workload, but it looks like a good opportunity to make Ghost's Tinybird usage more efficient and more resilient to the new vCPU/burst model.

In particular, short-lived caching and in-flight request deduplication for expensive Stats queries could reduce Tinybird CPU usage and smooth bursty traffic without changing the UI semantics.

Context

Tinybird has announced Developer plan pricing changes:

  • no plan-level QPS overages;
  • burst capacity for temporary spikes;
  • usage measured in vCPU-seconds, with overages charged per vCPU-second above baseline capacity.

We reviewed organization.pipe_stats_rt for a Ghost site over a 7-day period to understand the impact.

The direct monthly overage estimate was negligible for us, but the analysis surfaced a clear repeated-query pattern in Ghost Stats/Admin analytics.

Observed data from Tinybird pipe_stats_rt

Across the main web analytics endpoints over 7 days:

  • api_kpis

    • 104 requests
    • ~70.22 vCPU-seconds
    • ~54% of CPU among the inspected endpoints
    • average read volume: ~1.05M rows / ~76.6 MB per request
    • p99 duration close to 9.5s, max ~10.3s
  • api_top_sources

    • 60 requests
    • ~32.53 vCPU-seconds
    • ~25% of CPU among the inspected endpoints
    • average read volume: ~1.02M rows / ~82.5 MB per request
    • max duration ~7.3s
  • api_top_pages

    • 34 requests
    • ~13.59 vCPU-seconds
    • ~10% of CPU among the inspected endpoints
  • api_post_visitor_counts

    • 210 requests
    • ~13.36 vCPU-seconds
    • many requests, but much cheaper per request than the three above

We also found duplicated URLs for the expensive endpoints:

  • 30 distinct repeated URLs
  • 160 total requests to those repeated URLs
  • ~95.03 vCPU-seconds consumed by those repeated URLs
  • estimated avoidable work with a simple "execute once, serve subsequent identical requests from cache" model:
    • ~130 avoidable Tinybird requests (160 total requests minus the first request to each of the 30 URLs)
    • ~76.93 avoidable vCPU-seconds
    • ~66% of the CPU used by api_kpis + api_top_sources + api_top_pages (~116.34 vCPU-seconds) in that sample
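The avoidable-work estimate above follows a simple rule: group requests by identical URL, treat the first request to each URL as real work, and count every later one as a cache hit. A minimal TypeScript sketch of that calculation follows; the PipeRequest row shape (url, cpuSeconds) is an assumption for illustration, not the exact pipe_stats_rt column names.

```typescript
// Hypothetical row shape: one row per pipe request. Field names are illustrative.
interface PipeRequest {
  url: string;        // full request URL, including query parameters
  cpuSeconds: number; // vCPU-seconds consumed by this request
}

interface AvoidableWork {
  repeatedUrls: number;        // distinct URLs requested more than once
  repeatedRequests: number;    // total requests to those URLs
  avoidableRequests: number;   // every request after the first, per repeated URL
  avoidableCpuSeconds: number; // CPU spent on those avoidable requests
}

// "Execute once, serve subsequent identical requests from cache":
// the first request per URL is real work; every later identical request is avoidable.
function estimateAvoidableWork(rows: PipeRequest[]): AvoidableWork {
  const byUrl = new Map<string, PipeRequest[]>();
  for (const row of rows) {
    const group = byUrl.get(row.url) ?? [];
    group.push(row);
    byUrl.set(row.url, group);
  }

  const result: AvoidableWork = {
    repeatedUrls: 0,
    repeatedRequests: 0,
    avoidableRequests: 0,
    avoidableCpuSeconds: 0,
  };
  for (const group of byUrl.values()) {
    if (group.length < 2) continue; // URLs seen once have nothing to save
    result.repeatedUrls += 1;
    result.repeatedRequests += group.length;
    // Assumes rows are in chronological order, so group[0] is the request that fills the cache.
    for (const row of group.slice(1)) {
      result.avoidableRequests += 1;
      result.avoidableCpuSeconds += row.cpuSeconds;
    }
  }
  return result;
}
```

Under this rule avoidableRequests always equals repeatedRequests minus repeatedUrls (160 − 30 = 130 in our sample). Because it ignores the time between requests, it is an upper bound on what a 1–5 minute TTL cache would save; a TTL-aware variant would only count repeats that fall within the TTL of the previous execution.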

Examples of repeated patterns included:

  • api_kpis for a 30-day dashboard range requested 15 times with identical parameters;
  • api_kpis for a specific post over 7 days requested 20 times;
  • api_top_sources for the same post/range requested 20 times.

The most common expensive requests were chart/admin-style queries with parameters such as:

  • site_uuid
  • date_from
  • date_to
  • timezone
  • member_status
  • post_uuid
  • from=chart

Some long date ranges were also present, including ranges over 90 days and one edge case with an empty post_uuid= parameter.

Relevant Ghost code paths

From a quick inspection of main:

Browser/Tinybird direct path

api_kpis and api_top_sources are queried from the Admin/Stats frontend via useTinybirdQuery:

  • apps/admin-x-framework/src/hooks/use-tinybird-query.ts
  • apps/admin-x-framework/src/utils/stats-config.ts
  • apps/stats/src/views/Stats/Web/web.tsx
  • apps/posts/src/views/PostAnalytics/Web/web.tsx

This appears to build /v0/pipes/{pipe}.json URLs and query Tinybird directly from the browser using the Tinybird token.
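For context on the direct path: a Tinybird pipe endpoint is a plain HTTPS GET on /v0/pipes/{pipe}.json, with the pipe parameters passed as query-string values. The sketch below is hypothetical (buildPipeUrl and queryPipe are not Ghost's helpers, and the base URL depends on the Tinybird region/workspace); it only illustrates the request shape that useTinybirdQuery appears to produce.

```typescript
// Tinybird JSON responses carry result rows in `data` (alongside meta, rows, statistics).
interface PipeResponse<Row> {
  data: Row[];
}

// Hypothetical helper: builds /v0/pipes/{pipe}.json?param=value...
// Undefined params are skipped; empty strings are passed through as `key=`.
function buildPipeUrl(
  baseUrl: string,
  pipe: string,
  params: Record<string, string | undefined>,
): string {
  const url = new URL(`/v0/pipes/${encodeURIComponent(pipe)}.json`, baseUrl);
  for (const [key, value] of Object.entries(params)) {
    if (value !== undefined) url.searchParams.set(key, value);
  }
  return url.toString();
}

// Hypothetical browser-side call using a Bearer token.
async function queryPipe<Row>(
  baseUrl: string,
  token: string,
  pipe: string,
  params: Record<string, string | undefined>,
): Promise<Row[]> {
  const response = await fetch(buildPipeUrl(baseUrl, pipe, params), {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!response.ok) {
    // Surface 408/429/5xx as errors so callers (and any cache) never treat them as data.
    throw new Error(`Tinybird pipe ${pipe} failed with HTTP ${response.status}`);
  }
  const body = (await response.json()) as PipeResponse<Row>;
  return body.data;
}
```

A helper like this passes an empty string through unchanged and produces post_uuid=, which resembles the empty-parameter edge case we saw in the logs.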

That matches the Tinybird logs we saw: requests for api_kpis and api_top_sources carried browser user agents.

Backend/Ghost API path

api_top_pages seems to go through Ghost's Stats API in at least the main Stats view:

  • apps/stats/src/views/Stats/Web/components/top-content.tsx
  • apps/admin-x-framework/src/api/stats.ts
  • ghost/core/core/server/api/endpoints/stats.js
  • ghost/core/core/server/services/stats/content-stats-service.js
  • ghost/core/core/server/services/stats/utils/tinybird.js

The topContent endpoint already sets cache: statsService.cache and generates its cache key from the request options, which seems like the right architecture for a shared cache.

Proposed improvement

Consider adding short-lived cache and in-flight request deduplication for the expensive Tinybird Stats endpoints, especially:

  • api_kpis
  • api_top_sources
  • optionally api_top_pages where it is not already covered by the Ghost Stats cache

Option A: small client-side improvement

Add in-memory cache and in-flight dedupe inside useTinybirdQuery for a whitelist of expensive analytics endpoints.

Suggested behavior:

  • cache only successful responses;
  • TTL around 1–5 minutes;
  • dedupe simultaneous identical requests by reusing the same promise;
  • normalize query params before building the cache key;
  • do not cache errors, 408, 429, or 5xx responses;
  • only apply to a whitelist, not all Tinybird queries.
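A minimal sketch of the suggested behavior, assuming a fetcher that rejects on any non-2xx response (so 408, 429 and 5xx never reach the cache). The whitelist contents, the two-minute TTL, and the name cachedPipeQuery are hypothetical; wiring it into useTinybirdQuery is left out because that hook's internals aren't reproduced here.

```typescript
// Hypothetical whitelist; real pipe names/versions may differ.
const CACHEABLE_PIPES = new Set(["api_kpis", "api_top_sources", "api_top_pages"]);
const TTL_MS = 2 * 60 * 1000; // within the suggested 1–5 minute range

interface CacheEntry {
  expiresAt: number;
  value: unknown;
}

// Module-level maps: shared by all components in the same page/tab.
const responseCache = new Map<string, CacheEntry>();
const inFlight = new Map<string, Promise<unknown>>();

async function cachedPipeQuery<T>(
  pipe: string,
  cacheKey: string,
  fetcher: () => Promise<T>, // assumed to reject on 408/429/5xx and network errors
  now: () => number = Date.now,
): Promise<T> {
  // Non-whitelisted pipes keep the current behavior: always hit Tinybird.
  if (!CACHEABLE_PIPES.has(pipe)) return fetcher();

  const cached = responseCache.get(cacheKey);
  if (cached) {
    if (cached.expiresAt > now()) return cached.value as T;
    responseCache.delete(cacheKey); // drop the expired entry
  }

  // Simultaneous identical calls reuse the promise that is already running.
  const pending = inFlight.get(cacheKey);
  if (pending) return pending as Promise<T>;

  const request = fetcher()
    .then((value) => {
      // Only fulfilled requests reach this point, so failures are never cached.
      responseCache.set(cacheKey, { expiresAt: now() + TTL_MS, value });
      return value;
    })
    .finally(() => {
      inFlight.delete(cacheKey);
    });
  inFlight.set(cacheKey, request);
  return request;
}
```

Registering the in-flight promise before it settles is what collapses simultaneous identical calls into one Tinybird request. Because the maps live in the page, the cache is per browser tab, which is the main limitation Option B addresses.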

Suggested cache key inputs:

  • final endpoint name/version;
  • site_uuid;
  • normalized query parameters such as date_from, date_to, timezone, member_status, post_uuid, post_type, pathname, source, UTM params, etc.
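One way to normalize those key inputs: drop absent values, sort parameters by name so argument order doesn't matter, and serialize with JSON.stringify so no separator can collide with a value. buildCacheKey is hypothetical. It excludes token on the assumption that the token never changes the result, and it deliberately keeps empty strings, because we haven't confirmed that post_uuid= and a missing post_uuid are equivalent to the pipe.

```typescript
type QueryParams = Record<string, string | number | null | undefined>;

// Hypothetical key builder: pipe name (including any version suffix) + site + normalized params.
function buildCacheKey(pipe: string, siteUuid: string, params: QueryParams): string {
  const entries = Object.entries(params)
    // site_uuid is passed explicitly; token is assumed not to affect results.
    .filter(
      ([key, value]) =>
        key !== "site_uuid" && key !== "token" && value !== undefined && value !== null,
    )
    // Stringify so 7 and "7" map to the same key, as they would in a query string.
    .map(([key, value]): [string, string] => [key, String(value)])
    // Sort by parameter name so insertion order does not change the key.
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  // JSON.stringify avoids delimiter collisions (e.g. a value containing "&" or "|").
  return JSON.stringify([pipe, siteUuid, entries]);
}
```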

This would be the lowest-risk change and would reduce repeated requests caused by re-renders or repeated interactions in the same browser session.

Option B: preferred longer-term architecture

Move api_kpis and api_top_sources behind Ghost's Stats API, similar to topContent, so caching is shared server-side rather than per-browser.

Potential shape:

  • add Stats API endpoints for KPIs and top sources;
  • call tinybirdClient.fetch('api_kpis', ...) and tinybirdClient.fetch('api_top_sources', ...) server-side;
  • use statsService.cache with generated cache keys based on normalized options;
  • keep a short TTL, e.g. 1–5 minutes;
  • add in-flight dedupe either at the Stats service layer or Tinybird client layer;
  • update frontend Stats/Post Analytics views to use the Ghost API hooks instead of direct Tinybird queries for these heavy endpoints.
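A server-side sketch of that shape, layering in-flight dedupe over a shared cache. AsyncCache, SharedPipeCache and the PipeFetcher signature are assumptions standing in for statsService.cache and tinybirdClient.fetch, whose exact interfaces may differ; MemoryCache is only a stand-in for local testing.

```typescript
// Hypothetical cache adapter shape; Ghost's statsService.cache interface may differ.
interface AsyncCache {
  get(key: string): Promise<unknown | undefined>;
  set(key: string, value: unknown, ttlMs: number): Promise<void>;
}

// Hypothetical stand-in for tinybirdClient.fetch(pipe, options); assumed to throw on errors/timeouts.
type PipeFetcher = (pipe: string, options: Record<string, string>) => Promise<unknown>;

class SharedPipeCache {
  private readonly inFlight = new Map<string, Promise<unknown>>();

  constructor(
    private readonly cache: AsyncCache,
    private readonly fetchPipe: PipeFetcher,
    private readonly ttlMs = 60_000,
  ) {}

  fetch(pipe: string, options: Record<string, string>): Promise<unknown> {
    // Sorted entries so option order does not change the key.
    const sorted = Object.entries(options).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
    const key = JSON.stringify([pipe, sorted]);

    // Concurrent identical requests (e.g. from different admins) share one upstream call.
    const pending = this.inFlight.get(key);
    if (pending) return pending;

    const work = (async () => {
      const hit = await this.cache.get(key);
      if (hit !== undefined) return hit;
      // If fetchPipe throws, set() is never reached, so errors/timeouts are not cached.
      const value = await this.fetchPipe(pipe, options);
      await this.cache.set(key, value, this.ttlMs);
      return value;
    })().finally(() => {
      this.inFlight.delete(key);
    });

    this.inFlight.set(key, work);
    return work;
  }
}

// In-memory AsyncCache for local testing only.
class MemoryCache implements AsyncCache {
  private readonly store = new Map<string, { value: unknown; expiresAt: number }>();

  async get(key: string): Promise<unknown | undefined> {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= Date.now()) {
      this.store.delete(key);
      return undefined;
    }
    return entry.value;
  }

  async set(key: string, value: unknown, ttlMs: number): Promise<void> {
    this.store.set(key, { value, expiresAt: Date.now() + ttlMs });
  }
}
```

The in-flight map only dedupes within one Ghost process; across multiple processes, the shared cache is what prevents repeated Tinybird work once the first response has been stored.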

This would share cache hits across admins/browsers and would better smooth Tinybird vCPU burst usage.

Why this matters

With Tinybird's new pricing model, the issue is less about QPS limits and more about short bursts of vCPU usage. Repeated identical analytics requests can create unnecessary spikes even when the monthly overage cost is low.

Caching these endpoints would likely:

  • reduce Tinybird vCPU-seconds;
  • reduce burst pressure under the Developer plan;
  • reduce the probability of 408/429 responses under load;
  • improve perceived Admin/Stats responsiveness;
  • reduce redundant work for identical chart/dashboard queries.

Suggested tests

For a client-side implementation:

  • identical api_kpis requests within TTL are served from cache;
  • simultaneous identical requests dedupe to one Tinybird request;
  • different params produce different cache keys;
  • endpoint versions are part of the key;
  • non-whitelisted endpoints preserve current behavior;
  • failed responses are not cached.

For a server-side implementation:

  • Stats API cache key includes all relevant options;
  • repeated KPI/top-source requests hit statsService.cache;
  • errors/timeouts are not cached;
  • in-flight duplicate requests result in one upstream Tinybird request;
  • frontend views still render the same data shape.

Notes

This is not a critical bug report from our side; we are not currently seeing meaningful Tinybird overage. It is a performance/cost-resilience improvement suggestion based on real Tinybird usage metrics from a Ghost site and the upcoming Tinybird pricing model change.
