feat: put read replicas to use by Snehil-Shah · Pull Request #1012 · supabase/supavisor

Snehil-Shah · 2026-05-26T05:12:30Z

What kind of change does this PR introduce?

Aims to put read replicas to use.

What is the current behavior?

Cluster routing (the postgres.cluster.alias connection path) is broken, and doesn't work even if the cluster has one replica.

What is the new behavior?

- Which queries to route: only simple queries in transaction mode are supported for now.
- Actual routing logic: could be random or round robin; not implemented yet.

Additional context

There was some precedent for this in this very repo in #162, which was removed in a later refactor. However, it used naive logic that routed all SELECT statements to a random replica.

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

Snehil-Shah · 2026-05-26T05:16:09Z

There are some opinionated changes from my end in this file, so feel free to ask for a revert.

We already had wrapper functions around the actual NIF functions that add basic runtime validation and serve as the public representation of the underlying NIF (see statements vs statement_types). I have extended that pattern to the new parse and parse_to_json functions and made the NIFs private, so the module now exposes only the public variants.

I feel it's a cleaner layout; let me know your thoughts.

Snehil-Shah · 2026-05-26T05:21:51Z

This is my main point of discussion: how conservative should we be?

The current implementation allows all read-only SELECT statements and read-only BEGIN statements (transaction start), but stays conservative on function calls. Function calls in general can't be assumed to be read-safe, but should we have an allowlist for built-in functions like SUM, COUNT, AVG?

I chose to stay conservative for now, since that is the safer default. WDYT?
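
For illustration, the allowlist option could look roughly like the sketch below. It is not taken from the PR: the module name, the allowlist contents, and the assumed AST shape (a decoded JSON tree where function calls appear as `"FuncCall"` nodes with a `"funcname"` list of `"String"`/`"sval"` parts) are all assumptions made for the example.

```elixir
defmodule ReadSafeSketch do
  # Hypothetical allowlist of built-in aggregates treated as read-safe.
  @allowed_funcs MapSet.new(~w(sum count avg min max))

  # Returns true only if every function call found in the decoded AST
  # names an allowlisted function. Schema-qualified names such as
  # "myschema.count" are joined with "." and therefore rejected.
  def funcs_allowed?(ast) do
    Enum.all?(func_names(ast), &MapSet.member?(@allowed_funcs, &1))
  end

  # A FuncCall node: record its name, then keep walking its arguments.
  defp func_names(%{"FuncCall" => %{"funcname" => parts} = call}) do
    name =
      parts
      |> Enum.map(&get_in(&1, ["String", "sval"]))
      |> Enum.join(".")
      |> String.downcase()

    [name | func_names(Map.delete(call, "funcname"))]
  end

  # Any other map or list: recurse into the children.
  defp func_names(%{} = node), do: Enum.flat_map(Map.values(node), &func_names/1)
  defp func_names(list) when is_list(list), do: Enum.flat_map(list, &func_names/1)

  # Scalars (strings, numbers, booleans, nil) contain no calls.
  defp func_names(_leaf), do: []
end
```

Under these assumptions, a tree containing only a `count` call would pass, while one containing `nextval` would be rejected. An unqualified name can still be shadowed by a user-defined function, so an allowlist like this would remain a heuristic rather than a guarantee.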

Snehil-Shah · 2026-05-26T05:35:19Z

I should explain myself here:

I decided to call libpg_query's C function (pg_query_parse) directly for the parse function because the Rust wrapper doesn't expose a JSON serializer yet and expects us to deal with protobuf directly. To clarify, serde::Serialize derives were added in their codebase (pganalyze/pg_query.rs@123d448), which would let us call serde_json::to_string(&result.protobuf) directly, but that change is yet to be released upstream. Once it is released, we can refactor to Rust-only.

Alternatively, we could pin the crate to the above GitHub commit right now. I just didn't want to mess with a stable dep. WDYT?

Snehil-Shah · 2026-05-26T05:40:01Z

@v0idpwn @mentels I have implemented a module for classifying read-safe SQL queries. This is the first step, where we decide when exactly to route, before implementing the actual routing itself. I will move on to that implementation once I have feedback on this and a direction for the routing.

v0idpwn · 2026-05-26T12:58:25Z

Hi, @Snehil-Shah. It will take us some time to actually review this, since we have other priorities.

Snehil-Shah · 2026-05-27T16:06:21Z

@v0idpwn Totally fine, thanks for the heads up. I'll keep iterating on this in the meantime.

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

- earlier logic only checked the first returned tenant for active state, it should simply just return only active tenants
- also a drive by clarity improvement for adding a no_users error

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

Snehil-Shah · 2026-05-29T21:01:15Z

+  @spec maybe_checkin(:transaction | :session | :proxy, Data.db_connection()) ::
+          Data.db_connection()
+  defp maybe_checkin(:transaction, nil), do: nil

-  defp maybe_checkin(:transaction, pool, {_, db_pid, _}) do
+  defp maybe_checkin(:transaction, {pool, db_pid, _}) do
    Process.unlink(db_pid)
    :poolboy.checkin(pool, db_pid)
    nil
  end

-  defp maybe_checkin(:session, _, db_connection), do: db_connection
-  defp maybe_checkin(:proxy, _, db_connection), do: db_connection
+  defp maybe_checkin(:session, db_connection), do: db_connection
+  defp maybe_checkin(:proxy, db_connection), do: db_connection


db_connection already has the pool it was sourced from, and we were incorrectly passing the entire map of pools in cluster mode before. I just decided to remove the explicit pool argument. The function, imo, reads better now as "checkin the given connection back to its source pool".

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

Snehil-Shah · 2026-05-30T16:27:16Z

+  @doc """
+  Returns `true` if the SQL query is read-safe.
+  """
+  @spec read_safe?(String.t()) :: boolean()


One thing to flag here is that this is only a guarantee at the AST level. It does not hold if the DB has a really unusual configuration (TIL about these rare cases), for example: views with function calls in their definition, so every read from the view calls a function; a custom operator that performs a write (it would appear like any other operator, such as +); or an RLS policy that writes data somewhere on every read (ik, doesn't make sense). All of these cases are very unusual, but undetectable at the AST level.

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

Snehil-Shah · 2026-05-30T19:09:18Z

I went ahead and added the routing logic for clustered tenants. It routes read-safe queries to randomly picked read replicas. I have left various comments highlighting my decisions and questions throughout. Once the implementation is validated, I'll move to telemetry and some integration tests for this.
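
As a rough illustration of the routing decision described above (not the PR's actual code), the sketch below picks a random replica for read-safe queries and otherwise uses the primary. The module name, the `%{write: ..., read: [...]}` pool map, and the return tuples are assumptions made for the example.

```elixir
defmodule RoutingSketch do
  # `pools` is a hypothetical map: %{write: primary_pool, read: [replica_pool, ...]}.
  # The second argument is the classifier's verdict for the current query.

  # Read-safe query and at least one replica: pick a replica at random.
  def pick_pool(%{read: [_ | _] = replicas}, true), do: {:read, Enum.random(replicas)}

  # Everything else (writes, or reads with no replicas) goes to the primary.
  # A read pool is never used as a fallback for a write.
  def pick_pool(%{write: primary}, _read_safe?), do: {:write, primary}
end
```

With `%{write: :primary, read: [:r1, :r2]}`, a read-safe query would land on `:r1` or `:r2`, and any other query on `:primary`. Swapping `Enum.random/1` for a round-robin counter would change only the first clause.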

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

Snehil-Shah · 2026-06-11T15:18:15Z

+      Telem.pool_checkout_time(
+        System.monotonic_time(:microsecond) - start,
+        data.id,
+        same_box,
+        replica_type
+      )


Should this metric include the time spent in the pool selection process itself (query parsing, classification), or should that be a separate metric?
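
If the separate-metric option were chosen, the two phases could be timed independently, roughly as in this sketch. The module, function, and key names are hypothetical, and reporting the durations through `Telem` is left out.

```elixir
defmodule TimingSketch do
  # Runs `select_fun` (pool selection: parsing and classification) and then
  # `checkout_fun` (the pool checkout itself), timing each phase separately.
  # Returns the connection together with both durations in microseconds.
  def timed_checkout(select_fun, checkout_fun) do
    t0 = System.monotonic_time(:microsecond)
    pool = select_fun.()
    t1 = System.monotonic_time(:microsecond)
    conn = checkout_fun.(pool)
    t2 = System.monotonic_time(:microsecond)

    {conn, %{selection_us: t1 - t0, checkout_us: t2 - t1}}
  end
end
```

Keeping the two durations apart would let the existing checkout metric keep its current meaning while making the cost of parsing and classification visible on its own.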

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

Snehil-Shah · 2026-06-11T15:59:33Z

+  @spec client_query_time(integer(), Supavisor.id(), boolean(), :read | :write) :: :ok | nil
+  def client_query_time(start, Supavisor.id() = id, proxy, query_type) do
    telemetry_execute(
      [:supavisor, :client, :query, :stop],
      %{duration: System.monotonic_time() - start},
-      Map.put(id_to_tags(id), :proxy, proxy)
+      id_to_tags(id)
+      |> Map.put(:proxy, proxy)
+      |> Map.put(:query_type, query_type)
    )


I think this can be a cool metric to include. It can be insightful for read vs write query comparisons.

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

Snehil-Shah · 2026-06-12T14:05:49Z

@v0idpwn I have added integration tests and I think this PR is now ready for review. This only implements routing for simple queries in transaction mode to begin with. I have started working on extended queries too, but I would limit this PR to simple queries only, so we can work in easy-to-review chunks incrementally. Also, no hurry :)

feat: add parser for read-safe queries

130dcc4

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

Snehil-Shah added 3 commits May 28, 2026 01:34

fix(tenants): incorrect return type in get_cluster_config

281e088

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

feat: add routing for simple queries in transaction mode

cd7b4e1

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

Snehil-Shah added 2 commits May 30, 2026 20:44

fix: read pool cannot be a fallback for write

8b9f68d

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

docs: fix desc accuracy

3340f3c

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

chore: remove unreachable test

f2f552c

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

Snehil-Shah marked this pull request as ready for review May 30, 2026 19:04

Snehil-Shah requested a review from a team as a code owner May 30, 2026 19:04

Snehil-Shah added 2 commits June 9, 2026 01:31

Merge branch 'main' into read-replicas

696adf2

feat(telem): add replica_type to pool checkout metric

91de74b

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

feat(telem): add query_type tag to query metrics

fefda37

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

Snehil-Shah marked this pull request as draft June 11, 2026 16:48

test(integration): add cluster_routing_test

d642b2e

Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>

Snehil-Shah marked this pull request as ready for review June 12, 2026 14:01

Merge branch 'main' into read-replicas

ca7d044

Snehil-Shah wants to merge 12 commits into supabase:main from Snehil-Shah:read-replicas
